The Discoverability of Longitudinal Datasets: Insight from the Landscaping International Longitudinal Datasets Project

In this post Daniel Yu discusses the Landscaping International Longitudinal Datasets project which seeks to improve access to mental health research data.

The development of mental health disorders is a complex process. Research on the life course of people living with mental health conditions such as anxiety, depression and schizophrenia can help improve our understanding of the multitude of factors involved in their development. Results from this research can put us in a better position to prevent and treat such conditions, ideally as early as possible. Our ongoing project, Landscaping International Longitudinal Datasets, involves our team at King’s College London, in collaboration with MQ Mental Health Research and the Open Data Institute, searching the world for the longitudinal datasets with the greatest potential for research on early intervention in depression, anxiety and psychosis.

As an undergraduate psychology student at the University of Bath, I have been very interested in developing my knowledge of mental health conditions and of how different methodologies can be used in mental health research, in particular to help us understand the causes of these conditions and how to address them. Having joined King’s College London for my placement year in September 2022, I have been thrust into the world of mental health research and immersed in learning about the value of existing datasets across the world.

Longitudinal data on mental health disorders

Researchers often rely on cross-sectional datasets, which involve the “collection of relevant information data at a given point in time”. This research design is relatively inexpensive and fast, and it allows many variables to be studied simultaneously. However, cross-sectional datasets have a significant drawback: they cannot show how variables change across multiple time points, nor can they be used to test for associations as participants age. As a result, longitudinal designs are becoming increasingly indispensable for mental health research.

As defined by CLOSER, the home of longitudinal research in the UK, longitudinal datasets “track the same individuals and households over time”. Longitudinal datasets are valuable because they allow direct comparison of individuals over multiple time points across the lifespan, as well as before, during and after important personal or global events, such as the COVID-19 pandemic. This broad perspective enables researchers to examine how changes in multiple factors may contribute to changes in mental health over time. As a result, longitudinal datasets play a key role in developmental psychology, and research on the development of mental health disorders.

For longitudinal datasets to be utilised to their full potential, it is important to ensure they are discoverable and accessible. Currently, there are several platforms in the United Kingdom whose work significantly aids data discoverability, such as:

Datamind: the Health Data Research UK centralised hub for mental health data. The hub supports mental health research through a range of activities, including making datasets more findable, accessible, interoperable, and reusable.
The Catalogue of Mental Health Measures: an interactive online platform detailing the mental health and well-being measures included in 55 British cohort and longitudinal studies, making it easier for researchers to efficiently compare the measures used across studies. Our team at King’s College London has developed and maintained this platform.

Our primary aim for this project was to identify longitudinal datasets with the greatest potential for research on early intervention in depression, anxiety and psychosis. In the course of identifying eligible datasets, we also gained insight into how discoverability efforts could be improved in future.

Searching for datasets

Using two different search strategies, we identified 8,275 (longitudinal and other) datasets across the globe. We termed these strategies:

Active search, for which our team searched through repositories containing information about a wide range of datasets. We reviewed 198 repositories, which yielded 7,977 datasets.
Passive search, for which we received information about longitudinal datasets from individuals worldwide via emails and our project website, facilitated by the dissemination of the project on social media platforms (i.e., Twitter and LinkedIn). In total, we received 276 submissions.

With our active search proving much more fruitful, it was clear that the variety of online repositories housing information on longitudinal datasets is making those datasets more discoverable. In particular, we found the following types of repositories to be most useful:

Journal websites such as the International Journal of Epidemiology, which included a list of and access to over 400 cohort profile papers.
Catalogues, which contained information about datasets on a variety of themes, such as net and Maelstrom Research, which detailed a variety of birth cohorts and epidemiological datasets from around the world respectively.
Hospital and university websites, which were often used when searching in countries where we struggled to identify relevant datasets.

Although these repositories provided a strong foundation for the identification of eligible datasets, our team did encounter some key, recurring issues:

Datasets were often poorly documented, whether through an absence of cohort profile papers or a lack of updates to earlier cohort proposals.
Obtaining further information and access to study documentation often required lengthy applications to, and approval from, study teams.
We experienced difficulties translating information from country-specific sources, especially when working with East Asian sources.

Developing a new repository

Of all the datasets we identified, 3,138 had a longitudinal design, and many of them were found in more than one repository. The full list of these longitudinal datasets is now on our project website, in our own repository of longitudinal datasets. The development of this repository was in part unintentional, as it was not part of our plans at the outset of the project. However, reviewing other repositories offered us great insight into what makes a useful and user-friendly platform, and we hope to use this insight to develop the platform we have established thus far. Various details could be added to the platform to make it more useful for researchers seeking information about longitudinal datasets, such as:

The number of participants at the inception of each study, as well as the number of participants at the latest time point of each study
The year that each study commenced and concluded (if applicable)
The number of time points at which data have been collected in each study
A link to each study’s cohort profile paper or website
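
As a rough illustration of how the details listed above might be stored for each study, here is a minimal sketch in Python. The class name, field names and example values are all hypothetical and do not describe how our repository is actually built. The retention rate is simply one example of a figure that could be derived once both participant counts are recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyRecord:
    """One longitudinal study in a repository listing (hypothetical schema)."""
    name: str
    participants_at_inception: int
    participants_at_latest_time_point: int
    start_year: int
    end_year: Optional[int]  # None if the study has not concluded
    time_points: int  # number of data collection time points so far
    profile_url: Optional[str] = None  # cohort profile paper or study website

    def retention_rate(self) -> float:
        """Share of the original sample still present at the latest time point."""
        if self.participants_at_inception <= 0:
            raise ValueError("participants_at_inception must be positive")
        return self.participants_at_latest_time_point / self.participants_at_inception

    def is_ongoing(self) -> bool:
        """A study with no recorded end year is treated as ongoing."""
        return self.end_year is None


# Made-up values for illustration only, not taken from any real study.
example = StudyRecord(
    name="Example Cohort Study",
    participants_at_inception=10_000,
    participants_at_latest_time_point=7_500,
    start_year=1991,
    end_year=None,
    time_points=12,
)
```

Keeping the end year and the link optional reflects the "if applicable" caveat in the list: some studies are still running, and some have no cohort profile paper or website to point to.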

Ease of access is still a barrier

Whilst this new platform would further support the discoverability of longitudinal datasets, data access remains an issue. Researchers’ ability to identify datasets they can actually use would be greatly improved if the guidelines for data access were made clear for each study, as ultimately, these datasets are most beneficial when they are used for mental health research.

The UK Data Service (UKDS) is a good example of a platform at the forefront of data access and discoverability. The UKDS provides trusted access to some of the UK’s largest economic, population and social research datasets, including the English Longitudinal Study of Ageing, Understanding Society and UK Census Data. On this platform, the access conditions for each study are clearly outlined, facilitating an efficient access process and thus greater data usage. Unfortunately, this is not the case for a large number of longitudinal datasets across the world.

It is clear from our recent project that the discoverability and accessibility of longitudinal data are improving, but that there is still a need to enhance both for mental health researchers on a global scale. Our hope is that the development of a new platform with information about international longitudinal datasets will be a helpful tool for increasing discoverability, and that simultaneous developments in accessibility will result in a much greater uptake of longitudinal data worldwide.

About the Author

Daniel Yu is an undergraduate BSc Psychology student at the University of Bath, undertaking his placement year at King’s College London. He is currently working on two projects: Landscaping International Longitudinal Datasets, and the Catalogue of Mental Health Measures. He is interested in the aetiology and treatment of mental health conditions.

This blog post contributes to our ongoing work looking at Mental health in data.
