Future Data Services: Fixing the data pipeline

Richard Welpton, from the Economic and Social Research Council (ESRC), outlines how data services must evolve to meet researchers’ growing needs, tackle technological and staffing challenges, and streamline access to sensitive data as part of the ESRC’s Future Data Services initiative.

Data services play a vital role in the research data ecosystem. As ‘data intermediaries’, they enable researchers to discover, access and use data easily, so that data are used as fully as possible.

The value of data services

Without data services, researchers would either rely on collecting their own data for their research or apply to individual data owners one by one to access the data they need. Just as importantly, they wouldn’t be able to find out what data already exists and may unnecessarily duplicate the work of others before them.

In an ideal world, data services offer consistent ways of accessing data. Because they produce metadata consistently, data services allow researchers to compare datasets easily. For researchers, they reduce the search cost of finding and using data; for data owners, they reduce the cost of providing access to their datasets.

Large statistical data producers (such as the Office for National Statistics), national studies (such as Understanding Society and the collections maintained by the Centre for Longitudinal Studies), smaller studies and individual researchers who collect their own data for research can deposit their data at a data service, leaving it to the data service to manage data access.

And so much more…

But data services are far more complex than merely repositories for storing and accessing research data.

Supported by complex technical infrastructures, data services are designed and operated by skilled research technical professionals who curate and preserve data, create metadata and provide comprehensive searchable data catalogues.

Increasingly the task extends well beyond providing access: some data services get involved in the innovative and technically demanding work of converting raw data into structured, research-ready datasets that can be safely and meaningfully used for analysis.

Researchers who interact with data services will testify to the expert support and extensive training provided which help them make the most of the data they access.

Explaining the impact of research that uses data is critical for building trust in the use of data, and this is another important role undertaken by data services.

At the Economic and Social Research Council (ESRC), we’ve been investing in data service infrastructures for many decades. We began funding the Social Science Research Databank in 1967 (now the UK Data Archive, the lead partner of the UK Data Service). Professor Allison Park has written about the importance of our infrastructure investments in this Data Impact blog.

We also fund bespoke data services including the CenLS trio (supporting access to and use of the three census Longitudinal Studies), CLOSER (helping researchers use longitudinal population studies) and, more recently, ADR UK (for administrative data) and SDR UK (for smart data).

And as science spans international boundaries, we’ve supported international initiatives like CESSDA too.

Understanding the challenges

Times change

The data landscape evolves fast, and it’s unrecognisable compared to 2007, when I began my role as a User Support Officer in the ONS Virtual Microdata Laboratory (now the Secure Research Service).

Different legal frameworks are in place; technology and research methods have advanced; and more types of data (including administrative and smart data) are available to complement traditional surveys.

Research and researchers have changed too

There is much more collaboration between researchers in different institutions and between researchers, policymakers, third sector organisations and communities.

What hasn’t changed is researchers’ ever-increasing appetite to expand the frontiers of knowledge by developing new research areas and methods.

Future Data Services

It’s in this context that we began our Future Data Services journey.

We set out to understand the challenges that researchers and data services face when discovering and accessing data. Here’s what we did:

Undertook a series of workshops engaging with researchers and data service staff.

Interviewed over 100 people, including researchers, data service staff and data owners (often in government agencies).

Visited data services that ESRC fund, other infrastructures in the UK, and similar services in other countries.

What did we find?

Searching and understanding data

It became evident that many data catalogues have recently been created, and from talking to researchers, it was clear that they can struggle to know which catalogue they should use, or even where to find the catalogue they need.

Metadata

Researchers also told us about inconsistently produced metadata, which makes it hard to understand and compare datasets. They pointed to examples of good practice too.

Data access

We focused on access to sensitive data in ‘Trusted Research Environments’ (TREs – secure computing environments where researchers have to log in to access and use data).

Although we noted the efforts by data services to improve processes, researchers told us they encountered unclear procedures and data access application forms. When combined with sometimes lengthy waits for data, these challenges got in the way of timely research. In some cases, research was either significantly delayed, or worse, abandoned altogether.

The people

People are the lifeblood of data services and we rightly wanted to focus on them.

Our review gathered evidence about the challenges of recruiting and keeping staff in a competitive ‘data science’ landscape and we also heard about ideas for helping staff develop their careers.

There is a clear need to support staff who want to learn about implementing new technologies, such as AI, to improve the data pipeline process.

Cross-cutting these themes is technology

We talked to experts about the role new technologies could play in streamlining and automating data discovery, access and use, helping to address the challenges we’ve highlighted above.

Pilot projects

To support our review, we funded 9 pilot projects (with support from the UKRI Digital Research Infrastructure fund) to develop new tools that improve data discovery and federated data access, and to support staff with their development needs.

These pilots helped us to understand where ESRC could make further investments in technologies and staff to provide improved data services for researchers.

Our recommendations

Our report is available here. We make 19 broad recommendations in the following areas:

Implementing new tools for making data discovery easier and more intuitive

Working with researchers to design services and giving more opportunities for researchers to feed back

Smoothing the research data access journey, with the aim of minimising data access waiting times and improving transparency

Ensuring Trusted Research Environments work for researchers, providing home-from-home computing environments

Investing in staff and career development

Emphasising the vital role of Scientific Use Files, accessible outside of Trusted Research Environments, for undertaking research and skilling up our research workforce

What will we do next?

We’ve already been working with data service infrastructures to implement our recommendations.

Our work has already helped inform how ADR UK and SDR UK commission their data services. The evidence we’ve collected has informed the National Data Library Programme and the forthcoming UK Statistics Authority review of research accreditation. Much of our work has complemented the UKRI Digital Research Infrastructure programme.

In the coming months, we’ll be working with our data service infrastructure investments to support improvements based on our review’s evidence and our recommendations.

Like our data service infrastructures, we aim to support researchers to make the most of our rich data assets for the public good and to improve lives.

About the author

Richard Welpton is the Head of Data Services Infrastructure at the Economic and Social Research Council (ESRC), part of UK Research and Innovation.

He oversees ESRC’s investments in data service infrastructure, which include the UK Data Service, CLOSER and the Census Longitudinal Study Research Support Units, and is also ESRC’s representative for the UKRI Digital Research Infrastructure programme.

Prior to joining UKRI, Richard worked in roles supporting researchers with access to and use of secure data, including at the Office for National Statistics, the UK Data Archive (where he helped to set up and manage the SecureLab), the Valuation Office Agency, Cancer Research UK and The Health Foundation.

Data Impact blog