Matching the CURATE(D) standard: data curation at the UK Data Service

Sharon Bolton, Data Publishing and Curation Manager at the UK Data Service, discusses the Service’s CURATE(D) approach to managing datasets. 

High-quality data don’t just appear – they’re curated. As the principal repository for economic, population and social research data in the UK, the UK Data Service plays a critical role in the data curation landscape. But what exactly happens between a researcher uploading a dataset and another using it for groundbreaking secondary analysis? That’s where our curation magic happens.

To guide our work and ensure robust curation practices, we follow the CURATE(D) framework – an internationally recognised, structured workflow model developed by the Data Curation Network (DCN) to bring consistency, transparency, quality assurance and trust to data curation. Each step in CURATE(D) – Check, Understand, Request, Augment, Transform, Evaluate and Document – maps directly onto our in-house practices.

Let’s walk through it…

 

Check

We thoroughly review all incoming data, metadata, and documentation to assess readiness for secondary use. For each study (or data collection), this involves:

  • Running data integrity checks with tools such as QAMyData to identify errors, outliers and missing metadata, and creating a plan to rectify them.
  • Conducting a disclosure review using tools such as the R package sdcMicro and resolving issues in collaboration with data depositors. This may involve modifying variables or applying restricted access conditions.
  • Reviewing documentation like questionnaires, technical reports and project methodologies to ensure adequate information is provided for data usability and analysis.
  • Assessing metadata provided by the data depositor to enable the creation of a comprehensive catalogue record.
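To make the first of these checks more concrete, here is a minimal sketch in Python of the kind of automated integrity check described above. It is not QAMyData and does not use its interface: the function name, the input structures and the example variables are all hypothetical, chosen only to illustrate flagging unlabelled variables, missing values and out-of-range values.

```python
def check_dataset(rows, labels, valid_ranges):
    """Return a list of human-readable issues found in a small dataset.

    rows         -- list of dicts, one per case (hypothetical structure)
    labels       -- dict mapping variable name to its label
    valid_ranges -- dict mapping variable name to a (low, high) tuple
    """
    issues = []
    # Collect every variable name that appears in any row.
    variables = sorted({name for row in rows for name in row})
    for name in variables:
        # Missing metadata: a variable with no label.
        if not labels.get(name):
            issues.append(f"{name}: no variable label")
        # Missing data: None or empty-string values.
        missing = sum(1 for row in rows if row.get(name) in (None, ""))
        if missing:
            issues.append(f"{name}: {missing} missing value(s)")
        # Possible errors or outliers: numeric values outside the expected range.
        if name in valid_ranges:
            low, high = valid_ranges[name]
            bad = [
                row[name]
                for row in rows
                if isinstance(row.get(name), (int, float))
                and not low <= row[name] <= high
            ]
            if bad:
                issues.append(f"{name}: {len(bad)} value(s) outside {low}-{high}")
    return issues


# Illustrative data only.
example_rows = [{"age": 34, "sex": 1}, {"age": 210, "sex": None}]
example_issues = check_dataset(
    example_rows,
    labels={"age": "Age in years"},
    valid_ranges={"age": (0, 120)},
)
```

The resulting list of issues is the sort of input a curator might use when drawing up a plan to rectify problems with the depositor.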

 

Understand

Through careful assessment and comparison of data and documentation, UK Data Service curators gain a deep understanding of the contents of each dataset. This informs our strategy to enhance usability of the study and ensure its readiness for long-term preservation.

 

Request

When needed, we contact data owners to clarify content or agree how to correct errors. Changes are agreed collaboratively and made with traceable syntax/code, ensuring full transparency and provenance.

 

Augment

Curators create rich, standardised metadata for the UK Data Service Data Catalogue using the Data Documentation Initiative (DDI) standard.

A persistent Digital Object Identifier (DOI) is assigned to each study via DataCite. Adding a DOI allows accurate citation, enduring traceability and easier tracking of research impact. Even if the data are withdrawn, the DOI will always resolve to information about the resource.

Our catalogue records are interoperable and harvested by the CESSDA Data Catalogue (CDC), making them findable across Europe and beyond. Adding keywords from the CESSDA European Language Social Science Thesaurus (ELSST) enhances discoverability and interoperability by harmonising our terminology with that of other repositories using ELSST.

 

Transform

Once all edits are complete and data issues resolved or mitigated, we use automated scripts to complete quantitative data processing. These fulfil two functions:

  • They create the dissemination versions of data that users receive, along with a data dictionary for each file. To support wider accessibility and our designated user community, the UK Data Service generates multiple data dissemination formats. For quantitative datasets we currently provide SPSS, Stata and plain-text (tab-delimited or CSV) versions of each data file. We also keep in touch with our user community to ensure we support new and emerging formats, and we can generate non-standard formats on request.
  • The scripts also generate a platform-independent version of each file in a format suitable for long-term preservation of data integrity, such as fixed-width ASCII, which can also be used to populate new formats as needed.
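As an illustration of generating several formats from one source, the following is a minimal Python sketch, using only the standard library, that writes the same records as tab-delimited text, CSV and fixed-width ASCII, and derives the column positions a data dictionary would need for the fixed-width file. It is not the Service's processing script: the function names, column widths and sample records are assumptions made for the example, and proprietary formats such as SPSS and Stata are left out.

```python
import csv
import io


def to_delimited(rows, fieldnames, delimiter):
    """Render rows as delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_fixed_width(rows, widths):
    """Render rows as fixed-width ASCII; widths maps variable name to width."""
    lines = []
    for row in rows:
        cells = []
        for name, width in widths.items():
            text = str(row[name])
            if len(text) > width:
                # Refuse to truncate silently: that would damage the data.
                raise ValueError(f"{name}: value {text!r} exceeds width {width}")
            cells.append(text.ljust(width))
        lines.append("".join(cells))
    return "\n".join(lines) + "\n"


def column_positions(widths):
    """Return 1-based (start, end) column positions for each variable."""
    positions = {}
    start = 1
    for name, width in widths.items():
        positions[name] = (start, start + width - 1)
        start += width
    return positions


# Illustrative records and widths only.
records = [{"id": 1, "region": "Wales"}, {"id": 2, "region": "Scotland"}]
widths = {"id": 3, "region": 8}

tab_text = to_delimited(records, ["id", "region"], "\t")
csv_text = to_delimited(records, ["id", "region"], ",")
fixed_text = to_fixed_width(records, widths)
positions = column_positions(widths)
```

A fixed-width file carries no header or delimiters, so the column positions have to travel with it; that is why the sketch computes them alongside the file itself.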

Qualitative data are generally prepared as Rich Text Format (RTF) or .txt, depending on the transferability of their original deposit formats: see our guidance on Recommended Formats.

 

Evaluate

Before release, we assess each study holistically to ensure it meets the FAIR principles. Some of the questions we consider are as follows:

  • Is the catalogue metadata detailed and searchable enough to make the data Findable?
  • Are the data formats Accessible and Interoperable?
  • Are the data well-labelled and comprehensively documented? Is processing complete or is there anything else we can do to enhance Reusability?

 

Document

Finally, we ensure lasting traceability and provenance by maintaining metadata records. Creating the UK Data Service catalogue record provides external metadata, but that’s not all we document.

Internally, we track the curation journey of each study by opening a record upon its arrival, including details about the original deposit, then documenting all steps undertaken during data and documentation processing from start to finish.

Alongside the syntax used to make any data modifications, the internal record is archived for long-term preservation with the original data and documentation files and the dissemination versions we produce.

We also create a standard ‘readme’ file for data users, describing the steps undertaken to curate the data and highlighting any features users should be aware of that may not necessarily be covered by the documentation.
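The idea of an internal record that also feeds a user-facing readme can be sketched in a few lines of Python. This is a hypothetical structure for illustration, not the Service's internal system: the class, its fields, the study identifier and the file names are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class CurationRecord:
    """Hypothetical processing record opened when a study arrives."""

    study_id: str
    deposited_files: list
    steps: list = field(default_factory=list)

    def log_step(self, date, action, syntax_file=None):
        """Append one processing step, optionally linked to the syntax used."""
        self.steps.append(
            {"date": date, "action": action, "syntax_file": syntax_file}
        )

    def readme(self):
        """Summarise the logged steps as plain text for data users."""
        lines = [f"Study {self.study_id}: curation summary", ""]
        for number, step in enumerate(self.steps, start=1):
            line = f"{number}. {step['date']} {step['action']}"
            if step["syntax_file"]:
                line += f" (syntax: {step['syntax_file']})"
            lines.append(line)
        return "\n".join(lines)


# Illustrative identifiers and file names only.
record = CurationRecord("0000", ["survey.sav"])
record.log_step("2024-01-10", "Recoded postcode to region", "recode.sps")
record.log_step("2024-01-11", "Added variable labels")
readme_text = record.readme()
```

Keeping the step log and the readme in one structure means the summary given to users cannot drift from what was actually recorded during processing.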

 

Conclusion: why it all matters

At the UK Data Service, curation isn’t just a checklist – it’s a commitment. By embedding the CURATE(D) framework in everything we do, we ensure that data shared with us are transformed into a resource that’s secure, accessible and ready for research now and into the future.

Our alignment with international best practices ensures that we provide high-quality, discoverable, and well-documented data resources that are preserved for the long term. For users, this means data that are trustworthy and supported. For funders, it signals quality, sustainability and impact.

As data sharing and reuse become increasingly central to research, we continue to evolve our curation services to meet emerging needs, championing both innovation and integrity in the data ecosystem.

We promise to protect and elevate the value of research data – curated with care.

 


About the author

Sharon Bolton is the Data Curation Manager at the UK Data Service.

She also works with CESSDA on the EOSC-ENTRUST project and with ELSST management. Sharon’s academic background is in criminology and crime data and she has over 20 years’ experience in data and metadata curation.

 

