World Digital Preservation Day

Hervé L’Hours

Today is World Digital Preservation Day. To mark the occasion, Ollie Parkes and Hervé L’Hours discuss the relationship between the UK Data Service and CoreTrustSeal. They describe a recent working paper on preservation levels and the benefits this framework could bring to repositories like the Service and its users.

Who are CoreTrustSeal and why has the UK Data Service cooperated and aligned with it?

CoreTrustSeal are an international, community-based, non-governmental and non-profit organisation promoting sustainable and trustworthy data infrastructures. They certify data repositories against a community-agreed catalogue of requirements that reflect the core characteristics of trustworthy digital repositories. With over 100 repositories certified worldwide, CoreTrustSeal certification is a recognised indicator of trust across data infrastructures.

The UK Data Service has engaged with CoreTrustSeal since its origins as the Data Seal of Approval, when Professor Matthew Woollard sat on the original international board. Repository Preservation Manager Hervé L’Hours helped to develop CoreTrustSeal through a Research Data Alliance Working Group and served as Vice-chair of CoreTrustSeal during the 2021–2024 board term. The UK Data Service was also involved in developing CoreTrustSeal’s Curation and Preservation Levels, and in further developing the CoreTrustSeal requirements and associated guidance.

Recently, GESIS – Leibniz Institute for the Social Sciences organised a visiting researcher session to consider the metadata implications of the latest version of CoreTrustSeal’s Curation and Preservation Levels. The session was attended by Hervé L’Hours, Mari Kleemola (Finnish Social Science Data Archive) and Jonas Recker (GESIS), and resulted in the working paper ‘CoreTrustSeal Levels of Curation and Preservation: Implied Repository and Object Metadata Characteristics’.

CoreTrustSeal’s Curation and Preservation Levels – what they are and why they matter

Version 3 defines the levels as follows:

  1. Level Zero. Data and metadata are distributed as deposited, with no deposit compliance checks, initial curation or active long-term preservation.
  2. Deposit Compliance. Deposited data and metadata are checked for compliance against defined criteria, e.g. correct data formats and sufficient metadata elements. If the criteria are not met, the object(s) may be rejected.
  3. Initial Curation. The repository curates the objects to meet defined criteria. These may exceed the criteria set out in Deposit Compliance and may include enhancing metadata or creating dissemination formats.
  4. Active Preservation. The repository takes long-term responsibility for ensuring the data and metadata remain fit for reuse by its designated community. This could include guarding against technical obsolescence, updating hardware and software environments, or migrating the archival and dissemination formats of the data and metadata.

CoreTrustSeal’s levels of curation and preservation are important because they allow an organisation to describe and document how it cares for a digital object. They range from simple storage of an object that is distributed exactly as deposited, to the repository taking long-term responsibility for reuse of the data and metadata, including making changes based on the needs of its designated community.

The levels themselves, however, do not specify how a particular object is being cared for, or what information and artefacts should be shared to explain the approach to deposit compliance, initial curation and preservation. In addition to any guaranteed retention period (which is independent of the level of care), additional metadata could include:

  • criteria set as part of deposit compliance
  • criteria set for initial curation
  • technical factors, e.g. links to technical monitoring, format criteria and emulation approaches
  • semantic factors, e.g. links to community monitoring, semantic artefacts, ontologies and controlled vocabularies
  • preservation and reappraisal times (periods of time with start dates) and triggers (e.g. risk)
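To make the list above more concrete, here is a minimal Python sketch of how such object-level metadata might be recorded alongside the level of care. It is purely illustrative: the class and field names, the numeric ordering of the levels and the reappraisal logic are our own assumptions, not a CoreTrustSeal schema and not something defined in the working paper.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import IntEnum
from typing import Optional


class CurationLevel(IntEnum):
    """The four Curation and Preservation Levels (version 3).
    The numeric values are an assumption, used only so levels can be compared."""
    LEVEL_ZERO = 0
    DEPOSIT_COMPLIANCE = 1
    INITIAL_CURATION = 2
    ACTIVE_PRESERVATION = 3


@dataclass
class ObjectCareRecord:
    """Hypothetical record describing how a repository cares for one object."""
    object_id: str
    level: CurationLevel
    retention_until: Optional[date] = None        # guaranteed retention, independent of level
    deposit_criteria_url: Optional[str] = None    # link to deposit compliance criteria
    curation_criteria_url: Optional[str] = None   # link to initial curation criteria
    technical_links: list[str] = field(default_factory=list)  # e.g. format criteria, monitoring
    semantic_links: list[str] = field(default_factory=list)   # e.g. vocabularies, ontologies
    reappraisal_start: Optional[date] = None      # start date of the reappraisal period
    reappraisal_period: Optional[timedelta] = None
    risk_flags: list[str] = field(default_factory=list)       # risk-based triggers (hypothetical)

    def reappraisal_due(self, today: date) -> bool:
        """Return True if a risk-based or time-based reappraisal trigger has fired."""
        if self.risk_flags:
            return True
        if self.reappraisal_start is None or self.reappraisal_period is None:
            return False
        return today >= self.reappraisal_start + self.reappraisal_period


# Usage: an automated check could list every object whose reappraisal is due.
records = [
    ObjectCareRecord("example-0001", CurationLevel.ACTIVE_PRESERVATION,
                     reappraisal_start=date(2024, 1, 1),
                     reappraisal_period=timedelta(days=365)),
    ObjectCareRecord("example-0002", CurationLevel.DEPOSIT_COMPLIANCE,
                     risk_flags=["format-at-risk"]),
]
due = [r.object_id for r in records if r.reappraisal_due(date(2024, 6, 1))]
print(due)  # ['example-0002']
```

Keeping the time-based trigger (start date plus period) separate from risk-based triggers mirrors the distinction in the final bullet above, and is the kind of structure that could drive the automated alerts discussed below.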

How will they benefit repositories?

Archives and repositories stand to benefit from agreeing on and implementing additional structured metadata for digital object management, as it would strengthen the links between repository-level and object-level information. This metadata would make digital objects easier to manage and enable greater automation, including alerts and triggers for reappraisal and preservation actions.

How will they benefit users?

Users of archives and repositories at all levels stand to benefit from this additional metadata. Greater provenance and transparency would let both data owners and data users see the level of care a repository provides, at both the object and the repository level, which could increase trust.

Third parties, including registries, would be able to harvest richer information about repositories and objects. Third-party certification bodies, such as CoreTrustSeal, would be able to automate the assessment of repositories.

The working paper is available via Zenodo and is open for public comment via a Google Doc linked from the Zenodo page.

Please share the paper more widely. We think it has real potential for the wider community, as well as being a step towards more streamlined and automated assessments.


About the authors

Hervé L’Hours looks after repository and preservation issues at the UK Data Service. He has been actively involved in FAIR and Trustworthy Digital Repository (TDR) standards such as CoreTrustSeal, including through CESSDA, and has done related work on the SSHOC, FAIRsFAIR and FAIR IMPACT projects.

Oliver Parkes works in the Technical Service directorate at the UK Data Service as the Repository Project and Standards Coordinator. He represents the Service across several Horizon Europe projects with a focus on FAIR data and Trustworthy Digital Repositories.
