Darren Bell, Hervé L’Hours and Matthew Woollard introduce the CoreTrustSeal and the UK Data Service’s response to their consultation with specialists, generalists, and technical repository service providers.
Providing a trustworthy digital repository (TDR) for our depositors and users forms part of the UK Data Service mission.
We’ve been part of the efforts to develop TDR standards for many years and through our UK Data Archive partner have been accredited to both the CoreTrustSeal and its predecessor standards. We provide disciplinary expertise around data of interest to social scientists and others, and in our experience domain expertise provides the best route for:
- data curation
- preservation
- support
- impact
But a range of other actors help support the full research data lifecycle and the CoreTrustSeal has increasingly seen a wider interest in their standard.
As a community-driven organisation, the CoreTrustSeal has begun a consultation process which may result in changes to the requirements or certification approach in future versions of the standard. The consultation asks for feedback on how CoreTrustSeal can best support specialists (domain/disciplinary repositories) alongside more generalist repositories and those providing ‘technical repository services’.
As a trustworthy digital repository for the social sciences, a holder of a broad collection of Census, international and economic data, a self-deposit service provider and partner to other repositories and services, the UK Data Service has an interest in seeing all of these actors supported. But this support must also ensure that the levels of curation and preservation, and the resultant impact of the data services, remain transparent to data depositors, funders and users. The text below is adapted from our feedback to the CoreTrustSeal Board.
The UK Data Archive has been in existence since 1967 (under different names) and it is the lead partner in the ESRC-funded UK Data Service which offers a wide range of repository data services, including support and training, for our main collection: data of interest to social scientists. We hold a range of academic and government survey data and mediate data use through the provision of secure access for confidential or sensitive data.
Defining our ‘designated community’ as social scientists means we must ensure that the data is findable and understandable to them (through metadata like DDI) and usable by them (through appropriate file formats) now and for the long term. We are also a ‘generalist’ repository in the sense that we support a wider community of users (from journalists to citizen scientists) through more basic metadata (e.g. DataCite). This wider community also benefits from the specialist activities we provide for social scientists (e.g. metadata).
We understand that the demand for CoreTrustSeal certification is expanding, but we strongly feel that every applicant must take direct responsibility for the curation and preservation of their data collection for a clear designated community. A more ‘generalist’ repository with a heterogeneous collection must be able to define different measures taken for different ‘sub-collections’ for different designated communities.
This might be more work for the repository, but if this is not done the CoreTrustSeal risks an overbroad certification for collections which receive different levels of care. The level of curation and preservation must be clear to the designated communities, depositors and funders.
This is particularly vital for academic data where good provenance, curation and preservation help ensure the quality and impact of the science that is based on them.
Certifying complex collections with multiple designated communities might be more demanding for applicants and the CoreTrustSeal, but this is unavoidable if trustworthiness is to be certified with an appropriate level of consistency across such collections.
For scientific data, it is equally important that those funding, selecting and using data can do so from appropriately expert domain/disciplinary repositories.
If CoreTrustSeal is to certify more generalist repositories, these must be clearly identified, through the type of certification, badging and metadata, as offering a different level of curation and preservation assurance from a domain repository. Recommendations like those from Science Europe make it clear that disciplinary expertise and long-term preservation are both required to support reliable, impactful science.
A simple outcome for CoreTrustSeal would be to provide information that would support a search in a repository registry like Re3data for a certified trustworthy repository which was also a domain repository, supported by a list of disciplines. It is vital to recognise that a repository ‘with’ domain data is not the same thing as a repository ‘for’ domain data. A social science data archive may contain health data of interest to social scientists, but this does not imply (without evidence) that it has become a cross-domain repository for both health and the social sciences.
We understand that it is the mission of CoreTrustSeal to remain ‘Core’ and that specific criteria for different domains are beyond the scope. But applicants should demonstrate appropriate knowledge and curation and preservation actions that align with their stated designated community. One area of possible CoreTrustSeal improvement which might support this differentiation would be to require a clear level of depositor/user support and to ask questions about the domain expertise provided.
The issue of technical repository service providers is an interesting one. The community benefits from an increasing range of tools and services which support repository data services, but these tool and service providers do not take responsibility for the data curation or long-term preservation, so they should be partners in an application (through the current ‘insource/outsource’ questions) and not applicants themselves. Of course, there is great value in a tool or service providing supporting evidence (service levels, technical details, change management etc.) which could be used by multiple clients in their CoreTrustSeal applications, but the final responsibility (trustworthiness) must remain with the applicant.
Here at the UK Data Service, we would expect to meet all of the criteria for a generalist repository while offering domain repository expertise for the social sciences. We take responsibility for any technical repository service providers (TRSPs) that we use, and could conceivably be a TRSP for others. To meet the certification and communication needs of clearly identifying trustworthy repositories which offer domain curation and preservation, it may be useful to consider these issues from a more functional and service-driven perspective.
Good storage which reduces the risk of unintended loss is at the heart of data management. This may be offered by a TRSP, which may also offer varying levels of deposit and access functionality. But the criteria for storage (number of copies, frequency of backup), deposit (appraisal and ongoing re-appraisal), curation (quality assurance improvement), access (identity, rights) and use (technically usable and understandable for a defined ‘knowledge base’) all remain repository responsibilities, even if outsourced.
None of these comments is intended to denigrate the important roles of TRSPs or generalists such as institutional repositories. They both help reduce risk to the long tail of data not currently being cared for. But it is important that data creators, depositors, users, science funders and science policy makers can all clearly see what level of curation, preservation and service is being offered by each type of organisation for each object or sub-collection they hold.
As a community we would benefit from clearer shared definitions of the different levels of curation and/or preservation offered by repositories, service providers and supporting services, not only for comparison but also to support clear roles and responsibilities when selecting and working with data service partners. We congratulate the CoreTrustSeal on raising this important issue and hope that they can help define and communicate these key issues around certification.
About the authors
Darren Bell is Director of Technical Services at the UK Data Archive.
Since 2012, he has overseen the transformation of the repository from a traditional social sciences file and SQL-based infrastructure into a cross-disciplinary data platform, most recently as a Co-I on the Smart Energy Research Lab project, a £6 million investment by EPSRC to deliver a world-class smart meter data repository for academic researchers in partnership with UCL. His professional interests include leveraging graph and semantic web technologies to deliver cross-disciplinary analytics and machine-actionable rights models.
Hervé L’Hours is the Repository & Preservation Manager in the Digital Preservation Systems and Security team at the UK Data Archive.
He works on repository support and maturity models for Trust & FAIR within FAIRsFAIR and on related work within the SSHOC (Social Science & Humanities Open Cloud) project. He worked on the UK Data Archive’s test audit against the ISO 16363 standard for Trustworthy Repositories, is current vice-Chair of the CoreTrustSeal and past Chair of the Data Seal of Approval.
Professor Matthew Woollard is Director of the UK Data Service.
Matthew provides strategic direction as Director of both the UK Data Service and the UK Data Archive, based at the University of Essex. He has practical and theoretical experience in all aspects of social science and humanities data service infrastructure, having previously headed the History Data Service and served as Head of Digital Preservation and Systems at the UK Data Archive.
As director he has overall responsibility for the service strategy and key stakeholder relations. He also provides leadership in data curation, archiving and preservation activities.