In this post Cristina Magder, the Data Collections Development Manager at the UK Data Service, discusses data licenses and access frameworks and how they are an integral part of what we do at the Service.
Ensuring appropriate data handling and access safeguards sits at the core of all data archives. Given data protection legislation, including UK GDPR and domestic laws, data archives play a critical role in advising data producers on best practices for data sharing while meeting ethical and legal requirements. To ensure compliance and to widen data access, responsible repositories employ licensing and access frameworks.
A recent CESSDA train-the-trainer workshop organised by the Austrian Social Science Data Archive, the Czech Social Science Data Archive, and the UK Data Service included a key focus on the importance of licensing data and showcased licence frameworks at four European Archives. While the frameworks might use different terminology, the event highlighted the similarities of the approaches in these Archives and allowed participants to further consider the importance of licensing.
Importance of data licensing
Licensing data is beneficial for both data producers and data users. Having data published under a licence establishes how the following three critical elements will be addressed:
- Accessing data
- Using data
- Sharing data
A licence framework ensures that questions such as “Who can access the data?”, “How can secondary researchers use the data?” and finally “Can secondary researchers share the data? And if so, how?” are properly considered, so that an informed decision can be taken about data availability and use. Additionally, a data licence establishes clear rights, including Intellectual Property Rights and data ownership, data processing and data use.
Data licensing framework at UK Data Service
At the UK Data Service, data are classified according to their level of detail, sensitivity and confidentiality. Appropriate data handling and access safeguards are negotiated with data owners to ensure compliance. For our curated collection we use a standard Deposit Licence Agreement.
The Deposit Licence Agreement specifies responsibilities for both the data depositor and the UK Data Service as the data service provider with a focus on ensuring enhanced curation, long-term preservation and compliant dissemination. The licence ensures FAIR (Findable, Accessible, Interoperable and Reusable) data is made available to the research community.
Furthermore, the Deposit Licence Agreement determines the access level under which data is made available for secondary research. Particular regard is given to how the data being made available are defined, for example as Personal Data or Personal Information under data protection legislation.
To support open research, transparency and reproducibility, we negotiate a suitable access level with data producers, depending on the granularity, sensitivity and level of anonymisation of the data:
- Open data: licence suitable for data that are neither classified as Personal Data nor Personal Information and with no residual risk of disclosure or where consent to share personal data as collected is in place. This is usually macrodata such as socio-economic time series data aggregated to a country over a significant period of time or teaching datasets where minimal socio-demographics have been retained or where perturbative anonymisation techniques have been applied.
- Safeguarded data: licence suitable for data that are neither classified as Personal Data nor Personal Information but where there is a potential residual risk of disclosure. These data are otherwise referred to as effectively anonymised data, as per ICO guidance. These microdata have been treated via recoding, top and/or bottom coding, suppression or perturbative anonymisation methods to ensure the individuals are unlikely to be identifiable. The use of these data is subject to the UK Data Service End User Licence Agreement, and additional conditions or agreements might apply.
- Controlled data: licence suitable for data classified as Personal Information or Personal Data and data that are particularly sensitive, commercially or otherwise. This is de-identified data, i.e. indirectly identifiable data, to which data protection legislation applies. For these types of data a legal gateway for sharing must be identified and, besides being subject to the UK Data Service End User Licence Agreement, access is facilitated via the Five Safes Framework.
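The three access levels above can be read as a rough decision rule. The following is a minimal sketch of that rule, not the Service's actual procedure: in practice the access level is negotiated with the data producer. The class, field and function names are hypothetical, and the order in which the checks are applied (sensitivity first, then consent) is an assumption made for the illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AccessLevel(Enum):
    OPEN = "Open"
    SAFEGUARDED = "Safeguarded"
    CONTROLLED = "Controlled"


@dataclass
class CollectionProfile:
    """Hypothetical summary of a collection; field names are illustrative only."""
    personal_data: bool              # classified as Personal Data or Personal Information
    particularly_sensitive: bool     # sensitive, commercially or otherwise
    residual_disclosure_risk: bool   # potential residual risk of disclosure
    consent_to_share_as_collected: bool = False


def suggest_access_level(profile: CollectionProfile) -> AccessLevel:
    """Return a starting point for negotiation, not a final decision."""
    # Assumption: particular sensitivity takes precedence over all other criteria.
    if profile.particularly_sensitive:
        return AccessLevel.CONTROLLED
    # Consent to share personal data as collected makes an open licence suitable.
    if profile.consent_to_share_as_collected:
        return AccessLevel.OPEN
    # Personal Data or Personal Information without such consent is Controlled.
    if profile.personal_data:
        return AccessLevel.CONTROLLED
    # Not personal, but a potential residual disclosure risk remains.
    if profile.residual_disclosure_risk:
        return AccessLevel.SAFEGUARDED
    # Neither personal nor at residual risk of disclosure.
    return AccessLevel.OPEN
```

For example, effectively anonymised microdata (not personal, but with some residual risk) would come out as Safeguarded, while aggregated macrodata with no residual risk would come out as Open.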
We make available information about anonymisation on our webpages for both quantitative and qualitative data, including key anonymisation techniques to help data owners ensure the privacy of their research participants. As a renowned centre of expertise in curating data we follow current best practice in preparing, managing and documenting data. Further information about our curation process is available here.
Access framework at UK Data Service
The UK Data Service is funded by the Economic and Social Research Council and has over 45,000 registered users. One of our main aims is to ensure the social science research community has access to relevant, up-to-date, ready-to-use research data. Our access framework is defined by the licensing we have in place with the data owners.
We host over 9,000 data collections in our catalogue, ranging from UK survey data to cross-national survey data, longitudinal and cohort studies, Census data and qualitative and mixed methods collections. Open Collections are made available either under the Open Government Licence or Creative Commons variations. Open Licences allow secondary researchers to use, adapt and share the information with very few restrictions, for example non-commercial use only where the Creative Commons Non-Commercial Licence is used.
Safeguarded and Controlled data is subject to the UK Data Service End User Licence Agreement, and additional conditions or agreements might also apply. Researchers registering with the UK Data Service agree to the End User Licence Agreement upon registration. The main aim of the Agreement is to ensure the data is used ethically and to establish the responsibilities researchers have when using these data. To further help researchers understand their responsibilities and to ensure data is used correctly, we also make available the Research data handling and security guide. Most of the data we host is available under Standard Safeguarded access, with some collections made available under Safeguarded Additional Conditions or Special Licence access.
To protect research participants, and for data to be shared under appropriate safeguards as per data protection legislation, Controlled data (de-identified data) is made available via the Five Safes Framework, which consists of the following:
- Safe Data (de-identified)
- Safe Projects (projects approved by the data owner or the Research Accreditation Panel)
- Safe People (accredited, trained researchers)
- Safe Settings (SecureLab, SafePod or SafeRoom)
- Safe Outputs (screened and approved non-disclosive outputs)
The UK Data Service SecureLab was first established in 2011 to enable secure access to the most sensitive and confidential collections. Since then, SecureLab usage has grown considerably and a wide variety of data are made available, including governmental data, cohort data and linked administrative data. While Controlled data has historically been restricted to researchers based in the UK, the UK Data Service, as a member of the International Data Access Network, is committed to ensuring wider access to data. In February 2022, an agreement was negotiated to make selected datasets from the Institute for Social and Economic Research and the Centre for Longitudinal Studies available via a Safe Room at the Research Data Centre of the Federal Employment Agency at the Institute for Employment Research in Germany.
Ensuring a robust licence and access framework
As a highly regulated organisation, we review our contracts and documents at least yearly. Most recently, at the beginning of 2022, our licence and access framework underwent an independent legal review to ensure not only compliance and consistency but also that the framework is future-proof.
The deposit process at UK Data Service is fully online. We have a deposit user guide available that describes the process in detail from i) Offering the collection for deposit to ii) Negotiating access and submitting the collection. Data depositors can sign the Deposit Licence Agreement via their account ensuring that access to the data is suitable.
To ensure accessibility and interoperability, we use a standard Data Documentation Initiative (DDI) metadata schema for our collections, and secondary researchers can easily identify the conditions of access of each study via its catalogue record.
When researchers want to make use of Safeguarded or Controlled data, they have to register a project, briefly describing the aims of their research and adding to their project the studies they would like to use. Depending on whether the study is available under Standard Safeguarded access, Additional Conditions or via the Five Safes Framework, researchers have to complete the access workflow accordingly via their account.
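The project-based workflow described above can be sketched as a small data structure. This is an illustrative sketch only: the route names, the `Project` class and its methods are hypothetical and do not reflect the Service's actual systems, and the requirements listed for each route are paraphrased from this post.

```python
from dataclasses import dataclass, field

EULA = "End User Licence Agreement"

# Requirements per access route, paraphrased from the post.
# The route keys are hypothetical labels chosen for this sketch.
REQUIREMENTS = {
    "standard_safeguarded": [EULA],
    "safeguarded_additional_conditions": [EULA, "additional conditions or agreements"],
    "controlled": [EULA, "Five Safes Framework"],
}


@dataclass
class Project:
    """A registered project: a short statement of aims plus the studies requested."""
    aims: str
    studies: dict[str, str] = field(default_factory=dict)  # study id -> access route

    def add_study(self, study_id: str, route: str) -> None:
        # Reject routes this sketch does not know about.
        if route not in REQUIREMENTS:
            raise ValueError(f"unknown access route: {route}")
        self.studies[study_id] = route

    def requirements(self) -> list[str]:
        """Collect each distinct requirement once, in the order first encountered."""
        collected: list[str] = []
        for route in self.studies.values():
            for requirement in REQUIREMENTS[route]:
                if requirement not in collected:
                    collected.append(requirement)
        return collected
```

A project that combines a Standard Safeguarded study with a Controlled study would then list both the End User Licence Agreement and the Five Safes Framework as requirements to complete.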
Conclusion
In a continuously changing data landscape, where research data policies, legislation and the need to ensure transparency and reproducibility shape data sharing, established licence and access frameworks ensure data are made available and used not only legally but also ethically. The same scrutiny given to data must be given to ethical and legal considerations when sharing data. Making data collections available via a trusted digital repository, and following relevant guidance on data classification and anonymisation, guarantees wide and correct access to data and supports policy-changing research.
About the Author
As the Data Collections Development Manager at the UK Data Service, Cristina ensures that key data are effectively identified, negotiated, and appraised for research and teaching. She leads the research data management portfolio of support and training for the UK Data Service, with a particular focus on facilitating the implementation of the UKRI Research Data Policy for the ESRC. Cristina is committed to further helping to build the culture of FAIR data sharing and to empowering researchers to make informed decisions about how best to manage and share their data.