As part of our celebrations for the tenth anniversary of the UK Data Service SecureLab, Richard Welpton (Head of Data Services Infrastructure at the ESRC) looks back at the birth of secure remote access in the UK and discusses the key challenges, successes, and benefits experienced along the way.
Train tickets? Check. Keys, phone, papers? Check. Alarm clock set for 5am? Check. The routine of getting ready to travel the night before may seem like a distant memory to us in recent times. Whilst data may seem ubiquitous to us now, it wasn’t that long ago that routines like this were commonplace for researchers who wanted to access data as part of their research programmes. Despite the thousands of anonymised datasets available to download from services such as the UK Data Service, it used to be the case that more detailed and therefore more sensitive data sources could only be accessed and analysed by visiting an onsite facility (i.e. in safe rooms).
Anonymisation is a key tool for enabling data about individuals to be analysed by researchers while preserving confidentiality. While this blog post is not about anonymisation per se, it is useful to note that there are times when data about individuals and business organisations cannot be usefully anonymised. Some research projects simply demand the level of detail in the data that anonymisation would destroy (even after direct identifiers such as names and addresses are stripped away).
And so because the detail exists in the data, there remains what is known as a ‘residual risk’ that if the data fell into the wrong hands, a data subject (individual or organisation) could be re-identified, and data gathered in confidence associated with them. If this were to happen, there would be a breach of the confidentiality contract with the individual or organisation supplying data in the first place (and possibly a breach of data protection laws). Strict controls are therefore put in place around how the data are accessed (frameworks such as the Five Safes are typically adopted). These controls form part of a data access environment, which has gone by various names, including Research Data Centres, Secure Enclaves, Secure Data Environments and, most recently, Trusted Research Environments (TREs).
The early days of accessing secure data
In the late 2000s, the Office for National Statistics (ONS) established its TRE, the Virtual Microdata Laboratory (VML, now known as the Secure Research Service (SRS)) to provide access to these sensitive data. This improved access to data was of course very welcome, and, at a time when organisations were a bit more wary about why researchers needed access to detailed data, the VML was a huge step forward.
Yet for many researchers, the VML meant setting the alarm clock early. While it facilitated access to data that would not at the time have been possible to access otherwise, researchers could only access the data by physically visiting an ONS office: Newport (Wales), London, Titchfield (Hampshire), and for a while, Southport. Access points were later set up in Glasgow, courtesy of the Scottish Government, and Northern Ireland. Visiting these onsite facilities was time-consuming and costly. Half or entire days had to be scheduled in advance.
We know that research is an iterative process: understanding the data, developing and testing code, debugging, generating results, thinking about the results, refining and debugging code again, and so on. It’s all part of the job. But researchers also teach, have other departmental responsibilities, and are usually engaged in a wide range of research. They also have family commitments. A day trip to the VML therefore had to be fitted in around this complex research lifecycle and all these additional responsibilities.
A new challenge: secure remote access
That the security of data and the protection of confidentiality are paramount goes without question. So what was needed to help reduce the burden on researchers was a secure solution that would enable researchers to access data without making an arduous trip. The solution came via funding from the Economic and Social Research Council (ESRC) to the UK Data Archive in 2009 to set up the Secure Data Service (SDS), now known as SecureLab after its integration into the UK Data Service in 2012.
UKDS SecureLab, or SDS as it was then, was designed to give qualified researchers secure remote access to these data from their research institution (usually, but not limited to, a university). A similar way of accessing these types of data had already been established by the NORC data facility at the University of Chicago. But up until this point nothing similar existed on this side of the pond.
With the launch of UKDS SecureLab (né SDS), then, secure remote access to data in the UK was born.
During August 2011, I made a phone call to Dr Catherine Robinson, then based at Swansea University. I talked Catherine through the process of connecting to SecureLab. It took a couple of attempts, but thankfully the connection worked and Catherine could access all her previous VML data and work, which had been loaded into her new SecureLab account in advance. Previously, she would have had to drive from Swansea to Newport to access these data and undertake her research, but now she could access everything within a few clicks online. I recall this was the first secure remote UKDS SecureLab connection we made live.
The benefits of secure remote access
For the research community, UKDS SecureLab made a huge difference:
- Increased research productivity because of easier access;
- Research results generated more quickly, and statistical outputs published faster;
- Researchers no longer using research grants to pay for travel to access data.
These are but a few of the benefits realised. ESRC’s investment paid off.
Some 1500 researchers have been supported since the launch of the service in 2011 (there are currently about 950 researchers actively using the facility).
UKDS SecureLab provided the blueprint for a number of similar TREs to be established in academia, the third sector, and government. Other TREs have been established by HMRC (the HMRC Datalab), Swansea University (SAIL Databank), Cancer Research UK and The Health Foundation. The ONS Secure Research Service now also provides remote access, and supports the Administrative Data Research UK network. A list of institutions managing these facilities can be found here.
Navigating the pandemic and looking to the future
During the pandemic, service providers worked hard to meet the challenges of enabling continued access to secure data facilities, negotiating new agreements with data providers so that researchers could access the data from home (e.g. through a temporary relaxation of requirements for secure data access). You can read more about the running of UKDS SecureLab during the pandemic here. This was no easy task and it has been brilliant to see the data community working together to find ways to enable continued access in such turbulent times.
Looking to the future, a new generation of TREs will continue to deliver increased capability and improve research productivity. The UKRI Data and Analytics Research Environments UK (DARE UK) initiative will consider how these facilities could work. Further ways of providing secure access and extending the reach and accessibility of these kinds of data are being realised, such as the new SafePod network.
Since UKDS SecureLab (né SDS) became operational in 2011, we have come a long way with providing secure remote access to data in the UK. Happy 10-year anniversary UKDS SecureLab!
Keep an eye out for more SecureLab at 10 information and stories in the new year – the UK Data Service will be marking the anniversary with the launch of a January newsletter to provide more guidance and training specifically for SecureLab researchers, and further Data Impact blog posts exploring developments in SecureLab and the wider world of data access!
I began my career working in the Virtual Microdata Laboratory at the ONS as a Support Officer. In 2010, I moved to the UK Data Archive to help set up the Secure Data Service (now UKDS SecureLab), so I’ve been involved in the development of secure remote data access in the UK from the very start.
In this blog, I’ve therefore tried to illustrate how data access evolves. It’s not always easy for data owners: they often receive too little credit for enabling access but often take most of the risk should anything go wrong. And generally, as a rule of thumb, some access is better than no access (although for some researchers, some access will continue to mean no access). As technology, legal frameworks and trust have developed, so more detailed data have become available for researchers to use, and more easily. The journey will continue.
Richard has held previous roles supporting researchers to safely access confidential data at The Health Foundation, Cancer Research UK, the Valuation Office Agency, UK Data Archive (University of Essex) and the Office for National Statistics. He has also undertaken freelance work, including supporting television documentaries using data.