International and census open data use at the UK Data Service and the fruits of data citation


Susan Noble, Service Manager for international data at the UK Data Service based at Jisc, talks about how we find out what people have actually done with the data we provide and why it’s important that we do what we do!

Between January and September 2016 we recorded (via Google Analytics) 37,777 data access sessions and 45,642 downloads by users from 156 different countries using UKDS.Stat, the data dissemination platform for international data at the UK Data Service. Sessions are counted as unique pageviews, i.e. dataset pageviews generated by the same user during the same session are aggregated into one. Downloads are counted as export pageviews, i.e. the number of times a user clicks on export to download a dataset:

int data1
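
To make the difference between the two counts concrete, here is a minimal Python sketch. It assumes a simplified hit log of (session, page, kind) records; the record layout and the sample hits are invented for illustration and are not the actual Google Analytics export format.

```python
def summarise_hits(hits):
    """Return (unique_pageviews, export_pageviews) for a simplified hit log.

    hits: iterable of (session_id, page, kind) tuples, where kind is
    "view" or "export". This layout is a hypothetical simplification.
    """
    hits = list(hits)  # allow generators, since we iterate twice
    # Repeated views of the same page within one session count once.
    unique_views = {(session, page) for session, page, kind in hits if kind == "view"}
    # Every click on export counts, even within the same session.
    exports = sum(1 for _, _, kind in hits if kind == "export")
    return len(unique_views), exports


# Invented sample hits, for illustration only.
sample_hits = [
    ("s1", "dataset-a", "view"),
    ("s1", "dataset-a", "view"),    # same session and page: not counted again
    ("s1", "dataset-a", "export"),
    ("s2", "dataset-a", "view"),
    ("s2", "dataset-b", "view"),
    ("s2", "dataset-b", "export"),
    ("s2", "dataset-b", "export"),  # a second export click is counted
]

unique_pageviews, export_pageviews = summarise_hits(sample_hits)
print(unique_pageviews, export_pageviews)
```

This is why the number of downloads can exceed the number of sessions: a user can export several times in a session, but repeated views of one dataset are only counted once.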

Given this amount of international data usage, we were really keen to find out what people have actually done with the data we provide and why it’s important that we do what we do! In the past, finding out this impact information has been a very labour-intensive challenge, as there was no standard way of citing the data obtained via the UK Data Service. However, since the emergence of a standard format for citing data from the UK Data Service, things are definitely improving.

We began making the international data more ‘citable’ in 2010, when we added citation information (including DOIs) to each international dataset, so that it could be downloaded alongside the data. We’re still in the early stages of seeing the ‘fruits of data citation’, as there was an inevitable lag before academics began to use this new citation information in their research outputs. However, we decided now was the time to take a look at some of the research that has cited our socio-economic aggregate data and attempt to visualise that data. Here we have a word cloud generated from the terms used in the titles of research publications that cite socio-economic aggregate data (census and international data), using http://www.wordclouds.com/:

int data2

Or (to keep with the ‘tree/fruits’ theme)… a word cloud generated from the terms used in the titles of research publications that cite socio-economic aggregate data (census and international data), using http://www.wordclouds.com/:

int data3
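
The word clouds above are driven by how often each term appears in the publication titles. As a rough illustration of that counting step, here is a minimal Python sketch. The stopword list and the sample titles are assumptions made up for this example (they are not real harvested publications), and this is not necessarily how wordclouds.com processes text internally.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list; a real one would be longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "from", "using", "by"}


def title_term_counts(titles, stopwords=STOPWORDS):
    """Count how often each term appears across a list of publication titles."""
    counts = Counter()
    for title in titles:
        # Lower-case the title and keep alphabetic words (hyphenated words stay whole).
        for word in re.findall(r"[a-z]+(?:-[a-z]+)*", title.lower()):
            if word not in stopwords and len(word) > 2:
                counts[word] += 1
    return counts


# Invented titles, for illustration only.
sample_titles = [
    "Trade openness and economic growth",
    "Economic growth and energy use",
    "Census data and migration",
]

counts = title_term_counts(sample_titles)
for term, n in counts.most_common(5):
    print(term, n)
```

The resulting term counts can then be supplied to a word cloud tool, which scales each term by its frequency.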

Harvesting the publications citing socio-economic aggregate data

In order to find as many of the publications as possible that have used socio-economic aggregate data from the UK Data Service, we scoured Google Scholar using the following search terms:

Ideally, we wanted to use a single DOI as the search term, but we needed to include the actual data platform URLs to ensure we captured the majority of publications that had used data from the Service, because not everybody cites the data they’ve used.

A search using Google Scholar for all publications citing international data obtained via the UK Data Service returns 109 results (as at 20/10/2016) and shows a diverse range of research, but only of research that correctly cites the data. The following list of research publications shows the interesting range of subjects addressed using international data, and each of these publications has been cited more than ten times:

In order to visualise this information, the next step was to extract it from Google Scholar, which, it turned out, is not very straightforward! We used an open tool called “Publish or Perish”, which queries Google Scholar and allows users to bulk download citations from Google Scholar into Excel fairly easily. We searched using the “General citations” feature in Publish or Perish. A screenshot of a search for InFuse is displayed below:

int data4

Google Scholar does not always return all relevant fields (for example, we wanted article URL and year), and so some editing was needed. In addition, we inserted a number of “dataset referenced” fields to capture which datasets were used in which publications.

Once ready, this information was exported into an Excel workbook and then used to create the simple alluvial diagram data visualisation shown below using the software Raw (http://raw.densitydesign.org – an open web app that allows non-technical users to create custom vector-based visualizations on top of the D3.js library through a very simple interface). 

citations int1

Publications on Google Scholar citing World Bank World Development Indicators via UK Data Service (and the number of subsequent citations), generated using extracted UKDS.Stat citations and http://raw.densitydesign.org/.
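
The preparation step described above (one row per publication, plus “dataset referenced” fields) can be sketched in Python as follows. This is a minimal illustration, not the workflow we actually used, which was done by hand in Excel. The column names (`Cites`, `Title`, `Year`, `DatasetReferenced`), the semicolon separator and the sample rows are all assumptions invented for this example.

```python
import csv
import io

# Invented sample export; the titles are placeholders, not real publications.
SAMPLE_EXPORT = """Cites,Title,Year,DatasetReferenced
12,Example paper A,2013,World Development Indicators
3,Example paper B,2015,World Development Indicators; InFuse
,Example paper C,2016,
"""


def alluvial_rows(csv_text):
    """Turn a citation export into (dataset, year, title, cites) rows.

    A publication that references several datasets yields one row per
    dataset; publications with no dataset recorded are skipped.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        # Assumed convention: multiple datasets separated by semicolons.
        datasets = [d.strip() for d in record["DatasetReferenced"].split(";") if d.strip()]
        cites = int(record["Cites"] or 0)  # treat a missing count as zero
        for dataset in datasets:
            rows.append((dataset, record["Year"], record["Title"], cites))
    return rows


rows = alluvial_rows(SAMPLE_EXPORT)
# Print tab-separated rows that could be pasted into a visualisation tool.
for row in rows:
    print("\t".join(str(value) for value in row))
```

Each output row links a dataset to a publication and its citation count, which is the shape of table an alluvial diagram needs: one column per stage of the flow and a numeric column for the width.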

By harvesting citations periodically in this way we will continue to find out more about what is being done with the data we provide, but we’re always keen to hear directly from our users. If you’re using any socio-economic aggregate data (international or census) in your research or teaching and would like to submit a case study of your data use, we’d love to hear from you: simply download and complete the form and we’ll take it from there!

Other resources:

We have compiled a range of case studies demonstrating data use and its impact, which you can search at https://impact.ukdataservice.ac.uk/case-studies:

Case studies

 
