Dr Paul Williamson of the School of Environmental Sciences, University of Liverpool, discusses the InFuseR Innovation Fund project, which will provide a script-driven interface to all UK Data Service aggregate census data holdings.
The UK Data Service hosts electronic versions of all of the tables published from the 1971 to 2011 UK Censuses. These aggregate (tabulated) census data comprise one of the Service’s most heavily used data resources. Currently, access to these tables is provided via two web-based graphical user interfaces: InFuse for the 2001 and 2011 Censuses, and Casweb for the older Censuses.
A web-based graphical user interface (GUI) provides an excellent data discovery and extraction tool for novice to intermediate users. However, it is less well suited to meeting the more demanding needs of advanced users.
For this reason, the Service is collaborating with staff from the Centre for Spatial Demographics Research, at the University of Liverpool, to create InFuseR. InFuseR will provide a script-driven interface to all UKDS aggregate census data holdings held in InFuse format via a public version of the InFuse API.
Scripted access permits:
- the creation of complex data queries spanning multiple tables, cells, geographies and censuses
- iterative refinement of data access queries as a research project develops
- enhanced sharing of research methods via sharing of data extraction code
- research reproducibility
- enhanced data discovery via more flexible filtering using the full metadata available in InFuse
Since October, significant progress has been made on two fronts.
First, the UK Data Service Census Support team have been working to open up the API. It was originally a private API designed to support the InFuse application, so the main body of work has been repurposing it for public use, which has meant adding new API calls and enhancing existing ones to better serve a broader audience. A significant piece of work has been enabling the API to serve more than one data collection at a time; as a result, the API now allows access to the 2001 and 2011 Census data held in our repository.
Second, a beta version of InFuseR has been developed to exploit this public API. The beta version replicates in full the functionality of the InFuse GUI, allowing users to (i) identify and select a desired tabulation of variables; (ii) drill down to select the geographic areas for which they require census data; and (iii) download 2011 Census data for their selected combination of variable tabulations and census areas.
As well as allowing users to script their queries, InFuseR also offers the following additional functionality not available via the InFuse GUI:
- search for and select areas by ONS area code(s)
- search for and select areas by name (e.g. “Liverpool”) without the need to ‘drill-down’
- search for variables by description (e.g. list variables relating to ‘illness’ or ‘health’)
- sort the order in which variable categories are presented for easier sub-setting (e.g. all single years of age before all five-year age groups)
- obtain extracted data sorted by area name OR by ONS area code
- obtain extracted data and associated metadata in full flat-file format, for easy import to other software packages
- aggregate extracted data to user-defined custom geographies
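To make a few of the operations listed above concrete, here is a minimal sketch in Python. It does not use InFuseR or the InFuse API, whose interfaces are not described in this post. The function names, area codes, area names and counts are all invented for illustration, and the extract is assumed to be in flat-file form with one row per area and variable category.

```python
from collections import defaultdict

# Invented flat-file style extract: one row per area and variable category.
# Area codes, names and counts are placeholders, not real census values.
ROWS = [
    {"area_code": "X01", "area_name": "Liverpool North",
     "variable": "Long-term illness: yes", "count": 120},
    {"area_code": "X01", "area_name": "Liverpool North",
     "variable": "Long-term illness: no", "count": 880},
    {"area_code": "X02", "area_name": "Liverpool South",
     "variable": "Long-term illness: yes", "count": 90},
    {"area_code": "X02", "area_name": "Liverpool South",
     "variable": "Long-term illness: no", "count": 610},
    {"area_code": "X03", "area_name": "Sefton",
     "variable": "General health: good", "count": 700},
]


def find_areas_by_name(rows, text):
    """Return sorted, unique (code, name) pairs whose name contains `text`."""
    needle = text.lower()
    return sorted({(r["area_code"], r["area_name"])
                   for r in rows if needle in r["area_name"].lower()})


def find_variables(rows, keywords):
    """Return sorted, unique variable descriptions matching any keyword."""
    wanted = [k.lower() for k in keywords]
    return sorted({r["variable"] for r in rows
                   if any(k in r["variable"].lower() for k in wanted)})


def aggregate_to_custom_geography(rows, lookup):
    """Sum counts into user-defined zones; areas missing from `lookup` are skipped."""
    totals = defaultdict(int)
    for r in rows:
        zone = lookup.get(r["area_code"])
        if zone is not None:
            totals[(zone, r["variable"])] += r["count"]
    return dict(totals)


if __name__ == "__main__":
    # Select areas by name without drilling down through a hierarchy.
    print(find_areas_by_name(ROWS, "liverpool"))
    # List variables whose description mentions 'illness' or 'health'.
    print(find_variables(ROWS, ["illness", "health"]))
    # Hypothetical custom geography merging two areas into one zone.
    print(aggregate_to_custom_geography(ROWS, {"X01": "City", "X02": "City"}))
```

Because the whole selection is expressed as code, the same script can be rerun, refined or shared with colleagues, which is the reproducibility benefit that scripted access is intended to deliver.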
The project team are currently recruiting beta testers. Anyone interested should contact the project lead, Dr Paul Williamson. Anyone interested in finding out more about the InFuse public API should contact Rob Dymond-Green, UK Data Service technical lead on this project.
Next steps include updating InFuseR and the InFuse API to provide scripted access to 2001 Census data, and further upgrading the functionality of both in the light of user feedback. Please find out more about all our Economic and Social Research Council (ESRC) Innovation Fund projects here.