Nadia Kennar, Research Associate from the UK Data Service, explains the benefits of creating rich spatial data, and how GIS analysis can improve understanding of location-based problems and feed into better policy making.
My role as a Research Associate at the UK Data Service is to deliver and develop training events in computational social science research methods. One method that’s of huge interest to me is the use of GIS and spatial data. During my postgraduate degree I was introduced to the basic concepts of GIS and spatial data, which I’ll cover a little later on, and realised the potential this type of analysis has for improving our understanding of real-world phenomena and acting as a decision-making tool for governance.
In my Master’s dissertation, I undertook a spatio-temporal analysis of missing persons across the county of Cheshire through rigorous spatial models and thematic cartography. Using Calls-For-Service data from Cheshire police (accessed via the N8 Policing Research Partnership), I was able to examine whether there is a relationship between the rate of missing incidents (per 1,000 residents) and different environmental factors in crime, such as deprivation and mental health issues. Given the lack of quantitative analysis in the study of missing persons, my paper introduced a new perspective that aimed to highlight: 1) how the number of missing reports has changed over time, and 2) how missing person incident rates differ among particular locations.
Although I can’t discuss my findings in much detail for confidentiality reasons (I’m still working on this project), my results have confirmed the importance of spatial models – such as clustering, spatial regression and neighbour analysis – in identifying geographic patterns of missing incidents involving vulnerable people. Ultimately, my paper argues that spatial analysis is key to improving understanding of the geographic patterns of disappearance, which in turn can feed into policies to improve the safety and welfare of vulnerable people.
Geospatial analysis can be a bit of a minefield for those starting out in the field, given the vast datasets you typically need to handle and the range of different software available. So, this blog post aims to introduce some of the main theoretical concepts of GIS, the types of data needed for spatial analysis, and the various programs that can be used to carry it out.
What exactly is GIS?
A GIS, or geographical information system, is a theoretical framework that allows for the creation and analysis of spatial and geographical data. It can be viewed as an abstract platform that integrates data onto a map through various methods.
GIS are known to produce two broad types of map: reference maps and thematic maps. Reference maps highlight natural or man-made features, such as the positioning and heights of mountains or the layout of bus routes, whereas thematic maps highlight spatial relationships.
Some examples include:
- Police hot spot analysis – analysing where crimes are most prevalent in order to reduce crime or introduce/amend policy (thematic).
- Accident analysis – visualising road networks to improve road safety measures (thematic).
- Navigation – web maps such as Google Maps or apps such as Citymapper (reference).
- Farming – analysing the location of soil data in order to increase food production (thematic).
Creating rich spatial data for analysis
Spatial data, or geospatial data, is data that contains information about a specific location, which can be analysed to better understand that location. GIS enables this spatial data to be processed and analysed.
There are typically two types of spatial data:
- Vector data is the most common form and consists of points, lines and polygons.
  - A point is a single pair of coordinates (e.g. the location of a missing person call).
  - A line connects at least 2 points (e.g. the street where that missing person call was made).
  - A polygon is a closed shape formed from 3 or more points (e.g. the area, city or ward that the street belongs to).
- Raster data normally refers to imagery or satellite data formed from a grid of pixels.
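To make these data types concrete, here is a minimal sketch in plain Python (used purely for illustration; in practice you would use dedicated spatial packages such as those in R covered later). All coordinates, variable names and helper functions are invented for the example and do not come from my dissertation data.

```python
import math

# All coordinates below are invented (easting, northing) pairs.
call_location = (3.0, 4.0)                                # point: one coordinate pair
street = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]             # line: two or more points
ward = [(0.0, 0.0), (8.0, 0.0), (8.0, 6.0), (0.0, 6.0)]   # polygon: ring of 3+ points
pixel_grid = [[0, 1, 1],                                  # raster: grid of pixel values
              [0, 2, 3],
              [0, 0, 1]]


def line_length(line):
    """Total length of a line, summed segment by segment."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))


def polygon_area(ring):
    """Area of a simple polygon using the shoelace formula."""
    closed = ring + ring[:1]  # repeat the first vertex to close the ring
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(closed, closed[1:]))
    return abs(twice_area) / 2


def point_in_polygon(point, ring):
    """Ray-casting test: does the point fall inside the polygon?"""
    x, y = point
    inside = False
    closed = ring + ring[:1]
    for (x1, y1), (x2, y2) in zip(closed, closed[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


street_length = line_length(street)
ward_area = polygon_area(ward)
call_in_ward = point_in_polygon(call_location, ward)
pixel_value = pixel_grid[1][2]  # raster lookup by row, then column
```

The point-in-polygon check hints at why the three vector types fit together: a call (point) can be assigned to the ward (polygon) that contains it.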
Once you have established your type of data then attribute data can be added; this would depend on your area of interest, but it can consist of spatial data (which refers to the size, shape and location of the feature) and non-spatial data (which refers to other attributes linked to the feature).
Taking an example from my dissertation, I produced a map displaying different areas across the county of Cheshire (i.e. spatial data), but each of these areas had additional attributes, for instance the urban/rural makeup (i.e. non-spatial data). More specifically, my spatial data focused on the polygon level, as the original dataset I worked with contained information on the number of missing incidents across different LSOAs (Lower-layer Super Output Areas, which are essentially small-scale geographic areas). I then matched the different LSOAs to different environmental correlates, including the levels of deprivation and mental health issues in each area. By joining the dataset to these attribute files, I was able to create a new dataset specifically tailored to help answer my research questions and aims!
Now that the spatial data has had attribute data added to it, we refer to it as a spatial object. Spatial objects are commonly stored in a data format known as a shapefile, which can hold points, lines or polygons. Shapefiles are a digital vector storage format containing both geometric and attribute information. They can represent geographies at different scales, from large-scale country boundaries to small-scale LSOA boundaries.
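The join described above can be sketched as follows. This is a simplified illustration in plain Python rather than the code I actually used; the LSOA codes, field names and values are all made up, and the `join_attributes` helper is hypothetical.

```python
# Incident counts per area, keyed by a (made-up) LSOA code.
incidents = [
    {"lsoa_code": "A001", "missing_incidents": 12},
    {"lsoa_code": "A002", "missing_incidents": 3},
    {"lsoa_code": "A003", "missing_incidents": 7},
]

# Attribute file with environmental correlates (invented values).
# Note that A003 is deliberately absent to show an unmatched area.
area_attributes = {
    "A001": {"deprivation_score": 31.5, "urban_rural": "urban"},
    "A002": {"deprivation_score": 12.0, "urban_rural": "rural"},
}


def join_attributes(records, lookup, key="lsoa_code"):
    """Left join: keep every record, adding attributes where the key matches."""
    joined = []
    for record in records:
        extra = lookup.get(record[key], {})  # no match -> no extra fields
        joined.append({**record, **extra})
    return joined


enriched = join_attributes(incidents, area_attributes)
unmatched = [row["lsoa_code"] for row in enriched
             if "deprivation_score" not in row]
```

Checking for unmatched areas after a join like this is worthwhile, since a mistyped or outdated area code will otherwise silently drop attributes from the final dataset.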
Enhancing spatial data with Census data
In order to narrow the dataset to just the county of Cheshire, I used the Boundary Data Selector interface (BDS). The BDS, a fantastic resource provided by the UK Data Service, is a piece of open software providing digitised boundary datasets representing the underlying geography of the Census. Users can choose which boundaries they want (e.g. census, electoral, postal), the areas they want (within the UK), and in which format (e.g. shapefiles, CSV). These datasets can then be combined with other datasets to create new and richer data.
In my case, I selected LSOAs in Cheshire and then joined these to the dataset described above, so now I had data on missing person incidents and levels of deprivation and mental health issues for LSOAs specifically within Cheshire. As a result, I had a usable spatial object, known in R as an ‘sf object’, which would work with R’s spatial packages (more on R and other GIS software later on!).
I was then able to enhance my dataset further by using data from the 2011 Census, which is available to researchers and students via the UK Data Service’s InFuse platform. In particular, I used the Census geographies data, which provide small-area population estimates. For the purposes of my study, I selected residential population statistics at the LSOA level, which allowed me to calculate the rate of missing incidents per 1,000 residents (number of incidents / residential population × 1,000). Rates are often preferred over counts as they allow events to be compared across geographical areas with different population sizes. I was then able to map and study the rate of missing incidents across each LSOA in Cheshire, providing a more accurate picture of how incidents are distributed.
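As a small worked illustration of why rates matter, the sketch below applies the formula to two areas with the same number of incidents but different populations. It is written in plain Python for illustration; the area codes and figures are invented and the function name is my own.

```python
def rate_per_1000(count, population):
    """Incidents per 1,000 residents: count / population * 1000."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 1000


# Invented figures: identical counts, different residential populations.
areas = {
    "A001": {"missing_incidents": 12, "population": 1500},
    "A002": {"missing_incidents": 12, "population": 3000},
}

rates = {
    code: rate_per_1000(area["missing_incidents"], area["population"])
    for code, area in areas.items()
}
```

Both areas recorded 12 incidents, yet the first works out at roughly 8 per 1,000 residents and the second at roughly 4, a difference that a map of raw counts would hide.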
This is just a taster of the different types of spatial data that you can work with, and how they can be made richer through data linkage. GIS offers many more tools for developing your spatial analysis, such as coordinate reference systems (CRS), which are useful for pinpointing locations on a map. I won’t go into CRS here, but if you’re interested in getting started with GIS it’s well worth exploring further.
What were the main benefits of using GIS?
GIS is an invaluable tool for mapping and visualising data, which helps to improve understanding of location-based problems and encourage better decision making. In the case of my dissertation, I was able to use GIS to create hotspot maps to identify highly clustered areas of missing person incidents in Cheshire. By combining multiple types of spatial data, I was able to show how deprivation, mental health, and the urban/rural makeup of different areas affect the prevalence of missing incidents. All of this, I argue, can feed into policy to improve certain areas and help safeguard vulnerable people.
This sort of use and benefit of GIS can be seen in action across society. For instance, GIS is commonly used in the policing sector to identify whether high or low values of different crimes cluster spatially in specific areas, which is helpful for understanding location-based crime rates and making decisions about where to target resources. In fact, GIS has multiple benefits for the police: it acts as a guide for field officers, assists in the creation of crime prevention tools, helps with policy implementation, enables the testing of crime theories and increases communication across local and global agencies. The theoretical backbone of criminology is focussed on reducing criminal activity, whilst also protecting vulnerable citizens, and GIS and spatial analysis are invaluable tools for helping do so.
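One common way of testing whether high or low values cluster spatially is a global measure of spatial autocorrelation such as Moran’s I. The sketch below is a bare-bones version in plain Python, included only to show the idea: it is not necessarily the measure used in any particular policing study, the four areas and their values are invented, and the neighbour list assumes simple binary weights (1 for neighbours, 0 otherwise).

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary weights.

    values: list of area values (e.g. incident rates).
    neighbours: dict mapping each area index to the indices of its neighbours.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    # Sum of all weights: each neighbour link counts once per direction.
    total_weight = sum(len(js) for js in neighbours.values())
    # Cross-products of deviations for every neighbouring pair.
    cross = sum(deviations[i] * deviations[j]
                for i, js in neighbours.items() for j in js)
    sum_squares = sum(d * d for d in deviations)
    return (n / total_weight) * (cross / sum_squares)


# Four invented areas in a row: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit side by side
alternating = morans_i([1, 5, 1, 5], chain)  # high and low values alternate
```

A positive result indicates that similar values tend to sit next to each other (clustering), while a negative result indicates that neighbours tend to differ. In practice a significance test would accompany the statistic, which is the kind of thing dedicated packages such as spdep in R handle for you.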
What software is available?
Given the prevalence of spatial data in our day-to-day life, there are multiple platforms that allow for detailed spatial analysis to be conducted, including QGIS, ArcGIS, FME and R. Personally, I prefer R as there is a wide range of spatial packages such as spdep, ggplot2, ggspatial, sf and ggmap, which are incredibly adaptable for working with spatial data.
Additionally, R is open-source software with a vast array of free introductory courses and books that can guide anyone in creating visually appealing maps. To find out more about these packages, please refer to the resource list below.
Conclusion
Knowing how to join spatial datasets is a key skill in the world of GIS, as there’s a wide range of data sources available – for example, shapefiles, boundary data and census statistics – and being able to join them means you can create richer datasets tailored to your research needs. As I’ve shown by walking you through some of the key elements of my dissertation, using GIS to analyse these sorts of rich spatial data means that you can create high quality maps and analyse trends and patterns to answer important research questions, which in turn can help to tackle real-world challenges.
Resources
- H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
- Bivand, Roger S. and Wong, David W. S. (2018). Comparing implementations of global and local indicators of spatial association. TEST, 27(3), 716-748.
- Dewey Dunnington (2021). ggspatial: Spatial Data Framework for ggplot2. R package version 1.1.5.
- Pebesma, E., 2018. Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10 (1), 439-446.
- D. Kahle and H. Wickham. ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), 144-161.
About the author
Nadia Kennar is a Research Associate in the Computational Social Science team at the UK Data Service, based at the Cathie Marsh Institute at the University of Manchester.