It has been seven weeks since I started working at the UK Data Service/Jisc.
Although it seems like only yesterday that I first entered the office, I have learned so many new things and skills over the past weeks, and I have finally applied what I have been learning at the University for two years.
My time here at Jisc and the UK Data Service has been one of the most valuable experiences in my life and has convinced me that working with data and data analysis is the right path for me once I graduate.
My fellow intern Rabia and I at the start of our internship
For the first few weeks, I was getting familiar with the Carstairs index (more about Carstairs can be read in my previous blog post), and was accessing all the data needed for the calculations. I encountered several problems during this process, which in turn helped me to gain even more experience and new skills.
One of the issues I had was that I needed data across five different Census years and for the whole of the UK, and not all of it was available.
For example, one of the indicators for the Carstairs score is ‘low social class of household reference person’. This information was not available for Scotland in the 2001 Census, and so the definition of the indicator had to be changed to ‘low social class of all persons’ in order to proceed with the research. As a result, we had to redownload all the data for the other years, and ended up with very “messy” folders full of data we did not need anymore.
At that point I realised it was essential for me to organise all my folders and documents, and to give them sensible names so that I knew what each document contained and where to find it. Even though I have always considered myself an organised person, working with such large amounts of data showed me I had to be even more organised to be able to progress to the analysis itself.
Another issue I faced was that different research papers used different Census variables to calculate the Carstairs score.
This was because the papers focused either on one country rather than the whole of the UK, or on a specific Census year. They therefore used whichever variables were available for their particular projects. This, however, meant that Rabia (my fellow intern) and I had to adjust the definitions of the variables so that they fitted our own needs. We were lucky enough to find a few researchers who had used indicators that were available for all the Census years we were analysing and for all the countries in the UK, and so we were able to support our decisions on the definition changes.
I managed to solve all those problems and moved on to calculating the scores and analysing the data. I did this in R, which saved me a lot of time and made the work easy to replicate. This was extremely useful as I had to recalculate my scores multiple times due to the problems outlined above. I have always been keen on working with R as I find it very intuitive and yet challenging, which I enjoy. I feel I have become proficient in R and I am very confident using it in a work environment.
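For readers curious about what the calculation involves, here is a minimal sketch of how a Carstairs-style score can be put together: each indicator is standardised to a z-score across all areas, and the z-scores are then added up for each area. I did my own calculations in R; the sketch below is in Python purely for illustration. The indicator names follow the commonly cited definition of the index (only the social class indicator is named in this post), the numbers are made up, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD, since every area is included
    return [(v - mu) / sigma for v in values]


def carstairs_scores(indicators):
    """Add up the z-scores of every indicator, area by area."""
    standardised = [z_scores(column) for column in indicators.values()]
    return [sum(area) for area in zip(*standardised)]


# Made-up percentages for three hypothetical areas (not real Census data)
indicators = {
    "male_unemployment": [4.0, 8.0, 12.0],
    "overcrowding": [2.0, 5.0, 8.0],
    "no_car": [15.0, 30.0, 45.0],
    "low_social_class": [10.0, 20.0, 30.0],
}

scores = carstairs_scores(indicators)
for area, score in zip(["Area A", "Area B", "Area C"], scores):
    print(f"{area}: {score:.2f}")
```

A higher score means a more deprived area. Because the whole calculation lives in two small functions, changing an indicator definition only means swapping one column and running it again, which is the kind of replicability that saved me so much time.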
I have always enjoyed working with data, but regardless, the analysis gets a bit more exciting when you can finally see a story the data tells.
To do this, I loaded all my results into QGIS, an open source Geographic Information System that enables the creation of maps. All the data I have worked with concerns geographical areas in the UK, so mapping deprivation, as well as providing tables with specific scores, felt like the right (and very visually appealing) choice.
I have learned how to use a lot of functions QGIS offers, including creating 3D maps and analysing geospatial data.
Working with the software has woken up a very creative side of me I never knew I even had. I discovered that data analysis can be very artistic and original, especially when it comes to presenting the findings.
Deprivation in the UK from 2011 to 1991, by local authority
One of the things I found out just by looking at my maps was that deprivation decreased massively between 1991 and 2011 according to the Carstairs index.
The darker the area, the more deprived the local authority is and vice versa.
When looking at the raw numbers, one would have no idea about this without doing some further analysis, and so plotting the numbers on a map is a great way of finding out whether there is something going on before moving on to a more specific analysis.
As I saw that the older maps are much darker than the ones from more recent years, I decided to explore whether this trend is significant.
I created confidence intervals and boxplots, presented below, to support my initial hypothesis about deprivation getting lower. The confidence intervals do not overlap and so I could be confident that deprivation decreased significantly between 1991 and 2011 in the UK.
Boxplots and 95% confidence intervals for deprivation levels in the UK from 2011 to 1991
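As a rough illustration of the check described above, the sketch below computes a 95% confidence interval for the mean score in two years and tests whether the two intervals overlap. It is written in Python rather than the R I actually used, the scores are made up, and it assumes a normal approximation for the interval (with few areas, a t-based interval would be somewhat wider).

```python
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, level=0.95):
    """Confidence interval for the mean, using a normal approximation."""
    centre = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    margin = z * stdev(values) / len(values) ** 0.5
    return centre - margin, centre + margin


def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Made-up deprivation scores for eight hypothetical local authorities
scores_1991 = [2.1, 3.4, 1.8, 2.9, 3.1, 2.5, 2.2, 3.0]
scores_2011 = [-0.4, 0.3, -0.9, 0.1, -0.2, 0.5, -0.6, 0.0]

ci_1991 = mean_confidence_interval(scores_1991)
ci_2011 = mean_confidence_interval(scores_2011)

print("1991:", ci_1991)
print("2011:", ci_2011)
print("Overlap:", intervals_overlap(ci_1991, ci_2011))
```

Intervals that do not overlap point to a significant difference between the two means. The reverse does not hold: overlapping intervals do not by themselves rule out a significant difference, so a formal test would be needed in that case.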
Currently, I am working closely with Matt Ramirez, Futures senior innovation developer here at Jisc, who is turning my 3D maps into a virtual reality environment.
I have been given the opportunity to think about interesting and unique ways of using VR to display my results. I was particularly excited about one of my ideas, which involved a lift that would take users up to different Census years. They could then get out of the lift and move around the map, which would create a great interaction between the data and the users, and thus enhance learning.
Unfortunately, this was not possible to complete in the short time frame, and so we had to simplify the visualisations and leave out the lift. The idea may still be implemented in the future: it requires more time than we have at the moment, but it would be worth trying out.