Sarah Knight, @sarah_gis, UK Data Service Data Impact Fellow and PhD student in the Environment Department of the University of York, shares her journey through data science to impact and on to the Understanding Society Young Researcher prize this year.
When Davenport and Patil hit the Harvard Business Review in 2012 with the headline “Data Scientist – the sexiest job of the 21st century” I thought to myself “wow – you never see the words data and sexy in the same sentence”. In fact, from my own experiences in education and beyond, if you were remotely data-y or science-y, this was the exact opposite of what you were!
But as the article continues it neatly describes the intoxicating skills required of the modern day Data Scientist:
- The ability to code.
- The skills to communicate to a wide audience.
- An intense curiosity that enables one to question and hypothesise.
And I thought – wow, I do those things!
So am I a Data Scientist???
I have a degree in Physical Geography, a masters in Ecology, years of work experience as a Geographical Information Scientist and am now a PhD student in Environmental Economics and Health. And I can honestly say that I never ever considered myself to be a Data Scientist. I just really enjoyed maths (I considered a maths degree) and specialised in spatial analysis, visualisation and geographical information systems (GIS) as a means to work in environmental conservation organisations, something that I am passionate about contributing to. I became fascinated by the technology and the data that underpinned this work, and how their use enabled new questions to be asked and new research avenues to be explored, as well as improving and streamlining existing approaches.
Data in research
Now I find myself knee-deep in data as part of my PhD, linking large datasets across time and space to answer new questions and to contribute to an area of research which looks at how the natural environment impacts human well-being. This brings together my previous interests and skills in a totally new area of work for me and I find it fascinating, challenging, overwhelming, satisfying and completely rewarding!
Day to day life at work feels far from sexy. Here’s what a normal day might look like in my PhD:
- Open some files and try to remember what I was trying to achieve yesterday
- Computer decides to shut down to install updates
- Stare at code written yesterday
- Curse my foolishness at terrible code written too late in the day
- Write some new code
- Moan about software/coding language to office colleagues
- Contribute to discussion regarding ArcGIS vs QGIS vs R
- Search the internet for online help from help forums (god bless the online community)
- Feel like a hero and probably make a celebratory cup of tea
- Realise at least one of the following: there are missing values in the data, I have coded “missing” the same as “no response”, I’ve put illegal characters in as variable headings (big GIS no-no!), the loop overwrites existing files rather than creating new ones and I have therefore lost an insurmountable volume of data
- I am a fool
- Spend all the hours in the day correcting code
So as you can see, much work using data is not what would be described as exciting. Much of it is just obtaining, sorting, cleaning, storing, restructuring and saving data. Data can be obtained in all sorts of formats and levels of quality and completeness.
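Several of the mistakes in that daily list are cheap to guard against. The sketch below is only an illustration in Python, not code from any real project: the raw codes (`""`, `"NA"`, `"-9"`), the function names and the 10-character limit (borrowed from the field-name limit of the shapefile format) are all assumptions, and a real dataset will document its own codes and rules.

```python
import re
from pathlib import Path

# Hypothetical codes for illustration; check the dataset's own documentation.
MISSING = None                 # the value was never recorded
NO_RESPONSE = "no_response"    # the respondent chose not to answer


def decode(raw):
    """Keep 'missing' and 'no response' as two distinct values."""
    if raw in ("", "NA"):      # assumed markers for a missing value
        return MISSING
    if raw == "-9":            # assumed refusal code
        return NO_RESPONSE
    return float(raw)


def safe_field_name(name, max_len=10):
    """Turn a heading into something strict GIS formats will accept.

    Anything other than ASCII letters, digits and underscores becomes an
    underscore, the name is forced to start with a letter, and the result
    is truncated to max_len characters.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "f_" + cleaned
    return cleaned[:max_len]


def unique_path(folder, stem, suffix=".csv"):
    """Return a path that does not exist yet, so a loop never overwrites."""
    folder = Path(folder)
    candidate = folder / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

Calling `unique_path` inside a loop before every write means a second run produces `out_1.csv` next to `out.csv` instead of silently replacing it. Note that truncating names can create duplicates, so it is still worth checking the cleaned headings by eye.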
But then the fun begins…
In my own work I use the Understanding Society dataset, and link it with external datasets, such as Defra air quality records. Being able to match large datasets that have not previously been linked is exciting. Harnessing the geographical context of datasets like Understanding Society also enables us to understand the data in a totally different way and certainly opens up many avenues for human-environment interaction research. My recent working paper using these datasets received some media interest, in The Guardian and The Times, and I got the opportunity to discuss the work with Living On Earth in the US. It even won me the Understanding Society Young Researcher prize this year!
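To give a flavour of what linking means in practice, here is a minimal Python sketch that joins survey records to air quality records on a shared area code and year. It is not the method used in the working paper, and every field name and value in it is made up: the real Understanding Society and Defra files have their own identifiers and structure.

```python
# All records and field names below are invented for illustration.
survey = [
    {"person_id": 1, "area": "A01", "year": 2015, "life_sat": 5},
    {"person_id": 2, "area": "A02", "year": 2015, "life_sat": 6},
    {"person_id": 3, "area": "A03", "year": 2015, "life_sat": 4},
]

air_quality = [
    {"area": "A01", "year": 2015, "no2": 21.4},
    {"area": "A02", "year": 2015, "no2": 35.0},
]


def link(survey_rows, air_rows):
    """Attach an NO2 value to each survey row that shares an area and year.

    Returns the linked rows and, separately, the survey rows with no match,
    so that unmatched records are reported rather than silently dropped.
    """
    lookup = {(r["area"], r["year"]): r["no2"] for r in air_rows}
    linked, unmatched = [], []
    for row in survey_rows:
        key = (row["area"], row["year"])
        if key in lookup:
            linked.append({**row, "no2": lookup[key]})
        else:
            unmatched.append(row)
    return linked, unmatched


linked, unmatched = link(survey, air_quality)
print(len(linked), "linked;", len(unmatched), "without an air quality match")
```

Keeping the unmatched rows in their own list is a deliberate choice: how many records fail to link, and why, is often the first thing worth checking.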
What has inspired me?
I’m a visual person. I like to see what things look like, which is probably why I’ve found myself working with maps! This is why I particularly enjoy the work of the late Hans Rosling and the work of Danny Dorling and colleagues at the Worldmapper project, who distort maps so that the size of a place reflects something other than its land area.
I like to communicate my work, learn from others and teach skills in data analysis and visualisation. I run a GIS group in our department and also help teach GIS to undergraduates. I am fascinated by students who say they hate GIS and that they “just don’t get maths”. Very often they are female students, and I wonder what kind of experiences they’ve had to give them these views. You don’t have to like it, but I believe everyone can ‘get’ it, with the right teaching, experience and mindset. That’s something I’m keen to achieve in any future teaching roles. Data and computer skills are attractive to employers in any career, and certainly, from my own experience, in the environmental and research sectors.
So what has data allowed us to explore in the environment and human well-being research area? Here are a few examples:
- Can clean air make you happy? – using approx. 200,000 records to examine how nitrogen dioxide impacts well-being (working paper 17/08 – blog coming soon!)
- Do scenic spots benefit our health? – using over 1.5 million ratings of over 212,000 photos in the online game Scenic-Or-Not to ascertain what people find “beautiful” and how this relates to well-being
- What accounts for England’s green and pleasant land? – combining data relating to over 10,000 individuals and land cover data derived from remote sensing to explore the impact of different habitat types on well-being
- What’s nature worth? Count the selfies – taking geotagged photos on Flickr to estimate the economic value of outdoor recreation on public lands
So. Data Scientist. Sexy? Of course. #DataImpactFellows.