Cameron Brick on using the diverse range of openly available psychological data, including from the UK Data Service.
I have been frustrated for years that psychologists are not using large, existing, free datasets to their full potential. The problem is that we don't know what exists, what it contains, or how to use it without a huge investment of time. In both teaching and research, social scientists mostly collect new datasets that are often unsatisfactory in measure quality, sample size or population.
Most researchers are dimly aware that larger-scale, high-quality open datasets are available for secondary analysis. I firmly believe that these datasets can be useful both in research and in teaching undergraduates and postgraduates key analytical skills, using high-quality data relevant to their interests and courses.
A key problem is that the data are hosted on a range of complex platforms, and a lot of time is needed just to find out which datasets exist and what themes they cover. As this was still a barrier for me after years as a professor, it is likely also a barrier for other social scientists.
My idea was to collate a list of openly available psychological datasets. I did this initially for my own use and then realised the value of sharing it publicly. As I had no funding to build a platform, I decided to use a service that would be trivial for others to access and edit: a Google Docs spreadsheet.
I simply listed datasets and what they contained rather than providing the data itself. I then organised a hackathon session at the Society for the Improvement of Psychological Science (SIPS) 2019 conference in Rotterdam to give others a chance to search for usable datasets and add them to the list. Laura Botzet, Cory Costello, Anatolia Batruch, Ruben Arslan, Melissa Kline, Nicolas Sommet, Tobias Dienlin and Hannah Metzler joined me in a successful effort to expand the list and add metadata on themes, keywords, study populations, etc. The hackathon group and other contributors have already identified a range of useful datasets that I had never heard of.
In the spirit of transparency, the list is openly editable by anyone who discovers additional datasets or who can help fill in the blank cells. This transparency has also led to open discussion (largely on Twitter) about whether some datasets belong on the list, and in one case to a debate that ended in the agreed removal of an OkCupid dataset because the participants had not provided informed consent.
As the list developed, we also added a separate sheet listing other collections of datasets. Imposing some structure was necessary to avoid having thousands of small datasets in the main view, most of which would have no metadata and would not be relevant to most users.
Advantages and disadvantages
Our ad hoc list is a simple, easy-to-use platform with straightforward searching and matching.
On the downside, the list is not a comprehensive list of psychological datasets, and the metadata are incomplete. The flat structure of a spreadsheet also implies that all rows are of similar value, but the entries vary hugely in quality and sample size. A big disadvantage of our list (and of most datasets) is that it is not machine-readable. See the exciting Psych-DS project on that topic.
Recognition and impact
The list has its own DOI, and in personal communications researchers have given positive feedback and said they intend to cite the resource. I don't have a record of page views or usage, but any time I open the document there are multiple other users present, which suggests it has been widely used over the past year.
Recently, this project was awarded a 2020 Commendation by the Society for the Improvement of Psychological Science for contributions to both teaching and research.
In an ideal world, a research assistant or student intern could devote project time to improving the metadata, and even produce evidence gap maps and guides for specific classroom assignments. Can you help, or do you have students who could donate a few hours?
Anyone is welcome to contribute, and they need neither permission nor a log-in to begin. Our list, 'Making free, open psychological datasets more accessible for research and teaching', can be edited directly on Google Docs. Happy data hunting.
His core interest is how individuals react to collective problems such as climate change. He builds models predicting social and political behavior from cognitions, individual differences, and social context. He is also interested in communication effectiveness, both for supporting informed decisions (i.e., communicating harms and benefits) and for behavior change (persuasion).