Esmeralda Bon, one of our #DataImpactFellows, reflects on how challenging – and essential – good data management is.
Many of us work with heaps of data, belonging to different projects. Managing and analysing all this data requires a system. Ideally, in the world of good data management, our data files are fully annotated, complete and easy to find. They carry labels that are both intuitive and appropriately specific, and all files and folders are properly backed up. They are stored alphabetically or according to themes or dates, in multiple places, potentially including a cloud service, an external hard drive and even a password-protected USB. They also come with a range of appendices with instructions and explanations for the scrutinising reviewer and the invested reader.
However, creating a data management system that works takes willpower, practice and time. When I started my undergrad a decade ago, learning (or even reading!) about data management didn’t seem particularly interesting. It is only in recent years, since I started teaching research design and managing large data sets, that I have come to realise that yes, good data management matters. A lot.
Truth be told, I’m not a star at managing data. Often enough I still find myself virtually lost in folders, especially at the end of a rushed day of collecting data, converting files and running analyses. A maze of folders with the most obscure of names, unordered and stored in different directories. What is the difference between ‘results re-analysis 4’ and ‘results analysis version 2’? I wouldn’t be able to say.
This represents bad data management: a system that does not work. An alternative to good and bad data management is ugly data management. This is a system that does work, but which nobody would be proud of. A system of quick fixes and shortcuts, of lengthy labels and unusually ordered data files. It sits on the road to good data management, and it is where I still occasionally find myself at this stage of my research career.
Image: “Bad data management”, personal image / CC BY
This is why I admire the concerted effort of faculty and networks of researchers (such as Project TIER) to teach students good data management: to fix both the ugly and the bad. With the wisdom of hindsight, I have only recently come to fully appreciate how good data management provides transparency. It allows us to understand the systems of others, to reproduce the analyses presented in published work and potentially to resolve contradictory findings. At the same time, good data management helps us understand and navigate our own systems. It saves time and (potentially) a lot of headaches.
Therefore, if there is one thing I have learned that I would like to share with my peers, it is this: if you haven’t done so recently, it is worth going back to organise your folders, to update those file names (for instance by adding a date or version number to the file), and to maintain a logbook. It might even be worth creating a spreadsheet to plan the re-organisation of your folders, sub-folders, and files.
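For those who like to script such chores, here is a minimal Python sketch of what that planning spreadsheet and logbook could look like. It is only an illustration, not a recommended standard: the function names, the date-prefix naming convention and the column headers are all my own assumptions. It proposes new names and writes them to a CSV file that opens in any spreadsheet program; it deliberately renames nothing.

```python
import csv
from datetime import date, datetime
from pathlib import Path


def propose_name(path: Path) -> str:
    """Suggest a name such as '2021-03-05_results_analysis.csv'.

    Assumed convention (hypothetical): last-modified date as a prefix,
    lower-case stem, spaces replaced by underscores.
    """
    modified = datetime.fromtimestamp(path.stat().st_mtime).date()
    stem = path.stem.lower().replace(" ", "_")
    return f"{modified.isoformat()}_{stem}{path.suffix}"


def plan_reorganisation(root: Path, plan_file: Path) -> list:
    """Write a CSV plan of current paths and proposed names; rename nothing."""
    rows = []
    for path in sorted(root.rglob("*")):
        # Skip folders and the plan file itself if it lives under root.
        if path.is_file() and path != plan_file:
            rows.append((str(path.relative_to(root)), propose_name(path)))
    with plan_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["current_path", "proposed_name"])
        writer.writerows(rows)
    return rows


def log_entry(logbook: Path, message: str) -> None:
    """Append a dated line to a plain-text logbook."""
    with logbook.open("a", encoding="utf-8") as handle:
        handle.write(f"{date.today().isoformat()}  {message}\n")
```

Calling `plan_reorganisation(Path("my_project"), Path("plan.csv"))` on a folder of your own (the paths here are placeholders) gives you a plan to review by hand before changing anything, and `log_entry` records what you eventually decided to do.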
Make sure that your folder and data environment is easy to understand and navigate. After all, in the end, good data management is for the sake of both replicability and your sanity.
Esmeralda Bon is a Research Associate at the University of Manchester, based in the Cathie Marsh Institute.
She works for the project ‘Digital Campaigning and Electoral Democracy’ (DiCED), a comparative project for the study of the drivers and effects of digital campaigning in 5 countries and 7 national elections during the period 2020-2023. She recently obtained her PhD from the University of Nottingham, School of Politics and International Relations. Her PhD thesis focused on UK MP communication during the EU referendum, addressing the relationship between representation and the dynamics, frequency, and content of their political communication.