Over the past couple of years I have been fortunate to ride the wave of the transparency agenda emerging in political science, known as DA-RT (Data Access and Research Transparency). In 2012 the American Political Science Association (APSA) took a collective decision to integrate DA-RT principles into its Ethics Guide, adhere to the 2014 Transparency and Openness Promotion (TOP) Guidelines, and adopt a Journal Editors’ Transparency Statement (JETS). On 16 January 2016 a diverse set of leading journals in the field will release the first of the new DA-RT policies.
While data replication procedures are already fairly well embedded at quantitative research-oriented journals, such as the American Journal of Political Science (AJPS) or Research and Politics, which have their own stringent replication requirements (see example of a replication policy), the position for qualitative research data is still emergent. Through my involvement with this US-based initiative, advising on how to extend data publication in journals to qualitative research, I want to set out some observations about how I think this is likely to be received by the research community in the US, and in the UK too. My views are based on a 25-year stint in the world of research data sharing in the UK, a practice now commonly adopted by publicly funded academics. I also sat on a panel on open data at the recent Political Studies Association (PSA) annual heads conference where we discussed this issue.
The political science wave is fast moving and, in some senses, may be better placed to kick-start the sparse practice of sharing qualitative data in the US and to extend the Open agenda.
But first I want to look at the ‘transparency’ dialogue space. I frequently note a confused and confusing use of terminology: Open Research, Open Access and Open Data. The notion of transparency is inherent in all three of these paradigms. ‘Open Access’ concerns egalitarian access to knowledge outputs such as journal papers, while ‘Open Data’ has a more specific definition: according to the Open Knowledge Foundation, “Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness)”. Some social science research data cannot be completely open due to ethical and legal restrictions. For me, research transparency is really about Open Research.
One does not have to look far to observe an increasing number of transparency organisations in operation: the Research Data Alliance (RDA); the US Center for Open Science (COS) and the Berkeley Initiative for Transparency in the Social Sciences (BITSS); Evidence in Governance and Politics (EGAP); and, more recently, APSA. In the UK we have the Open Data Institute (ODI), and the UK government has a number of transparency boards: the Public Sector Transparency Board, the Open Data User Group and the Research Sector Transparency Board.
Interestingly, the current universal data sharing and data publishing agenda in the research space was not initiated by any specific drivers for validation and replication. It was the 2007 OECD Principles and Guidelines for Access to Research Data from Public Funding, which declared data a public good, that pushed action forward. The OECD example cited is the large-scale international Human Genome Project, in which “openly accessible information is being used successfully by many different users, all over the world, for a great variety of purposes”. This ideology promotes value for money through further and secondary analysis of data, opportunities for collaboration and comparison, richer exposure of methods and, finally, validation.
The shift in semantics from open access to transparency in this research space is important but, in some ways, also troublesome. It is important to counter mistrust in research findings, eroded by the few abuses of trust in reputable peer-reviewed psychology publications in the Netherlands involving unverifiable results and faked data, and to encourage fairness in the publication of all clinical trials, as pursued by the AllTrials campaign championed by Ben Goldacre. Yet I cannot help but see another impending ‘attack’ on qualitative research, for which it is harder to demonstrate transparency in the scientific sense. US scholars are already fearful of what they see as a wave of neoliberal accountability that is damaging the nature of qualitative scholarly practice (see the 2016 International Congress of Qualitative Inquiry).
In the UK, research data sharing is light years ahead of any other country. We are fortunate that RCUK policies recognised data sharing as far back as the 1990s. The ESRC policy, formally established in 1995, has yielded some 1,400 data collections available in the UK Data Archive, including some 900 qualitative or mixed-methods datasets. That is an impressive amount of data and demonstrates how far we have come in fostering the idea of research data transparency in the UK. Some of the data are fully open, while others have access restrictions. I cannot point to any significant mass of such academically generated data available in the US, or indeed anywhere else in the world.
Further, many millions have been invested by Jisc in research data management planning and enabling services. Ten years on, most UK universities have their own data repository, in various states of maturity. Many have good research data management support services helping their research staff prepare data management plans for Research Council bids. Through this engagement, many academic ethics committees also have an improved understanding of the value of data, weighed against its risks and the presumption that protection requires the destruction of primary data. In Europe, the European Research Council and Horizon 2020 projects have data sharing mandates.
This rather well-established data sharing culture, based on carrots and not sticks, is, however, not yet reflected across the pond. Despite NIH and NSF data sharing policies dating back to 1989, little research data is shared beyond the major surveys. The investigators behind the embryonic Syracuse Qualitative Data Repository (QDR), for which I have acted as a consultant since 2012, have helped further the DA-RT mission and ensured that qualitative data gets onto the agenda too; a position I fully support.
Colin Elman and Diana Kapiszewski (of the Syracuse QDR) have provided some solid early thinking, contributing to the American Journal of Political Science (AJPS) Guidelines for Providing Access to Data and Achieving Research Transparency, due in January 2016. However, QDR is still young, and as yet very little data have been archived. Robust archiving procedures are in place, borrowing helpfully from the approach used by the UK Data Service, but QDR is looking to a paywall business model rather than a free-at-the-point-of-access model. As the customer base is not yet established and the US data sharing culture in this domain is still nascent, I foresee a slow trickle of data into the archive (as was the case for the newly formed Qualidata – now QualiBank – back in the early 1990s). In the UK we had to work very hard on advocacy, fully engage in sharing debates and, most importantly, showcase our VIP qualitative data depositors.
For me, the DA-RT guidance for the APSA journals represents a tidal wave of transparency requirements, even though adopting journals can select one of three levels of transparency standard for authors (milder to stronger). Publishers’ requirements are a stick that will push data sharing practices forward, more so than the softer, hard-to-enforce funder data policies. No data, no publication: a simple but effective threat. Of course, many science journals such as Science, Nature and PLOS ONE already have data publication policies to enable validation of conclusions through reproduction of analysis. Data can sit as supplementary materials, in a public repository or in a journal’s own repository. Like other journals, AJPS has its own Dataverse, where supporting quantitative data and analysis syntax are deposited. In my opinion, some of the code I have viewed is exemplary in its rich annotation, and the analysis can easily be re-run.
DA-RT proposes two types of transparency, production and analytical, and I see this division as presenting the biggest problems for qualitative data.
What about replication of qualitative data?
DA-RT guidance on this is so far quite lightweight: it offers some useful principles, yet little practical guidance and few exemplars are ready for the political scientist to view. As a data professional with 25 years in the data sharing industry, my own view is that production transparency is already done very well by many scholars in the UK; the UK Data Service has many exemplar datasets that showcase a range of data collection and processing transparency methods. While there is no single approach, my colleagues and I have published widely on how to demonstrate data generation (Corti et al, Managing and Sharing Research Data: A Guide to Good Practice). Structured metadata, rich description, annotation of classifications in CAQDAS packages, and retrospective narratives or interviews are all useful devices for capturing the nuances of the study design, fieldwork and data preparation processes. Documentation can be provided at different levels of context: at study level – notes about the fieldwork situation for an interview event, and notes about the broader social, cultural or economic context – or at data level – a transcribed interview with annotations. In the political science arena, the Qualitative Election Study of Britain, 2010, by Kristi Winters and colleagues, is a well-documented study which we use in data management training for its clear examples of a consent form and information sheet.
When it comes to analytic transparency, I’m less convinced of its value as a mandatory requirement for evidencing published claims based on qualitative analysis. The activity sits in the ‘evidence’ space between publishing a coherent block of data from a single research project and publishing the extracts that underpin claims made in a single journal paper. One proposed technique for evidencing claims through data is ‘Active Citation’, coined in 2009 by Andrew Moravcsik. It requires that every evidence-based claim, or contested claim, be supported by a citation, an excerpt of that source, an annotation and, possibly, a link to the original source. A transparency appendix is recommended. QDR provides a pilot example and invites users to consider how micro-connections between data and a claim can be established and evaluated by other scholars.
My position is that it may be impossible to link to every source referenced in much qualitative research – especially as we move into the interpretive end of the spectrum. Furthermore, taking and publishing chunks of data out of context may destroy the integrity of, for example, an interview narrative. Does it provide the best evidence? What if the preceding and following paragraphs contradict the claim?
For me a middle ground is preferable and achievable, where we seek to share as much of the original data as possible and provide a rich and convincing narrative about claims. I also like the idea of inviting readers to view data directly, so that an excerpt can be seen fully in its context. The UK QualiBank offers this functionality: a paragraph can be cited and resolves back to the original transcript in the archived data collection (example of a cited paragraph).
DA-RT is being discussed and embraced by many political science journals. The key challenge noted is operationalising the DA-RT statement in the absence of clear guidance. Does a journal always need to run in-house validation of analysis, for example of code? How should journals handle qualitative data? Who gets to peer review the supporting data, and who judges whether findings can be sufficiently replicated? I would argue that the UK Data Service has much to offer here, and I would support the PSA in drawing up some practical guidance, based on real case studies of particular research approaches, highlighting best practice and showcasing good archived datasets.
The PSA meeting also discussed whether the discipline needed better guidance on data sharing; it was agreed that incorporating best practice into postgraduate courses would be beneficial – again, something the UK Data Service can easily help with.
In summary, I feel that extreme enforcement of analytic transparency may damage the publishing of qualitative research, especially if journals decide to take an overly penalising position. The political science community would benefit from taking a positive, supportive stance: encouraging data sharing by showing how it can be done well, and advocating the value of publishing one’s data as an output in its own right. This encouragement will likely mitigate resistance, fear and confusion. I already observe senior academics in the UK who are scared of not meeting ESRC data policy requirements; the UK Data Service offers a supportive discussion to establish what data can realistically be shared, and how.
I would hope that by walking this supportive path, we can help to regain trust in scholarly communication.