FAIR data assessment tool

Emily Thomas, Trainee Data Reviewer at the Data Archiving and Networked Services (DANS), outlines DANS’s development of a data assessment tool based on the FAIR principles.

The FAIR principles

Following the workshop ‘Designing a Data FAIRport’, which took place in the Netherlands in 2014, a diverse group of stakeholders formulated a minimal set of community-agreed guiding principles for scientific data publication and reuse.

These principles are now known as the FAIR principles. They act as a guide for data publishers and stewards rather than as a standard or specification. Put simply, the FAIR principles provide a set of mileposts that help data producers and publishers ensure that all data are:

Findable – defined by a persistent identifier and detailed metadata

Accessible – well-defined license and access conditions

Interoperable – ready to be combined with other data by humans and machines, using standardised formats and vocabularies

Reusable – ready to be reused in future research and processed using computational methods

There appear to be remarkable similarities between the principles of the Data Seal of Approval (DSA), devised by DANS in 2006, and the newly formed FAIR principles. The initial idea behind the DSA was that a set of five principles could be used to ensure a standard of quality for digital repositories. While the DSA focuses on the responsibilities of whole repositories for data quality, the FAIR principles instead emphasise the quality of individual datasets.

Review of the FAIR principles

Although the similarity between the criteria of the DSA and FAIR demonstrates consistent standards of quality, we identified some complications in defining each FAIR principle independently of the others. In other words, the principles as outlined seemed ambiguous to us, and some of them depended on other principles. For example, in the image above the term ‘accessible’ appears under both the Interoperable and the Reusable principles.

This was problematic because our interest in the principles comes from an assessment point of view: our goal is to implement the principles in an assessment tool for rating the quality of datasets. It would be almost impossible to use interdependent principles in this type of implementation, because a dataset could then not be rated on each principle independently. This would lead to a messy outcome and possibly to inaccurate interpretations of the overall dataset review.

We have therefore started working on a new operationalisation of the FAIR principles that allows each principle to be described independently, so that a dataset's rating under one principle is entirely independent of its scores under the others.

Development of a data assessment tool

The aim of this project is to use the FAIR principles as the basis for a data assessment tool named FAIRdat, so that every dataset deposited in, or reused from, any data repository can be assessed on a scale of five stars for each of the principles Findable, Accessible and Interoperable, and for its overall ‘FAIRness’. A couple of badge schemes for open data quality assessment already exist, enabling reviewers to evaluate datasets against certain quality indicators. For example, the Open Data Certificate has four badge levels (bronze, silver, gold and platinum) based on a set of criteria covering the legal, practical, technical and social requirements of a dataset, and Tim Berners-Lee has proposed a 5-star deployment scheme for open data. The development of our tool has been influenced by these schemes, but it uses star badges to represent the quality of a dataset under each FAIR principle.

The FAIRdat tool will allow a dataset to be assessed and will associate a FAIR profile with each dataset, held in the FAIRdat database and/or at the dataset's location (i.e. its repository). We aim to implement the tool initially in the DANS repositories (EASY and DataverseNL) and then to extend it externally, so that any dataset in any location can be reviewed with the independent tool. The assessment tool is aimed at anyone who wants to provide a quality assessment of a dataset, including data producers, re-users and archivists.

We have outlined five criterion levels for each of the principles Findable, Accessible and Interoperable, and each criterion level corresponds to a star level in the FAIR profile. For this assessment we have decided to use the original ‘R’ as the average of the scores for the three preceding principles, resulting in an overall ‘FAIRness’ or ‘Reusability’ score. We found that criteria classified under Reusable in other FAIR implementations could be accommodated under F, A or I without difficulty. Moreover, we feel that scoring highly on the F, A and I principles in turn makes the data more reusable; this is also consistent with the updated FAIR principles by Force11, which state under ‘Reusable’ that “Data objects should be compliant with principles 1-3”. For example, the FAIR profile of a dataset which is well findable, accessible with some restrictions, and has low interoperability may result in moderate reusability, and would be displayed as follows:
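The scoring rule described above, in which ‘R’ is not rated directly but is taken as the average of the F, A and I star scores, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FAIRdat implementation: the class and field names are hypothetical, and so are two details the text does not specify, namely that each principle is scored from 1 to 5 stars and that the average is reported unrounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FairProfile:
    """Hypothetical FAIR profile: one independent star score per principle."""

    findable: int
    accessible: int
    interoperable: int

    def __post_init__(self) -> None:
        # Assumption: five criterion levels per principle, i.e. 1-5 stars.
        for name, stars in (
            ("findable", self.findable),
            ("accessible", self.accessible),
            ("interoperable", self.interoperable),
        ):
            if not 1 <= stars <= 5:
                raise ValueError(f"{name} must be 1-5 stars, got {stars}")

    @property
    def reusability(self) -> float:
        # 'R' is derived rather than scored: the mean of the F, A and I scores.
        # Assumption: the mean is reported unrounded.
        return (self.findable + self.accessible + self.interoperable) / 3


# Hypothetical scores for a dataset that is well findable (4), accessible
# with some restrictions (3) and has low interoperability (2).
profile = FairProfile(findable=4, accessible=3, interoperable=2)
print(profile.reusability)  # 3.0
```

Because R is derived from the other three scores, a profile only needs to store F, A and I, which keeps each principle independently rateable in the way the new operationalisation intends.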

Additionally, we aim to create a FAIRdat website which enables anyone using data anywhere to assess their dataset(s) with the tool. The website will follow a structure similar to that of the DSA website, providing users with information about how to use the tool, the essential documentation for scoring datasets, and information about the community and the repositories that implement the tool. Ideally, the website will feature a catalogue of all datasets that have a FAIRdat profile, which would help to promote use of the FAIRdat assessment tool internationally.

Current implementation progress

A pilot version of the FAIR assessment tool has been developed using SurveyMonkey and is currently being tested internally at DANS. The tool presents a series of questions (usually no more than five per principle), with routing options that lead to the star rating for each principle. At the end of the assessment, the tool displays the star score for each principle and also calculates and displays the overall ‘R’ (FAIRness) score.
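The question routing described above can be pictured as a small decision tree per principle, in the style of survey skip logic: each answer either leads to another question or ends the route with a star rating. The Python sketch below assumes yes/no questions; the question texts, the `Question` type and the `rate` function are hypothetical illustrations, not the questions or routing used in the DANS pilot.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# A route is either the id of the next question (str) or a final star rating (int).
Route = Union[str, int]


@dataclass(frozen=True)
class Question:
    text: str
    yes: Route
    no: Route


# Hypothetical routing for the Findable principle (illustrative questions only).
FINDABLE: Dict[str, Question] = {
    "q1": Question("Does the dataset have a persistent identifier?", yes="q2", no=1),
    "q2": Question("Is there basic descriptive metadata?", yes="q3", no=2),
    "q3": Question("Is the metadata detailed?", yes="q4", no=3),
    "q4": Question("Does the metadata follow a community standard?", yes=5, no=4),
}


def rate(
    routes: Dict[str, Question],
    answer: Callable[[str], bool],
    start: str = "q1",
    max_questions: int = 5,  # mirrors the usual limit of five questions per principle
) -> int:
    """Follow the routing from `start` until a star rating is reached."""
    node: Route = start
    asked = 0
    while isinstance(node, str):
        if asked == max_questions:
            raise RuntimeError("no rating reached within the question limit")
        question = routes[node]
        node = question.yes if answer(question.text) else question.no
        asked += 1
    return node


# Example: a persistent identifier and basic metadata are present,
# but the metadata is not detailed.
answers = {
    "Does the dataset have a persistent identifier?": True,
    "Is there basic descriptive metadata?": True,
}
print(rate(FINDABLE, lambda text: answers.get(text, False)))  # 3
```

Running such a routine once per principle would yield the F, A and I star scores, from which the overall ‘R’ score is averaged as described earlier.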

Some screenshots of the pilot survey tool we have created are displayed below:

We have already started creating a mockup of the FAIRdat website. The mockup includes pages for the homepage, background information, the assessment tool, the community, news and events, registration and login, and a FAIRdat database. Its design is deliberately minimal and is intended to show how the tool could function. We aim to collect as much user feedback as possible and to incorporate it into the development of the end product. If you have any feedback or questions, please get in touch.

Some screenshots of the website mockup can be seen below:

Data Impact blog
