The technical challenges of implementing DOIs at the ESA Science Data Center (ESDC)

Scientific results from both observations and simulations must be verifiable to be accepted by the scientific community. Scientists therefore increasingly accompany their papers with the datasets on which the results are based. Indeed, American Geophysical Union journals (e.g. JGR) reject papers if the data are not made available to the referees.

The datasets are made available online via links, which are only useful if they adhere to the FAIR principles: Findable, Accessible, Interoperable and Reusable. The links also need to be permanent, which is often hard to achieve. A very common way of making these datasets available is through Digital Object Identifiers (DOIs), which have been widely adopted by publishers and researchers. DOIs are also useful for tracking data use, which helps researchers secure grants and project extensions.

A DOI is more than just a link between a publication and a dataset. It also comes with a landing page that makes metadata for the dataset available. This metadata both describes the dataset and facilitates discovery via Google Dataset Search. Google Dataset Search came out of beta in late January 2020 and already indexes over 25 million datasets. This fast-growing service may become the de facto standard for dataset searches, and new datasets can be made discoverable by adding a JSON script to their DOI landing pages.
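
As an illustration of the kind of JSON script a landing page can carry, the following minimal Python sketch builds a schema.org Dataset description and wraps it in a script element. It assumes the schema.org Dataset vocabulary read by Google Dataset Search; the function names, the field selection and the example DOI are hypothetical and do not describe the ESDC implementation.

```python
import json


def dataset_jsonld(name, description, doi, publisher):
    """Build a schema.org Dataset description as a JSON-LD string.

    All field values are supplied by the caller; nothing here is taken
    from a real archive.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": f"https://doi.org/{doi}",
        "url": f"https://doi.org/{doi}",
        "publisher": {"@type": "Organization", "name": publisher},
    }
    return json.dumps(record, indent=2)


def script_tag(jsonld):
    """Wrap a JSON-LD string in the script element placed in the page."""
    # Escape "</" so the JSON cannot close the script element early;
    # "\/" is a valid JSON escape for "/".
    safe = jsonld.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + safe + "\n</script>"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(script_tag(dataset_jsonld(
        "Example mission archive",
        "Placeholder description of the dataset.",
        "10.0000/example-dataset",
        "Example Data Center",
    )))
```

Keeping the metadata in a dictionary until the last step means the same record could also feed other outputs, such as the human-readable part of the landing page.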

This presentation focuses on the technical aspects of implementing DOIs at the ESAC Science Data Center (ESDC). The ESDC is in the process of generating DOIs for all space missions, and a technical infrastructure is being created to ease the generation of DOIs for both future missions and publications. Given the large number of datasets involved, full automation of DOI generation, including landing pages, metadata and JSON scripts, is under way at the ESDC.
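
To give a feel for what automated landing-page generation might involve, here is a minimal Python sketch that renders one HTML page per dataset record from a shared template. The record fields (`doi`, `title`, `description`), the page layout and the function name are assumptions made for illustration; the actual ESDC infrastructure is not described here.

```python
import html
from string import Template

# Hypothetical page layout; a real landing page would carry more metadata.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<p>$description</p>
<p>DOI: <a href="https://doi.org/$doi">$doi</a></p>
</body>
</html>
""")


def render_landing_pages(records):
    """Return a dict mapping each DOI to its rendered landing page."""
    pages = {}
    for rec in records:
        # Escape every value so metadata text cannot break the markup.
        pages[rec["doi"]] = PAGE.substitute(
            title=html.escape(rec["title"]),
            description=html.escape(rec["description"]),
            doi=html.escape(rec["doi"]),
        )
    return pages


if __name__ == "__main__":
    # Placeholder records for illustration only.
    demo = [
        {"doi": "10.0000/mission-a", "title": "Mission A archive",
         "description": "Placeholder description."},
        {"doi": "10.0000/mission-b", "title": "Mission B archive",
         "description": "Another placeholder."},
    ]
    for doi, page in render_landing_pages(demo).items():
        print(doi, len(page), "characters")
```

Driving every page from one template and one list of records is what makes the approach scale: adding a mission means adding a record, not writing a page by hand.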

Authors: M. H. Sarmiento, G. de Marchi, B. Merín, D. Baines, S. Besse, B. Martinez, A. Masson and D. Wenzel.

Theme – Data Processing Pipelines and Science-Ready Data, Other