Vicente is a System Engineer and Data Science Principal at ESA’s European Space Astronomy Centre (ESAC) in Madrid, Spain, where he is responsible for the development and operations of Science Ground Segment Systems.
Vicente leads the System Engineering activities for the definition, implementation and operations of ESA Datalabs, which aims to consolidate a reference platform for the scientific analysis of multi-mission information.
Previously at ESA, Vicente was in charge of the development and operations of Europe’s Space Situational Awareness (SSA) Precursor Services, as well as of Ground Segment Systems at ESA’s European Space Operations Centre in Darmstadt, Germany, for missions such as Integral, Rosetta, Mars Express, XMM-2000 and CryoSat, among others.
Before joining ESA, Vicente was chief technologist for large-scale software systems in the government and automotive sectors.
Vicente holds an MS in Computer Engineering and is pursuing PhD studies in the area of Intelligent Agents and Global Navigation Satellite Systems.
ESA Datalabs: an e-Science Platform for Data Exploitation and Preservation at ESAC
Since e-science (a data-intensive, discovery-driven approach to science) emerged as a tangible practice, astronomy has arguably been its most successful example. It is a natural fit for a data-intensive approach because most astronomical data is public and free of privacy concerns or commercial value. More recently, we have entered what could be called the golden age of surveys, with several large-scale projects spanning decades, whether finished, ongoing or planned. ESA is responsible for, or is a major partner in, several of these initiatives.
This change is profound, and data has become the major technological challenge. Dataset sizes have grown by several orders of magnitude, which means that transferring the data to a scientist is often unfeasible. But size is only one aspect of a data-intensive domain. Layers of ingestion, curation and analysis happen in parallel and across many communities. Preservation is vital and remains, in general, a largely unsolved problem, on both the technology side and the public policy side. Finally, curation and analysis also create new challenges that intersect with the push for open science.
We present the current status of the development of the ESA Datalabs platform. This system allows users to bring their code to ESA’s infrastructure and have direct access to ESA’s archives. Datalabs are full computational environments, and our catalogue of Datalabs ranges from new tools that have become de facto standards for analysis to complex legacy systems repackaged to run in a web browser. The underlying architecture of ESA Datalabs is domain-agnostic; it fosters research and innovation by integrating transversal access to big data, containerised applications, notebook technologies and domain-specific software. For example, customised JupyterLab environments are readily available for astronomers, for scientists in Earth Observation related fields, and for researchers in global navigation. Moreover, ESA Datalabs' support for development environments such as Octave, and for reference tools such as TOPCAT in astronomy, enables the reuse of existing code baselines.
We will discuss the challenges faced in developing a multi-domain exploitation platform capable of fulfilling user requirements that range from executing simple notebooks to running machine learning algorithms and science pipelines. Finally, we will present the functionality already available to users as well as the plan for the platform's future evolution.