Bart Scheers is an astronomer whose expertise lies in the overlapping fields of computational astronomy and database technology. Scheers obtained his PhD in astronomy from the University of Amsterdam in 2011. As a researcher he worked for more than a decade on the many aspects of big data management for astronomical applications at the University of Amsterdam and CWI, the Dutch national research institute for mathematics and informatics.
Scheers is the co-founder of Dataspex, a young company started in mid-2018 as a spin-off from the Database Architectures Group of the Dutch national research institute CWI. Dataspex delivers advanced solutions for big data management and analysis for scientific applications, especially in the field of astronomy.
The evolution of the MeerLICHT and BlackGEM data pipelines
Astronomical observatories call for innovative solutions for the Big Data
streams that their new facilities generate. In general, one or more telescopes
observe large patches of the sky continuously with high resolution and
sensitivity. The most important common scientific drivers are to study
transient and variable astrophysical sources on several time scales and their
light curves, and to build catalogs of all the observed sources. The enormous
amounts of data are only valuable if they can be processed and stored
automatically and if all data can be searched quickly and visualised.
It is becoming more difficult and costly for research institutions to purchase
and configure hardware and network infrastructures that meet these needs in
advance. Moreover, managing live production databases, data products
and backups, software versioning, continuous development and integration,
monitoring, tracing and logging for these systems makes a move to a more
flexible cloud environment logical.
The MeerLICHT and BlackGEM optical array telescopes, designed, built and
operated by consortia of universities, are a good example of instruments where the
back-end software and hardware designs are an integral part of the telescopes. In this
talk, we will touch on the aspects described above and focus on the evolution of a
classical in-house data pipeline towards a scalable distributed cloud data
pipeline, where multiple database, pipeline, webserver and storage nodes
interoperate in real-time and batch mode. This allows us to process larger
datasets and search the growing source catalogs faster than was possible only a
few years ago.