The evolution of the MeerLICHT and BlackGEM data pipelines

Astronomical observatories require innovative solutions to handle the Big Data
streams that their new facilities generate. In general, one or more telescopes
continuously observe large patches of the sky with high resolution and
sensitivity. The most important common scientific drivers are to study
transient and variable astrophysical sources and their light curves on a range
of time scales, and to build catalogs of all observed sources. These enormous
amounts of data are only valuable if they can be processed and stored
automatically, and if all data can be searched quickly and visualised
interactively.

It is becoming more difficult and costly for research institutions to purchase
and configure, in advance, hardware and network infrastructure that meets these
needs. Moreover, the burden of managing dynamic production databases, data
products and backups, software versioning, continuous integration and
deployment, and monitoring, tracing and logging for these systems makes a move
to a more flexible cloud environment a logical step.

The MeerLICHT and BlackGEM optical array telescopes, designed, built and
operated by consortia of universities, are a good example of a project in which
the back-end software and hardware design is an integral part of the
telescopes. In this talk, we will touch on the aspects described above and
focus on the evolution of a classical in-house data pipeline towards a
scalable, distributed cloud data pipeline, in which multiple database,
pipeline, webserver and storage nodes interoperate in real-time and batch
modes. This allows us to process larger datasets and search the growing source
catalogs faster than was possible only a few years ago.


Theme – Data Processing Pipelines and Science-Ready Data