rta-dq-lib: a library to perform online data quality analysis of scientific data

The Cherenkov Telescope Array (CTA) is an initiative that is currently building the largest ground-based gamma-ray observatory ever constructed. CTA will provide unprecedented sensitivity and angular resolution for the detection of gamma rays with energies above a few tens of GeV. A Science Alert Generation (SAG) system, part of the Array Control and Data Acquisition (ACADA) system, analyses the telescope data online, as it arrives at an event rate of tens of kHz, to detect transient gamma-ray events. The SAG also performs an online data quality analysis to assess the health of the instruments during data acquisition: this analysis is crucial for confirming that a detection is reliable. A software library for the online data quality analysis of CTA data, called rta-dq-lib, has been proposed for CTA in two versions, one in Python and one in C++. The Python version is dedicated to the rapid prototyping of data quality use cases. The C++ version is optimized for maximum performance. Through XML configuration files, the library allows the user to define the format of the input data and, for each data field, which quality checks must be performed and which types of aggregations and transformations must be applied. It internally translates the XML configuration into a directed acyclic computational graph that encodes the dependencies between the computational tasks to be performed. This model allows the library to easily take advantage of thread-level parallelization, and its overall flexibility allows us to develop generic data quality pipelines that could also be reused in other applications.
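
The configuration-to-graph model described above can be illustrated with a minimal Python sketch. It is not the rta-dq-lib API: the XML layout, the element and attribute names (`dq`, `field`, `task`, `op`, `input`, `low`, `high`), the `OPS` table and the functions `build_graph` and `run` are all assumptions made for illustration. The sketch uses only the Python standard library (`graphlib` requires Python 3.9 or later) and runs every task whose dependencies are satisfied on a thread pool.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical configuration: the real rta-dq-lib XML schema is not given in
# the abstract, so this layout is an assumption.
CONFIG = """
<dq>
  <field name="charge">
    <task id="mean_charge" op="mean" input="charge"/>
    <task id="max_charge" op="max" input="charge"/>
    <task id="charge_ok" op="in_range" input="mean_charge" low="0" high="100"/>
  </field>
</dq>
"""

# Illustrative operations: two aggregations and one quality check.
OPS = {
    "mean": lambda x, attrs: sum(x) / len(x),
    "max": lambda x, attrs: max(x),
    "in_range": lambda x, attrs: float(attrs["low"]) <= x <= float(attrs["high"]),
}


def build_graph(xml_text):
    """Return task attributes and, for each task, the set of tasks it depends on."""
    root = ET.fromstring(xml_text)
    tasks = {t.get("id"): dict(t.attrib) for t in root.iter("task")}
    # A task depends on another task only if its input is that task's output;
    # inputs that name a raw data field have no predecessor in the graph.
    deps = {
        tid: ({attrs["input"]} if attrs["input"] in tasks else set())
        for tid, attrs in tasks.items()
    }
    return tasks, deps


def run(xml_text, data, workers=4):
    """Execute the task graph, running independent tasks concurrently."""
    tasks, deps = build_graph(xml_text)
    results = dict(data)  # raw data fields act as the sources of the graph
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while sorter.is_active():
            # All tasks returned here have their dependencies already computed.
            futures = {
                tid: pool.submit(
                    OPS[tasks[tid]["op"]], results[tasks[tid]["input"]], tasks[tid]
                )
                for tid in sorter.get_ready()
            }
            for tid, future in futures.items():
                results[tid] = future.result()
                sorter.done(tid)
    return {tid: results[tid] for tid in tasks}


out = run(CONFIG, {"charge": [10.0, 20.0, 30.0]})
print(out)
```

In this sketch the two aggregations on `charge` are independent and are submitted to the pool together, while the range check waits for `mean_charge`. This is the kind of dependency information a directed acyclic graph makes explicit.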

Theme – Data Processing Pipelines and Science-Ready Data