Orchestration of Dockerized Data Reduction Pipelines from a RESTful Web Service

Data reduction pipelines are traditionally run on a researcher's personal computer on a small amount of data. The pipeline may have complex software dependencies that preclude the researcher from installing it on a faster server. Even if the pipeline runs on the server, deploying it in a multiprocessing environment may be problematic. Here we present a modern solution that allows on-demand reduction of the ever-increasing volume of data stored in telescope archives. We have developed a Python web service that accepts 2dF-AAOmega observations and determines the steps needed to reduce the data. Each step runs 2dfdr commands in a Docker container on a fast server. We utilise docker-py to remotely execute these commands from within Celery tasks, so that a robust, configurable data reduction workflow can be assembled and executed asynchronously by Celery across several processors. Data Central plans to offer the service to users requesting data from the newly revamped AAT archive, giving them effortless access to freshly reduced data. The service is extensible to other pipelines and would form a solid basis for developing IVOA SODA services, while slight modifications could enable quick-turnaround reductions of transient-triggered observations.
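
The workflow described above (determine the reduction steps, then execute one command per step) can be illustrated with a minimal Python sketch. It is not the service's actual code: the names `Frame`, `plan_steps`, `build_command` and `run_plan` are hypothetical, the calibration order in `STEP_ORDER` is an assumption, and the command line built here is a placeholder rather than real 2dfdr syntax. The `execute` callable stands in for whatever runs a command inside the container.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed calibration order (hypothetical): each frame type is reduced
# only after all frame types listed before it.
STEP_ORDER = ["bias", "dark", "flat", "arc", "object"]


@dataclass(frozen=True)
class Frame:
    path: str
    kind: str  # must be one of STEP_ORDER


def plan_steps(frames: List[Frame]) -> List[List[Frame]]:
    """Group frames into ordered stages; frames within a stage are independent."""
    by_kind: Dict[str, List[Frame]] = {}
    for frame in frames:
        if frame.kind not in STEP_ORDER:
            raise ValueError(f"unknown frame kind: {frame.kind!r}")
        by_kind.setdefault(frame.kind, []).append(frame)
    # Keep only the stages that actually have frames, in calibration order.
    return [by_kind[kind] for kind in STEP_ORDER if kind in by_kind]


def build_command(frame: Frame) -> List[str]:
    """Build a placeholder command line (NOT the real 2dfdr syntax)."""
    return ["reduce", f"--kind={frame.kind}", frame.path]


def run_plan(
    stages: List[List[Frame]],
    execute: Callable[[List[str]], int],
) -> List[List[str]]:
    """Run the stages in order and stop at the first non-zero exit code."""
    completed: List[List[str]] = []
    for stage in stages:
        for frame in stage:
            command = build_command(frame)
            if execute(command) != 0:
                raise RuntimeError(f"command failed: {command}")
            completed.append(command)
    return completed
```

In a deployment like the one described, `execute` could wrap docker-py's `Container.exec_run`, which returns an exit code and the command output, and each stage could be submitted as a Celery `group` with the stages linked by a `chain`, so that independent frames are reduced in parallel while the calibration order is preserved.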


Theme – Time-Domain Ecosystem, Data Processing Pipelines and Science-Ready Data, Data Interoperability