Sergio is a Research Engineer in the SIH Data Science Team. He obtained a PhD in Mechanical Engineering from RMIT University in Melbourne and has worked in renewable energy for the past six years, first at the CSIRO Energy Centre in Newcastle and then at SwitchDin. His skills range from thermal modelling and simulation to big data processing and pipelines in production environments.
His areas of expertise are:
- Thermodynamic modelling, control, and dynamical simulation of thermal systems
- Data modelling and analytics
- Data visualization
- IoT data pipelines
- Data engineering and databases
- Python (coding), Linux (OS), SQL (database), RabbitMQ (data streaming), Ansible (DevOps)
A scalable transient detection pipeline for the Australian SKA Pathfinder VAST survey
The ASKAP Survey for Variables and Slow Transients (VAST) studies astrophysical transient and variable phenomena at radio wavelengths, such as flare stars and supernovae, using the new Australian Square Kilometre Array Pathfinder (ASKAP) telescope. ASKAP's large field of view means that large areas of the radio sky can be surveyed regularly at sub-millijansky sensitivities. This was not possible with previous radio telescopes, and ASKAP is now providing an unprecedented view of the dynamic radio sky: the first shallow all-sky survey completed by ASKAP alone yielded 2.5 million source measurements. This creates a data challenge for VAST. With regular epochs of the sky, the pipeline must construct lightcurves for millions of astrophysical sources, swiftly identify the small fraction of sources that exhibit transient behaviour, and provide a visualisation solution for such a large and rich dataset.
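To illustrate what picking transients out of millions of lightcurves can look like, the sketch below computes two widely used variability statistics, the modulation index V and a reduced chi-squared eta. These are generic metrics chosen for illustration; the statistics, thresholds, and column names used by the actual VAST pipeline may differ.

```python
import numpy as np

def variability_metrics(flux, flux_err):
    """Two common variability statistics for a radio lightcurve.

    V:   modulation index, the sample standard deviation of the flux
         divided by its mean.
    eta: reduced chi-squared of the flux against its weighted mean,
         measuring the significance of variability given the errors.

    Illustrative only -- not necessarily the metrics the pipeline uses.
    """
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    v = flux.std(ddof=1) / flux.mean()
    w = 1.0 / flux_err**2
    weighted_mean = (w * flux).sum() / w.sum()
    eta = (w * (flux - weighted_mean) ** 2).sum() / (len(flux) - 1)
    return v, eta

# A steady source gives small V and eta; a flaring source gives large values.
steady = variability_metrics([1.0, 1.1, 0.9, 1.0], [0.1] * 4)
flare = variability_metrics([1.0, 1.0, 5.0, 1.0], [0.1] * 4)
```

Sources whose metrics exceed chosen thresholds would then be flagged for inspection, which keeps the candidate list small even when the input catalogue holds millions of measurements.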
We have developed a modern and scalable code base written in Python that builds upon previous software efforts in the community, which were not scalable to ASKAP datasets because of their ageing technology stacks. The modern stack allows us to perform fast mass source association using Pandas dataframes and well-known Astropy crossmatch functions. The transient detection code and the web interface have been unified under one code base, allowing users to run pipeline jobs from either the command line or the web interface itself.
A PostgreSQL database stores the data model and the relationships between entities, and serves data to the web interface. We used Dask, a tool for scalable analytics in Python, to give the pipeline both vertical and horizontal scalability. The technology stack is: Python 3.6+, Postgres 10+, Astropy 4+, Django 3+, Dask 2+ and Bootstrap 4. The adoption of such a modern technology stack should give the pipeline a long service life.
In this talk we will give an overview of the pipeline architecture and implementation, discuss some initial results, and outline future challenges.