Parallel Radio Astronomy Application Development with Dask and Numba
2020-11-12, 12:30–13:00, Times in UTC

Dask is a lightweight Python parallelisation and distribution framework that seamlessly integrates with the PyData ecosystem to provide a rich framework for developing Parallel and Distributed Radio Astronomy Applications.

In this demo, we will demonstrate how to create a multi-core radio astronomy application using a combination of Dask and Numba.

Time permitting, we aim to cover the following topics:

Dask

  • General compute graph concepts.
  • The Dask Array abstraction and its relation to NumPy arrays
  • Dask Array chunk size effects on application memory and performance.

dask-ms

  • Exposing CASA Tables as xarray Datasets and Table Columns as Dask Arrays
  • Choosing appropriate chunking strategies
  • Updating and writing CASA Tables from Dask Arrays

Numba

  • JIT-compiling Python code to speeds similar to the equivalent C/Fortran code.
  • The relation between performance, input size and dropping the Python Global Interpreter Lock.
  • Wrapping accelerated Numba code in Dask for multi-core performance.

Dask Distributed

  • Scaling a Dask application up to a High Performance Computing cluster.
  • Management of the cluster with dask_jobqueue.
  • Annotation and manual scheduling of specific tasks for optimal placement and performance.

Theme – Data Processing Pipelines and Science-Ready Data