2020-11-12, 12:30–13:00, Times in UTC
Dask is a lightweight Python parallelisation and distribution framework that seamlessly integrates with the PyData ecosystem to provide a rich framework for developing Parallel and Distributed Radio Astronomy Applications.
In this demo, we will demonstrate how to create a multi-core radio astronomy application using a combination of Dask and Numba.
Time permitting, we aim to cover the following topics:
Dask
- General compute graph concepts.
- The Dask Array abstraction and its relation to NumPy arrays
- Dask Array chunk size effects on application memory and performance.
dask-ms
- Exposing CASA Tables as xarray Datasets and Table Columns as Dask Arrays
- Choosing appropriate chunking strategies
- Updating and writing CASA Tables from Dask Arrays
Numba
- JIT-compiling Python code to speeds similar to the equivalent C/Fortran code.
- The relation between performance, input size and dropping the Python Global Interpreter Lock.
- Wrapping accelerated Numba code in Dask for multi-core performance.
Dask Distributed
- Scaling a Dask application up to a High Performance Computing cluster.
- Management of the cluster with dask_jobqueue.
- Annotation and manual scheduling of specific tasks for optimal placement and performance.