Crystalball, a Dask and Numba accelerated DFT Model Predict

Producing an astronomical image from the data generated by a radio interferometer involves processing the observed visibilities, which are obtained by correlating the complex voltages measured by antenna pairs. As part of this process, it is often necessary to build a model of the sky area observed with the interferometer, and to turn that sky model into a set of model visibilities. The latter step, commonly referred to as prediction of the model visibilities, is computationally expensive. The model visibilities are generated by applying the Radio Interferometry Measurement Equation to the sky model, which is typically represented either as an intensity image or as a list of discrete model components. In the former case, faster but less accurate convolutional degridding is applied, while in the latter a slower but more accurate Direct Fourier Transform (DFT) is applied.
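
As an illustration of the DFT prediction described above, the following is a minimal sketch for unpolarised point sources, written with the Python standard library only. The function name predict_vis, the argument layout and the sign convention of the phase are assumptions made for this example; they are not taken from Crystalball or codex-africanus.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def predict_vis(uvw, freqs, sources):
    """Direct Fourier Transform of point sources into model visibilities.

    uvw     : list of (u, v, w) baseline coordinates in metres
    freqs   : list of channel frequencies in Hz
    sources : list of (l, m, flux) direction cosines and flux densities
    Returns a nested list vis[row][channel] of complex numbers.
    """
    vis = [[0j for _ in freqs] for _ in uvw]
    for r, (u, v, w) in enumerate(uvw):
        for l, m, flux in sources:
            n = math.sqrt(1.0 - l * l - m * m)
            # Geometric path difference in metres (assumed sign convention).
            delay = u * l + v * m + w * (n - 1.0)
            for f, nu in enumerate(freqs):
                phase = -2.0 * math.pi * delay * nu / C
                vis[r][f] += flux * cmath.exp(1j * phase)
    return vis
```

The triple loop over rows, sources and channels makes the cost explicit: every visibility requires one complex exponential per model component, which is why the DFT is accurate but expensive.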

The accuracy of the DFT method is required in a number of science cases. An important one is that of spectral-line interferometry. In this case, a continuum sky model must be subtracted from the observed visibilities before making spectral-line images. In practice, frequency-averaged data are used to produce both a continuum image and a component sky model where each component has its own spectral shape. The DFT method takes each component’s spectral shape into account when predicting the model visibilities, allowing one to properly subtract the continuum and thus obtain high-quality spectral-line data.
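
To make the role of the per-component spectral shape concrete, the sketch below evaluates a component's flux density in each channel. A simple power law is assumed here purely for illustration; the spectral parameterisation actually used by the sky model is not specified in this abstract, and the names component_flux, ref_flux, ref_freq and alpha are hypothetical.

```python
def component_flux(ref_flux, ref_freq, alpha, freqs):
    """Per-channel flux density of one component.

    Assumes a power law S(nu) = ref_flux * (nu / ref_freq) ** alpha,
    which is an illustrative choice, not the pipeline's confirmed model.
    """
    return [ref_flux * (nu / ref_freq) ** alpha for nu in freqs]


# Example: a 1 Jy component at 1.0 GHz with a spectral index of -0.7.
channel_freqs = [0.9e9, 1.0e9, 1.1e9]
fluxes = component_flux(1.0, 1.0e9, -0.7, channel_freqs)
```

In a DFT predict, each channel's flux from such a function would multiply that channel's phase term, so that the subtracted continuum follows each component's spectrum rather than a single frequency-averaged value.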

Here we present Crystalball, an implementation of the DFT model prediction currently used as part of the CARACal radio interferometry pipeline. In this pipeline, the continuum image and sky model are obtained with WSClean. Crystalball performs a DFT predict of a component model, using the Python dask and numba packages to accelerate execution. Dask is a parallel processing framework based on computational graphs, while numba JIT-compiles a subset of Python code to machine code with performance comparable to C or Fortran. Dask-ms exposes Measurement Set columns as dask arrays, while the numba code, implemented in the codex-africanus package, is applied to individual chunks of these arrays.
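
The chunk-wise structure described above can be sketched as follows. This is a standard-library stand-in, not Crystalball's code: a thread pool takes the place of the dask scheduler, a plain Python function takes the place of the numba kernel, and the names predict_chunk, predict_chunked, row_chunk and workers are hypothetical.

```python
import cmath
import math
from concurrent.futures import ThreadPoolExecutor

C = 299_792_458.0  # speed of light in m/s


def predict_chunk(uvw_chunk, freqs, sources):
    """Kernel applied to one chunk of rows (stand-in for a compiled kernel)."""
    out = []
    for u, v, w in uvw_chunk:
        row = [0j] * len(freqs)
        for l, m, flux in sources:
            n = math.sqrt(1.0 - l * l - m * m)
            delay = u * l + v * m + w * (n - 1.0)  # assumed sign convention
            for f, nu in enumerate(freqs):
                row[f] += flux * cmath.exp(-2j * math.pi * delay * nu / C)
        out.append(row)
    return out


def predict_chunked(uvw, freqs, sources, row_chunk=2, workers=2):
    """Split the rows into chunks, predict each chunk independently, rejoin."""
    chunks = [uvw[i:i + row_chunk] for i in range(0, len(uvw), row_chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so rows come back in their original order.
        results = pool.map(lambda c: predict_chunk(c, freqs, sources), chunks)
        return [row for chunk in results for row in chunk]
```

Because each row's visibilities depend only on that row's coordinates and the sky model, the chunks are independent, and the chunk size bounds how much data is held in memory at once. Pure Python threads give no real speed-up here; the sketch shows only the decomposition.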

Through the use of these technologies, the expensive DFT has been made tractable on contemporary datasets produced by the MeerKAT telescope. For example, a Measurement Set of 5M rows, 1,000 channels and 2 correlations takes 24 h to predict 10,000 components on an Intel Xeon 2.6 GHz processor with 32 CPUs/threads. All CPU cores are fully exercised throughout the prediction step, with modest and tunable RAM usage. Crystalball has been useful in producing spectral-line data products from recent MeerKAT data (e.g., Serra et al. 2019, A&A, 628A, 122).
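
The scale of the example above can be checked with a little arithmetic. The sketch below counts one phase term per row, channel and component, assuming that term is shared between the two correlations; that counting rule is an assumption for this estimate and not a statement about how Crystalball organises its work.

```python
rows = 5_000_000
channels = 1_000
components = 10_000
threads = 32
hours = 24

# One complex exponential per (row, channel, component): 5e13 in total.
phase_terms = rows * channels * components

seconds = hours * 3600
rate_total = phase_terms / seconds        # about 5.8e8 phase terms per second
rate_per_thread = rate_total / threads    # about 1.8e7 per thread per second
```

Under these assumptions, each thread sustains on the order of ten million phase terms per second for the full 24 h.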

While the DFT has been made tractable, Crystalball still consumes a large portion of a CARACal pipeline run. As CARACal currently does not support GPUs, GPU-accelerated prediction, as provided by packages such as Montblanc, is not feasible. Future work would involve distributing the prediction over a compute cluster, or writing a GPU predict should the CARACal pipeline come to support this processor type.

Theme – Data Processing Pipelines and Science-Ready Data