Working with large catalogs using the Astronomy eXtensions for Spark (AXS) framework

The Astronomy eXtensions for Spark (AXS) framework was developed by a team led by the University of Washington's DIRAC Institute. AXS enables efficient cross-matching of large catalogs and lets astronomers run queries and arbitrary processing on the Apache Spark big-data engine. In collaboration with UW-DIRAC, our group at the NASA/IPAC Infrared Science Archive (IRSA) has been running AXS on our internal cluster to gain operational experience with the framework, with an eye towards possible future deployment in an archival science platform or as a service. This talk will cover our AXS processing of the 114-billion-row NEOWISE-R Single Exposure (L1b) Source Table into a prototype catalog of light curves stored in Parquet format, and how these files can be accessed outside the AXS framework using popular Python packages such as Pandas and Dask.


Theme – Science Platforms and Data Lakes, Time-Domain Ecosystem, Data Processing Pipelines and Science-Ready Data