David Shupe is a scientist and infrared astronomer at the NASA/IPAC Infrared Science Archive (IRSA). He has been a member of the Caltech/IPAC staff since 1995, working on Spitzer, Herschel, Rubin/LSST, and the Zwicky Transient Facility. David is a core team member of the Astropy Project and has a strong interest in the Scientific Python ecosystem, having served on the organizing committees of the SciPy 2019 and SciPy 2020 conferences.
Working with large catalogs using the Astronomy eXtensions for Spark (AXS) framework
The Astronomy eXtensions for Spark (AXS) framework has been developed by a team led by the University of Washington's DIRAC Institute. AXS enables efficient cross-matching of large catalogs, and allows astronomers to make queries and run arbitrary processing using the Apache Spark big-data engine. As part of a collaboration with UW-DIRAC, a group at the NASA/IPAC Infrared Science Archive (IRSA) has been running AXS on its internal cluster to gain operational experience with this framework, with an eye towards possible future deployment in an archival science platform or as a service. This talk will cover our AXS processing of the 114-billion-row NEOWISE-R Single Exposure (L1b) Source Table into a prototype catalog of light curves stored in Parquet format, and how these files can be accessed outside the AXS framework using popular Python packages such as Pandas and Dask.