Machine-Assisted Discovery Through Identification and Explanation of Anomalies in Astronomical Surveys
Data volumes in modern astronomical surveys are large, and human attention is comparatively scarce. The most interesting sources are rare and may therefore remain buried and undiscovered in large archives. Moreover, the science results that currently planned surveys propose to deliver (e.g., precise constraints on the physical properties of dark matter and dark energy from Roman, SPHEREx, and Euclid) require exquisitely precise control of systematic errors in measurements taken over billions of individual galaxies and stars. Validating these measurements in current surveys is already a massive undertaking, and existing techniques appear unlikely to scale to the next generation of large sky surveys.
We are developing algorithms that allow discovery and validation processes to scale to large survey data sets. The key innovation is the use of machine learning to identify, group, and explain anomalies within very large data sets. The goal is to quickly distinguish erroneous measurements and expected patterns in the data from sources and statistical correlations with true astrophysical origins. We illustrate the process of identifying and explaining anomalies in a study of sources observed by the Dark Energy Survey. Comparing the outliers identified by this process on a subset of 11 million sources with those independently flagged during years of manual review, we found that 96% of the automatically identified outliers had also been discarded by humans. The remaining outliers reveal additional issues not previously identified by the team, as well as several unusual objects that led to follow-up spectroscopic observations at Palomar Observatory. This approach holds great promise for reducing the manual effort involved in validating large survey data sets and for increasing the rate at which new discoveries are made.
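As a rough illustration of the comparison described above, the following minimal sketch flags outliers in a toy list of measurements and reports what fraction of them were also flagged manually. It is not the method used in the study: the robust z-score detector (median and median absolute deviation) is a simple stand-in for the machine-learning step, and the function names, threshold, magnitudes, and manual flags are all hypothetical values chosen for the example.

```python
from statistics import median


def robust_outliers(values, threshold=5.0):
    """Return the indices whose robust z-score exceeds `threshold`.

    Stand-in detector (an assumption, not the study's algorithm):
    deviation from the median, scaled by the median absolute deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return set()  # no spread to measure against
    scale = 1.4826 * mad  # makes the MAD comparable to a Gaussian sigma
    return {i for i, v in enumerate(values) if abs(v - med) / scale > threshold}


def compare_with_manual(auto_flags, manual_flags):
    """Return (fraction of automatic outliers also flagged manually,
    automatic outliers that manual review did not flag)."""
    if not auto_flags:
        return 0.0, set()
    overlap = auto_flags & manual_flags
    return len(overlap) / len(auto_flags), auto_flags - manual_flags


# Hypothetical toy catalog: magnitudes for ten sources, two of them aberrant.
magnitudes = [20.1, 20.3, 19.9, 20.0, 20.2, 35.0, 20.1, 19.8, 5.0, 20.0]
# Hypothetical manual review that caught only one of the two.
manual_flags = {5}

auto_flags = robust_outliers(magnitudes)
fraction, unreviewed = compare_with_manual(auto_flags, manual_flags)
print(f"automatic outliers: {sorted(auto_flags)}")
print(f"fraction also flagged manually: {fraction:.0%}")
print(f"candidates for follow-up: {sorted(unreviewed)}")
```

In this toy setup, the sources left in `unreviewed` play the role of the remaining outliers mentioned above: they are either issues that manual review missed or objects that may merit follow-up. Grouping and explaining the anomalies, which the approach also covers, is outside the scope of this sketch.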