Transfer Learning in Large Spectroscopic Surveys

Current spectroscopic surveys such as SDSS and LAMOST, each containing millions of spectra, provide enough data for analysis with the most advanced machine learning methods, such as convolutional neural networks (ConvNets). However, each survey follows its own target selection strategy, driven by its individual scientific goals. Diverse target selection results in different statistical distributions of each class of celestial objects in the archives. Therefore, it is not straightforward to apply a machine-learning model trained on one survey to another, because the statistical properties of the data differ. One possible solution is transfer learning.

Transfer learning is a method that reuses previously gained knowledge when learning a new problem. We present an application of transfer learning in the context of deep learning to the classification of quasars (QSOs): we try to identify unknown QSOs in the SDSS archive using a ConvNet model trained on spectra from LAMOST and finally retrained on SDSS spectra.
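
The train-then-retrain pattern can be illustrated with a deliberately small stand-in. The sketch below is not the ConvNet used in this work: it swaps in a one-feature logistic regression and synthetic data so that it runs with the Python standard library alone. The function names, the class fractions, the feature shift, the learning rate and the step counts are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(params, data):
    """Mean binary cross-entropy of a one-feature logistic model."""
    w, b = params
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def train(params, data, lr=0.1, steps=200):
    """Full-batch gradient descent, starting from the given parameters."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        errors = [(sigmoid(w * x + b) - y, x) for x, y in data]
        w -= lr * sum(e * x for e, x in errors) / n
        b -= lr * sum(e for e, _ in errors) / n
    return (w, b)


def make_survey(rng, n, positive_fraction, shift):
    """Synthetic 'survey': class balance and feature offset differ per survey."""
    data = []
    for _ in range(n):
        y = 1 if rng.random() < positive_fraction else 0
        x = rng.gauss(1.0 if y else -1.0, 0.7) + shift
        data.append((x, y))
    return data


rng = random.Random(0)
source = make_survey(rng, 2000, positive_fraction=0.5, shift=0.0)   # stand-in for the first survey
target = make_survey(rng, 300, positive_fraction=0.15, shift=0.3)   # stand-in for the second survey

# Step 1: train from scratch on the source survey.
source_params = train((0.0, 0.0), source)
transferred_loss = log_loss(source_params, target)

# Step 2: transfer the parameters and retrain briefly on the target survey.
finetuned_params = train(source_params, target, steps=50)
finetuned_loss = log_loss(finetuned_params, target)

print(f"target loss with transferred parameters: {transferred_loss:.4f}")
print(f"target loss after retraining:            {finetuned_loss:.4f}")
```

The point of the sketch is only the order of operations: the second call to train starts from source_params rather than from zeros, which is the parameter-transfer step.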

We used the LAMOST DR5 v3 archive (more than 9 million spectra) and the LAMOST catalogue of all QSOs to discover QSOs unknown to the SDSS DR14 QSO Catalog (DR14Q), which corresponds to SDSS DR14 (more than 4 million spectra). For the classification, we employed the VGG Net-A ConvNet adapted to one-dimensional spectra.
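
For orientation, the convolutional layout of VGG configuration A (eight convolutional layers in five blocks, each block followed by max pooling) can be written down and traced for a one-dimensional input. This is a minimal sketch, not the exact network of this work: the length-preserving convolutions, the pooling of width 2, the single input channel and the input length of 4096 flux samples are assumptions carried over from the original two-dimensional VGG design or chosen for illustration.

```python
# Convolutional part of VGG configuration A: numbers are output channels
# of a convolutional layer, "M" marks a max-pooling layer.
VGG_A_CONFIG = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def feature_shape(input_length, config=VGG_A_CONFIG):
    """Return (channels, length) after the convolutional stack.

    Assumes length-preserving ("same") convolutions and max pooling
    with width 2 and stride 2, so only pooling changes the length.
    """
    channels, length = 1, input_length  # one input channel: the flux
    for item in config:
        if item == "M":
            length //= 2
        else:
            channels = item
    return channels, length


n_conv_layers = sum(1 for item in VGG_A_CONFIG if item != "M")
channels, length = feature_shape(4096)  # 4096 is a hypothetical spectrum length

print(f"convolutional layers: {n_conv_layers}")
print(f"features passed to the dense layers: {channels} x {length}")
```

In the one-dimensional adaptation each spectrum is treated as a sequence of flux values, so the 3x3 kernels of the image version become kernels acting along the wavelength axis only.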

Applying the ConvNet with parameters transferred from LAMOST training to our test set of 100 thousand spectra randomly drawn from SDSS resulted in 84330 predicted not-QSOs, 14277 correctly predicted QSOs and 170 missed QSOs. There were also 1223 spectra predicted as QSOs but not identified as such in SDSS DR14Q. Our inspection has confirmed a number of the 1223 spectra to be QSOs unknown to SDSS DR14Q. Extrapolating the inspection to the 49729 false-positive spectra from the whole of SDSS DR14 suggests that our ConvNet can discover QSOs not listed in SDSS DR14Q. We further discuss our results and give examples of discoveries.
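
The four counts above form a confusion matrix with DR14Q taken as the reference labels, and the usual summary metrics follow directly from them. The sketch below is a plain arithmetic check: it assumes that the 84330 predicted not-QSOs are all absent from DR14Q (true negatives), and the variable names are illustrative. Because some "false positives" turn out to be genuine QSOs, the precision computed this way is a lower bound on the true precision.

```python
# Counts quoted for the 100 thousand SDSS test spectra (labels from DR14Q).
true_negatives = 84330    # predicted not-QSO (assumed absent from DR14Q)
true_positives = 14277    # predicted QSO, listed in DR14Q
false_negatives = 170     # listed in DR14Q, missed by the ConvNet
false_positives = 1223    # predicted QSO, not listed in DR14Q

total = true_negatives + true_positives + false_negatives + false_positives

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)

# Fraction of the test sample flagged as QSO candidates missing from DR14Q.
candidate_fraction = false_positives / total

# Consistency check: if the same fraction held for the whole archive,
# the 49729 candidates would correspond to roughly this many spectra.
implied_archive_size = 49729 / candidate_fraction

print(f"total spectra:        {total}")
print(f"precision:            {precision:.4f}")
print(f"recall:               {recall:.4f}")
print(f"candidate fraction:   {candidate_fraction:.5f}")
print(f"implied archive size: {implied_archive_size:.0f}")
```

The implied archive size comes out slightly above 4 million spectra, in line with the size quoted for SDSS DR14.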

Theme – Machine Learning, Statistics, and Algorithms