Morphological classification of astronomical images with limited labelling

Morphological classification is too complex to be captured by a simple parameterization, yet it is important for research on galaxy evolution. Future galaxy surveys (e.g. Euclid) will collect data on more than a billion galaxies. Obtaining morphological information requires people to mark up galaxy images, which takes either a considerable amount of money (for commercial crowdsourcing platforms such as MTurk or Toloka) or a huge number of volunteers (for citizen science projects such as Galaxy Zoo). It is worth noting that any manual markup eventually becomes ineffective, as it does not scale with the growing amount of data.

Today, hybrid approaches that combine machine learning with human markup make it possible to speed up galaxy morphology annotation (e.g. Beck et al. (2018), the state-of-the-art model), but considerable human effort is still constantly needed to keep morphological classification accuracy high. We search for fast and accurate machine classification methods that make only restricted use of human markup resources.

We propose an effective semi-supervised approach to the galaxy morphology classification task, based on active learning with an adversarial autoencoder deep neural network (DNN) model. For a binary classification problem (the top-level question of the Galaxy Zoo 2 decision tree) we achieved an accuracy of 93.1% on the test set with only 0.86 million markup actions; this model can easily scale to any number of images. Our best model, with additional markup, achieves an accuracy of 96.2%.
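
To illustrate the active learning idea in general terms, the sketch below runs a pool-based loop with uncertainty sampling: train on the labelled subset, ask the annotator only about the objects the classifier is least sure of, and repeat. It is a minimal illustration under stated assumptions, not the model described here: a plain logistic regression stands in for the adversarial autoencoder, synthetic 2D points stand in for galaxy images, and the names `fit_logistic`, `predict_proba`, `active_learning` and `oracle` are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=500):
    """Gradient-descent logistic regression (stand-in for the real classifier)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Probability of class 1 for each row of X."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))

def active_learning(X, oracle, n_init=10, batch=10, rounds=5, seed=0):
    """Pool-based active learning with uncertainty sampling.

    `oracle(i)` returns the label of object i; it plays the role of a
    human annotator, so each call counts as one markup action.
    """
    rng = np.random.default_rng(seed)
    labelled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    labels = {i: oracle(i) for i in labelled}
    for _ in range(rounds):
        idx = np.array(labelled)
        w = fit_logistic(X[idx], np.array([labels[i] for i in labelled]))
        pool = np.setdiff1d(np.arange(len(X)), idx)  # still unlabelled
        # Probabilities closest to 0.5 are the most uncertain predictions.
        distance = np.abs(predict_proba(w, X[pool]) - 0.5)
        query = pool[np.argsort(distance)[:batch]]
        for i in query:
            labels[int(i)] = oracle(int(i))
            labelled.append(int(i))
    w = fit_logistic(X[np.array(labelled)], np.array([labels[i] for i in labelled]))
    return w, labelled

# Synthetic two-class data (assumption: two Gaussian blobs, not real images).
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(-2.0, 1.0, (200, 2)),
               data_rng.normal(+2.0, 1.0, (200, 2))])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, labelled = active_learning(X, oracle=lambda i: y[i])
acc = np.mean((predict_proba(w, X) > 0.5) == y)
print(f"labelled {len(labelled)} of {len(X)} objects, accuracy {acc:.3f}")
```

In this toy setting only 60 of the 400 objects are ever shown to the oracle. The point of the design is that annotation effort is spent near the decision boundary rather than spread uniformly over the data.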

Theme – Machine Learning, Statistics, and Algorithms, Citizen Science Projects in Astronomy