The Hubble Image Similarity Project
Archives of astronomical images allow users to find images by metadata: what camera, what filter, what PI, what declination. Catalogs of the objects contained in those images provide a limited search of the data itself: position, magnitude, position angle, Sérsic index. At a previous ADASS, we presented how we are harnessing neural networks to answer a much harder question: if I have a complex image, how can I find all the images in the archive that look like it? We found that by using convolutional neural networks trained on terrestrial images, together with dimensionality reduction techniques, we could find edge-on galaxies with dust lanes, star fields, and rich scenes of star formation.
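The search described above can be sketched in a few lines. This is a toy illustration only: the real pipeline extracts features with a convolutional network pretrained on terrestrial images, which we stand in for here with random vectors, and the dimensionalities and neighbor counts are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN features: in the real pipeline each image cutout is
# passed through a network pretrained on terrestrial images; here we use
# random 512-dimensional vectors for 100 hypothetical cutouts.
features = rng.normal(size=(100, 512))

# Dimensionality reduction via PCA (an SVD of the mean-centered
# features), keeping the top 32 components.
centered = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
reduced = centered @ vt[:32].T

# Cosine-similarity search: find the 5 cutouts most similar to a query.
unit = reduced / np.linalg.norm(reduced, axis=1, keepdims=True)
scores = unit @ unit[0]            # similarity of every cutout to cutout 0
neighbors = np.argsort(-scores)[1:6]  # skip the query itself
print(neighbors)
```

With real CNN features in place of the random vectors, the same few lines return the visually similar cutouts the abstract describes.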
A major hindrance to our work is the difficulty of comparing algorithms to determine which one is best. At its core, the question is whether the groups of images identified by one algorithm are more internally similar than the groups identified by another. Our new Hubble Image Similarity Project aims to create a large database of similarity information between segments of Hubble images. The images are compared by humans in a citizen science project, where they are asked to select similar images from a comparison sample. We also designed the project for community impact: our citizen scientists are service-industry professionals from the local area near STScI in Baltimore who were impacted by the COVID-19 pandemic. They are paid a fair wage for their work through the Amazon Mechanical Turk system.
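The bookkeeping behind such a comparison task might look like the following. This is a hypothetical sketch, not the project's actual task design: each task shows a target cutout alongside a comparison sample, the worker selects the sample members that look similar, and counting selections versus presentations per pair yields the raw material for a distance estimate.

```python
from collections import defaultdict

# Each task: a target cutout, the comparison sample shown, and which
# sample members the worker selected as similar (hypothetical IDs).
tasks = [
    ("img_a", ["img_b", "img_c", "img_d"], ["img_b"]),
    ("img_a", ["img_b", "img_c", "img_e"], ["img_b", "img_e"]),
    ("img_b", ["img_a", "img_d", "img_e"], ["img_e"]),
]

shown = defaultdict(int)     # times a pair was presented together
selected = defaultdict(int)  # times the pair was judged similar

for target, sample, picks in tasks:
    for other in sample:
        pair = tuple(sorted((target, other)))
        shown[pair] += 1
        if other in picks:
            selected[pair] += 1

# Fraction of presentations in which each pair was called similar.
similarity = {p: selected[p] / shown[p] for p in shown}
print(similarity[("img_a", "img_b")])
```

Aggregating these fractions over many workers and many tasks is what turns individual judgments into the collective similarity measurements the project analyzes.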
The comparison measurements are analyzed to compute a distance matrix between all pairs of images, and that distance matrix can then be used to assess the accuracy of computer-vision-based algorithms. The image similarity matrix shows that the collective visual wisdom of our neighbors matches the accuracy of the trained eye, with even subtle differences among images faithfully reflected in the distances. We will publish the data and the images for use as a test set.
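Turning similarity votes into distances and scoring an algorithm against them could proceed as below. This is a toy sketch with made-up numbers, and the project's actual analysis is more sophisticated: here the distance is simply one minus the selection fraction, and an algorithm is scored by how well the ranking of its pairwise distances agrees with the human ranking.

```python
import numpy as np

# Hypothetical human selection fractions for 4 cutouts: entry [i, j] is
# the fraction of presentations in which pair (i, j) was judged similar.
sim = np.array([
    [1.00, 0.90, 0.20, 0.10],
    [0.90, 1.00, 0.30, 0.25],
    [0.20, 0.30, 1.00, 0.80],
    [0.10, 0.25, 0.80, 1.00],
])
human_dist = 1.0 - sim  # rarely co-selected = far apart

# Pairwise distances produced by some computer-vision algorithm
# (invented for illustration).
algo_dist = np.array([
    [0.0, 0.1, 0.7, 0.9],
    [0.1, 0.0, 0.6, 0.8],
    [0.7, 0.6, 0.0, 0.3],
    [0.9, 0.8, 0.3, 0.0],
])

# Agreement as a Spearman rank correlation over the upper-triangle
# pairs, computed by hand to stay dependency-free (no tied values here).
i, j = np.triu_indices(4, k=1)
h = human_dist[i, j].argsort().argsort()  # ranks of human distances
a = algo_dist[i, j].argsort().argsort()   # ranks of algorithm distances
rho = np.corrcoef(h, a)[0, 1]
print(f"Spearman rho = {rho:.2f}")  # → Spearman rho = 0.94
```

A correlation near 1 means the algorithm orders image pairs the way the human judges do, which is the sense in which the distance matrix serves as a test set for computer-vision methods.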