Tracy Chen joined Caltech/IPAC as a postdoc in 2009 and has worked as a scientific analyst on the NED team since 2012.
Classification of Astrophysics Journal Articles with Machine Learning
The NASA/IPAC Extragalactic Database (NED) routinely reviews journal articles to extract fundamental data on extragalactic objects and integrate them across the spectrum into the database. Manually going through the journal articles, deciding whether each one is appropriate for inclusion in NED, and identifying what kinds of data it contains is very labor-intensive, especially given the ever-increasing number of publications each year. We present a recently developed machine learning approach to help classify the topics and content of journal articles. We show that this approach can reproduce the hand classifications with an accuracy of over 90%.
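
To make the idea of reproducing hand classifications concrete, the following is a minimal sketch of supervised text classification. It is an illustration only and makes several assumptions: the abstract does not state which algorithm or features are used, so a multinomial Naive Bayes model over word counts is substituted here, and the labels ("include", "exclude"), the example texts, and all function and class names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep purely alphabetic, whitespace-separated words."""
    return [w for w in text.lower().split() if w.isalpha()]


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed stand-in model)."""

    def fit(self, texts, labels):
        # Count how often each label occurs and how often each word occurs per label.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the hand-assigned label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical hand-labeled examples (invented for illustration, not real data).
train_texts = [
    "redshift measurements of distant galaxies",
    "photometry of galaxies in a nearby cluster",
    "spectroscopic redshift survey of quasars",
    "stellar rotation in open clusters",
    "orbital dynamics of exoplanets around nearby stars",
    "solar flare observations and coronal heating",
]
train_labels = ["include", "include", "include", "exclude", "exclude", "exclude"]

test_texts = ["redshift of galaxies in a survey", "rotation of stars in open clusters"]
test_labels = ["include", "exclude"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]
score = accuracy(model, test_texts, test_labels)
```

Here `accuracy` compares the model's predictions with hand-assigned labels on held-out texts, which is the kind of comparison behind an accuracy figure such as the one quoted above. A real evaluation would need a far larger labeled set than this toy example.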