Studying Quasar Spectra with Machine Learning in Sloan Digital Sky Survey
eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations


Creative Commons 'BY' version 4.0 license
Abstract

In this thesis, we designed an algorithm that provides robust selection criteria in the parameter space of measured quasar properties. Our method combines the prior knowledge of an expert observer with the underlying structure that unsupervised machine learning finds in the data, yielding a data-driven boundary in the multi-dimensional parameter space of quasar physical properties. We did this by quantifying how dissimilar our target group is from the majority of the quasars in our data set. The method is versatile: it can select a cluster of similar data points located in regions of the parameter space with statistically significantly lower density. We found more quasars in the class of extremely red quasars and showed that our new sample has even more exotic outflow behavior. Our final selection produces three times more quasars with a visually verified CIV broad absorption line feature, which is the signature of outflow, than the previous extremely red quasar sample. The method is well suited to selecting the highest-priority follow-up targets for observations of red quasars.

In the second project, we assembled the largest CIV absorption line catalog to date. Our Bayesian model selection and Gaussian process methods provide, as a by-product, a probability that absorption systems exist in a given quasar spectrum. This removes the need for visual inspection, and removing that need is essential for handling upcoming surveys with millions of spectra. We carefully validated the method by comparing its predictions against a subset of the spectra in the largest visually inspected CIV catalog. We then found 113,775 CIV absorption systems with at least 95% confidence among 185,425 selected quasar spectra from SDSS DR12. For each investigated spectrum we obtain a posterior distribution for column density, velocity dispersion, and absorption redshift, from which the maximum a posteriori value and the credible interval can be derived. The method is especially useful for extracting information from data with a low signal-to-noise ratio.
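
To make the two Bayesian quantities mentioned in the abstract concrete, the following is a minimal sketch, not the thesis pipeline. It assumes the log evidence of each candidate model (for example "no absorber" versus "one CIV absorber") has already been computed, and that a one-dimensional posterior has been evaluated on a grid. The function names, the equal priors, the equal-tailed interval convention, and all numerical values are hypothetical choices made for illustration.

```python
import math


def model_posterior(log_evidences, log_priors):
    """Posterior probability of each model from its log evidence and log prior."""
    log_joint = [le + lp for le, lp in zip(log_evidences, log_priors)]
    peak = max(log_joint)
    # Log-sum-exp keeps the normalisation stable for very negative log values.
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint))
    return [math.exp(v - log_norm) for v in log_joint]


def map_and_credible_interval(grid, posterior, level=0.95):
    """MAP value and equal-tailed credible interval from a gridded 1-D posterior.

    `posterior` may be unnormalised; it is normalised over the grid here.
    """
    total = sum(posterior)
    weights = [p / total for p in posterior]
    map_value = grid[max(range(len(grid)), key=lambda i: weights[i])]

    lower_target = (1.0 - level) / 2.0
    upper_target = 1.0 - lower_target
    lower = upper = None
    cumulative = 0.0
    for x, w in zip(grid, weights):
        cumulative += w
        if lower is None and cumulative >= lower_target:
            lower = x
        if upper is None and cumulative >= upper_target:
            upper = x
    # Guard against floating-point round-off at the end of the grid.
    if lower is None:
        lower = grid[0]
    if upper is None:
        upper = grid[-1]
    return map_value, (lower, upper)


# Hypothetical log evidences: index 0 = no absorber, index 1 = one absorber.
probs = model_posterior([-1000.0, -995.0], [math.log(0.5), math.log(0.5)])
has_absorber = probs[1] >= 0.95  # the abstract's 95% confidence cut

# Hypothetical gridded posterior over a single absorber parameter.
map_value, interval = map_and_credible_interval(
    [0, 1, 2, 3, 4], [1.0, 2.0, 4.0, 2.0, 1.0], level=0.6
)
```

Under these assumed inputs the absorber model receives a posterior probability above the 95% cut, so the spectrum would be flagged without visual inspection. A real posterior over column density, velocity dispersion, and absorption redshift is multi-dimensional and would need to be marginalised before an interval of this kind is read off.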
