Prediction Methods for Astronomical Data Observed with Measurement Error
- Author(s): Long, James Patrick
- Advisor(s): Rice, John A
- El Karoui, Noureddine
- et al.
We study prediction when features are observed with measurement error. The research is motivated by classification challenges in astronomy.
In Chapter 1 we introduce the periodic variable star classification problem. A periodic variable star's brightness varies as a periodic function of time, and each star belongs to a particular physical class. These light curves are often sparsely sampled, which introduces measurement error into estimates of period, amplitude, and other features. We discuss how measurement error can degrade the performance of periodic variable star classifiers. We introduce two general strategies, noisification and denoisification, for addressing measurement error in prediction problems.
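The noisification idea can be sketched as follows: rather than training on clean features, one perturbs the training features with the error distribution of the test data, so that training and test features are similarly distributed. The setup below is purely illustrative (the two-class Gaussian features, the error scale `sigma`, and the nearest-centroid rule are assumptions, not the dissertation's actual data or classifier):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clean training features (e.g. log-period, amplitude)
# for two classes; test features carry known measurement error sigma.
n, sigma = 200, 0.3
X_train = np.vstack([rng.normal(0.0, 0.1, (n, 2)),
                     rng.normal(1.0, 0.1, (n, 2))])
y_train = np.repeat([0, 1], n)

# Noisification: perturb the training features with the test-set error
# distribution so train and test features are comparably distributed.
X_noisified = X_train + rng.normal(0.0, sigma, X_train.shape)

# A nearest-centroid rule fit on the noisified features (toy classifier).
centroids = np.array([X_noisified[y_train == c].mean(axis=0)
                      for c in (0, 1)])

def predict(x):
    """Assign x to the class with the nearest noisified centroid."""
    return int(np.argmin(((centroids - x) ** 2).sum(axis=1)))

# A noisy test point near the class-1 cluster.
x_test = np.array([1.0, 1.0]) + rng.normal(0.0, sigma, 2)
label = predict(x_test)
```

Because the classifier was fit to features carrying the same error as `x_test`, its decision boundary is matched to the noisy feature distribution rather than the clean one.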
In Chapter 2 we study density estimation with Berkson error. In this problem, one observes a sample from the density $f_X$ and seeks to estimate $f_Y$, the convolution of $f_X$ with a known error distribution. We derive asymptotic results for the behavior of the mean integrated squared error of kernel density estimates of $f_Y$. The presence of error generally increases both the convergence rates of the estimators and the optimal smoothing parameters. We briefly discuss some potential applications of this work, including classification tasks involving measurement error.
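A minimal numerical sketch of the Berkson setup, under assumed Gaussian distributions (the choices $X \sim N(0,1)$, $\varepsilon \sim N(0, s^2)$, and the plug-in estimator below are illustrative assumptions, not the dissertation's derivation): since $f_Y = f_X * f_\varepsilon$ with $f_\varepsilon$ known, averaging the error density centered at each observed $X_i$ gives an unbiased estimate of $f_Y$, and kernel smoothing with bandwidth $h$ only convolves the effective kernel further.

```python
import numpy as np
from math import sqrt, pi

rng = np.random.default_rng(1)

# Assumed setup: X ~ N(0, 1) observed directly; Y = X + eps with known
# Berkson error eps ~ N(0, s^2). Goal: estimate f_Y from the X sample.
n, s = 5000, 0.5
x = rng.normal(0.0, 1.0, n)

def gauss(u, sd):
    """N(0, sd^2) density, vectorized."""
    return np.exp(-0.5 * (u / sd) ** 2) / (sd * sqrt(2 * pi))

def f_Y_hat(y, h=0.0):
    # Plug-in estimate: average the known error density at each X_i.
    # With a Gaussian kernel, bandwidth-h smoothing simply replaces
    # the error scale s by sqrt(s^2 + h^2).
    return gauss(y - x[:, None], sqrt(s * s + h * h)).mean(axis=0)

# Here the true f_Y is N(0, 1 + s^2); compare at a few points.
ygrid = np.array([-1.0, 0.0, 1.0])
true_fY = gauss(ygrid, sqrt(1 + s * s))
est_fY = f_Y_hat(ygrid)
```

The error density itself already smooths the estimate, which is the intuition behind the faster convergence rates: the target $f_Y$ is smoother than $f_X$.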
In Chapter 3 we study prediction of a continuous response for an observation whose features are measured with error. Using Nadaraya-Watson-type estimators, we derive limit theorems for the convergence of the mean squared error as a function of the smoothing parameters.
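For concreteness, a standard Nadaraya-Watson estimator can be written in a few lines; the regression function $m(x) = \sin(x)$, the noise scales, and the bandwidth choice below are illustrative assumptions. The point of the sketch is that when the new observation's feature is seen only with error, the smoothing parameter must account for the error scale as well as the usual bias-variance trade-off:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: Y = m(X) + noise with m(x) = sin(x); training features
# are observed cleanly, but the new observation's feature is not.
n = 2000
x = rng.uniform(-3.0, 3.0, n)
y = np.sin(x) + rng.normal(0.0, 0.1, n)

def nw(x0, h):
    """Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return (w * y).sum() / w.sum()

# Prediction at a noisily observed feature w_obs = x_true + error.
x_true, err_sd = 1.0, 0.2
w_obs = x_true + rng.normal(0.0, err_sd)
pred = nw(w_obs, h=0.3)
```

With an error-free feature, `nw` recovers $m$ up to the usual kernel bias; with feature error, the prediction target and the optimal bandwidth both change, which is what the limit theorems in this chapter quantify.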
In Chapter 4 we study the effects of measurement error on classifier performance using data from the Optical Gravitational Lensing Experiment (OGLE) and the Hipparcos satellite. We illustrate some challenges in constructing statistical classifiers when the training data are collected by one astronomical survey and the unlabeled data by another. We use noisification to construct classifiers that are robust to some sources of measurement error and to differences between the training and unlabeled data sets.