eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Methods for Comparative Model Selection and Parameter Estimation in Diverse Modeling Applications

Abstract

The predictive accuracy of a model is of key importance both in research and to a lay audience. Many modeling and parameter estimation methods exist, so a wide range of techniques is available when approaching a modeling task. Two questions naturally arise for any such task: which model to select, and how to estimate its parameters. This dissertation aims to advance the theory and practice of model selection and parameter estimation in the applications covered by its chapters: predictive model assessment, home range estimation, kernel density estimation, and the analysis of social media data.

* In Chapter 2, I develop A3, a novel method for assessing predictive accuracy and enabling direct comparisons between competing models in an accessible framework. This method uses resampling techniques to "wrap" predictive modeling methods and estimate a standard set of error metrics for the model as a whole and for each explanatory variable used by the model. Two case studies in the chapter illustrate the applied utility of the method and show how improved models may not only increase predictive accuracy but also alter inferences and conclusions about the effects of parameters in the model. An R package implementing the method is made available on CRAN.

* In Chapter 3, I develop ICE, a novel method of home range estimation that takes a competitive approach. Effectively an estimator of estimators, ICE pits existing home range estimators against each other, each of which may be best suited to a given type of data. By selecting among different approaches, ICE can theoretically improve on the performance of any individual estimator across heterogeneous data sets.

* In Chapter 4, I develop Contingent Kernel Density Estimation, an extension to Kernel Density Estimation designed to account for the case when observations are measured with a specific form of error. The chapter develops the method and derives contingent kernels for commonly used kernels and sampling regimes. The method is then applied to data collected from the social networking site Twitter to estimate the national distribution of a sample of Twitter users.

* The study in Chapter 5 analyzes a large data set collected from Twitter. It draws on data from over four million Twitter users and estimates parameters of this population, with a primary focus on the color preference choices made by these users. This "big data" approach yields novel results that might not have been identifiable with earlier, traditional approaches based on sampling and surveying the behavior of individuals.
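
To make the resampling idea from Chapter 2 concrete, the following is a minimal Python sketch of a generic k-fold cross-validation wrapper that scores any fitting function against a null (mean-only) model. It is not the A3 method or its R package: the function names (`cv_squared_error`, `cv_r_squared`, `fit_line`, `fit_mean`), the choice of squared error, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Null model (hypothetical baseline): always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Simple least-squares line; stands in for any wrapped modeling method."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_squared_error(fit, xs, ys, k=5, seed=0):
    """Sum of squared out-of-sample errors over k folds."""
    total = 0.0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in held_out)
    return total


def cv_r_squared(fit, xs, ys, k=5, seed=0):
    """Out-of-sample error reduction relative to the null model."""
    null_err = cv_squared_error(fit_mean, xs, ys, k, seed)
    model_err = cv_squared_error(fit, xs, ys, k, seed)
    return 1.0 - model_err / null_err


# Toy data (made up): a linear trend with a small alternating perturbation.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

print("line vs. null:", cv_r_squared(fit_line, xs, ys))
print("null vs. null:", cv_r_squared(fit_mean, xs, ys))
```

Because every model is evaluated on the same folds and against the same baseline, the resulting scores are directly comparable across competing models, which is the kind of comparison the chapter describes. A similar wrapper could, in principle, rank competing estimators on held-out data in the spirit of Chapter 3, though that use is also only an assumption here.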
