Population Size Estimation using Multiple Respondent-Driven Samples
- Author(s): Kim, Brian
- Advisor(s): Handcock, Mark S
- et al.
Respondent-driven sampling (RDS) is commonly used to sample hard-to-reach populations, such as female sex workers or people who inject drugs, because these populations are typically highly stigmatized and traditional sampling methods cannot survey their members efficiently. Estimating the size of these populations is often desirable so that organizations such as the Centers for Disease Control and Prevention (CDC) can provide the proper amount of aid. However, due to the nature of RDS, traditional methods of population size estimation do not provide good estimates. Therefore, to make effective use of the data we can collect from these populations, we must develop new population size estimation methods designed specifically for RDS data.
In this dissertation, I first explore some of the assumptions of RDS using a data set of female sex workers in Kampala, Uganda. This is an exploratory look at recruitment by geographical location, with a focus on determining whether the RDS actually reaches all areas of the map rather than getting stuck in a certain region, as well as on the effects of age on recruitment. Then, I introduce a new method of estimating population size that uses concepts from capture-recapture methods while modeling the RDS as a successive sampling process. This extends a current RDS-based method of population size estimation, Successive Sampling for Population Size Estimation (SS-PSE), to more than one sample, incorporating the information from a capture-recapture design. I develop the Bayesian framework for the model, including posterior sampling with a Markov chain Monte Carlo algorithm. Then, using simulation studies, I compare my method with various existing methods in the literature and assess its frequentist properties.
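As background for the capture-recapture concepts mentioned above, the following is a minimal sketch of the classical two-sample Lincoln-Petersen estimator and its Chapman correction. It is not the method developed in this dissertation: it assumes every member of the population is equally likely to be captured in each sample, an assumption that network-based recruitment in RDS does not satisfy. The function names and the counts in the usage note are hypothetical and chosen only for illustration.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classical two-sample estimate of population size.

    n1: number of individuals in the first sample
    n2: number of individuals in the second sample
    m:  number of individuals appearing in both samples (recaptures)
    """
    if m <= 0:
        raise ValueError("at least one recapture is needed")
    if m > min(n1, n2):
        raise ValueError("recaptures cannot exceed either sample size")
    # If the first sample is a fraction n1/N of the population, about the
    # same fraction of the second sample should be recaptures: m/n2 ~ n1/N.
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected variant; defined even when m == 0."""
    if m < 0 or m > min(n1, n2):
        raise ValueError("recaptures must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1
```

With hypothetical counts of 200 people in the first sample, 150 in the second, and 30 seen in both, `lincoln_petersen(200, 150, 30)` gives 1000.0, and `chapman(200, 150, 30)` gives a slightly smaller value.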