A Comparison of Estimators for Respondent-driven Sampling
- Author(s): Lange, Margaret Meek
- Advisor(s): Handcock, Mark
- et al.
Respondent-driven sampling, or RDS, is used to draw samples from hard-to-reach or marginalized populations and to make inferences about those populations based on the samples. Such sampling begins with an initial, or "seed," sample from the population of interest. It then exploits the networked structure of the population, relying on the members themselves to recruit further members of the population for the sample. A number of estimators for RDS have already been developed. Each estimator is motivated by a model that makes assumptions about seed selection, respondent behavior, and certain properties of the underlying social network. In a series of articles, Gile and Handcock have used the statnet package in R to simulate respondent-driven sampling under a variety of conditions. They then use these simulations to examine the bias and variance of different estimators when the assumptions are only partially fulfilled or not fulfilled at all.
The goal of this project is twofold. First, the original R code to simulate respondent-driven sampling is quite slow. In the statnet package RDSdevelopment, we have written C code to duplicate and extend the functionality of the R code. The new code is considerably faster than the old R code. Second, we examine the sensitivity of current estimators to a previously unstudied aspect of respondent behavior: how accurately respondents report their number of contacts in the network, also known as their "degree." The two networks we consider are a previously simulated network, fauxmadrona, and the Project 90 network. The latter is constructed from an actual population of heterosexuals at high risk for HIV living in Colorado Springs, CO, in the late 1980s and early 1990s. Using statnet, we carry out simulations on both networks in which respondents recall their degree imperfectly. Under these conditions, estimator variance tends to increase as recall erodes. Bias also changes, though the direction of change varies from estimator to estimator and is not monotonic in the recall error.
Finally, we introduce a variation on the current estimators by replacing reported degree with the rank of reported degree in each estimator formula. Rank of degree is calculated by carrying out a rank transformation on the reported degree distribution of each sample of individuals from the networked population. A rank transformation of a set of numbers maps each member of the set onto its rank with respect to the other members. A number of rank transformations are possible. The rank transformation known as "standard competition" degrades estimator performance on both the fauxmadrona and Project 90 networks. The dense rank transformation leaves estimator performance mostly unchanged, at least on the networks treated by the thesis.
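
To make the difference between the two rank transformations concrete, here is a minimal Python sketch. It is an illustration only, not the thesis code, which is written in R and C. The function names, the ascending rank order, and the sample of reported degrees are all assumptions made for this example.

```python
def competition_rank(values):
    """Standard competition ranking: tied values share the lowest rank
    they would occupy, and the ranks after a tie are skipped."""
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        # Record only the first (lowest) position at which each value appears.
        first_position.setdefault(value, position)
    return [first_position[value] for value in values]


def dense_rank(values):
    """Dense ranking: tied values share a rank, and no ranks are skipped."""
    rank_of = {value: rank
               for rank, value in enumerate(sorted(set(values)), start=1)}
    return [rank_of[value] for value in values]


# Hypothetical reported degrees for one sample of six respondents.
reported_degrees = [3, 10, 3, 7, 10, 1]

print(competition_rank(reported_degrees))  # [2, 5, 2, 4, 5, 1]
print(dense_rank(reported_degrees))        # [2, 4, 2, 3, 4, 1]
```

With ties in the sample, standard competition ranking leaves gaps (here no respondent receives rank 3), so the largest rank depends on how many respondents are tied. Dense ranking uses consecutive integers, so the largest rank equals the number of distinct reported degrees.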