In 2000, Gerber and Green published the results of a field experiment that examined the impact of electoral campaigns on voter participation. Since this landmark work, more than one hundred similar studies have appeared in the political science literature. These randomized controlled trials are usually conducted within get-out-the-vote (GOTV) drives that seek to increase voter turnout. The surge in GOTV experiments was due in part to a statistical innovation that preceded Gerber and Green’s publication: the average treatment effect for the treated (ATT), which allowed researchers to compare those who were treated directly with a similar group assigned to control.
In this research we focus on settings common to many social science field experiments, such as GOTV studies, in which participants may or may not comply with the treatment protocol assigned by the experimenter. For experiments with binary outcomes, we show that each individual in the study may be classified as one of a finite number of distinct types. We call these behavioral types because they characterize the individual’s complete reaction (both their measured response and how they receive treatment) to assignment to each possible experimental group. In this context, the data are generated by randomly allocating individuals of these behavioral types to the different levels of treatment. Thus, the model is parameterized by the unknown proportions of the behavioral types, so that many statistical aspects of the experiment, such as commonly studied average treatment effects, may be written as functions of these proportions.
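To make the idea concrete, the following minimal sketch enumerates behavioral types for the simplest case: two experimental groups, a binary indicator of treatment received, and a binary outcome. The encoding of a type as a map from assignment to a (received, outcome) pair, the function names, and the uniform proportions are illustrative assumptions, not the formal definitions given later.

```python
from itertools import product

# Assumed encoding: a behavioral type maps each assignment z in {0, 1}
# to a pair (d, y), where d is the treatment received and y is the
# binary outcome observed under that assignment.
ARMS = (0, 1)
REACTIONS = tuple(product((0, 1), repeat=2))  # all four (d, y) pairs


def enumerate_types(one_sided=False):
    """List every behavioral type; optionally forbid treatment in control."""
    types = []
    for reaction in product(REACTIONS, repeat=len(ARMS)):
        behavior = dict(zip(ARMS, reaction))
        if one_sided and behavior[0][0] == 1:
            continue  # restriction: nobody assigned to control is treated
        types.append(behavior)
    return types


def itt_effect(types, proportions):
    """Average effect of assignment on the outcome, as a function of proportions."""
    return sum(p * (t[1][1] - t[0][1]) for t, p in zip(types, proportions))


def fraction_affected(types, proportions):
    """Share of the sample whose outcome changes with assignment."""
    return sum(p for t, p in zip(types, proportions) if t[1][1] != t[0][1])


all_types = enumerate_types()
restricted = enumerate_types(one_sided=True)

# Hypothetical proportions, chosen only for illustration.
uniform = [1 / len(restricted)] * len(restricted)
print(len(all_types), len(restricted))
print(itt_effect(restricted, uniform), fraction_affected(restricted, uniform))
```

Without restrictions this setup has 16 types, and ruling out treatment in the control group leaves 8. Under the uniform proportions the average effect of assignment is zero even though half of the sample changes its outcome, which shows why the fraction affected and the average effect are distinct quantities.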
Viewing the data as generated by these behavioral types changes the analysis of the experiment in two ways. First, it changes the perspective on what is being estimated. Instead of finding a particular treatment effect, the ultimate goal can be seen as estimating the proportions of behavioral types. With this frame of reference, the effect of a certain treatment is most accurately represented as the fraction of the experimental sample for which the treatment has an effect. Second, by clarifying the underlying data-generating process, a behavioral-types approach directs the resulting statistical analysis.
We use a well-cited example to introduce behavioral types before providing formal definitions. We present the ATT as a case study in how to apply a behavioral-types approach to a design familiar to many social science researchers. Understanding the data-generating process allows us to evaluate the bias and variance of the ATT estimator, and we show that the variance depends on the choice of sampling assumptions. We then provide rigorous definitions of a behavioral type and of restrictions that reduce the number of behavioral types in a population to a number for which the proportions of each type may be estimated. We present three experimental designs, along with a strategy to identify the proportions of each type, and explain how treatment effects may be obtained from the proportions.
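As a point of reference for the ATT case study, the sketch below computes a ratio estimate of the ATT, assuming one-sided noncompliance (no one assigned to control receives treatment). Under that assumption the difference in outcome rates between the assigned groups, divided by the rate at which the treatment group is actually treated, is a standard ATT estimator; whether this matches the exact estimator analyzed here is an assumption. The function name and the data are hypothetical.

```python
def att_estimate(y_treat, d_treat, y_control):
    """Ratio estimate of the ATT under one-sided noncompliance (assumed).

    y_treat   : binary outcomes for units assigned to treatment
    d_treat   : binary indicators of treatment actually received
    y_control : binary outcomes for units assigned to control
    """
    n_t, n_c = len(y_treat), len(y_control)
    # Effect of assignment: difference in outcome rates between groups.
    itt = sum(y_treat) / n_t - sum(y_control) / n_c
    # Share of the assigned-to-treatment group that was actually treated.
    contact_rate = sum(d_treat) / n_t
    if contact_rate == 0:
        raise ValueError("no unit in the treatment group was treated")
    return itt / contact_rate


# Hypothetical data: 10 units per group, 5 of the treatment group contacted.
y_treat = [1] * 6 + [0] * 4
d_treat = [1] * 5 + [0] * 5
y_control = [1] * 4 + [0] * 6

att = att_estimate(y_treat, d_treat, y_control)
print(round(att, 3))
```

Because the estimate is a ratio of two sample quantities, its bias and variance depend on how the randomness in both the numerator and the denominator is modeled, which is where the choice of sampling assumptions enters.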
A behavioral-types approach is well suited to multi-treatment experiments because it distills often complex designs into an estimation problem with a manageable number of types. We apply the behavioral-types approach to four published social science field experiments involving multiple levels of ordered treatment. For each, we show how the interpretations and the statistical analyses differ under a behavioral-types approach and how they can lead to different conclusions. Through the applications we illustrate how behavioral types provide insight into a range of experimental designs, such as those with spillover effects or partial ordering of treatment levels.
For two of the four applications we further examine the issue of joint significance by constructing multi-dimensional confidence regions for the proportions of behavioral types. We find that normal approximation methods perform poorly, but the bootstrap can correct these shortcomings. Even so, the bootstrap regions may not attain the desired coverage levels, so we adjust them using a double bootstrap. We discuss other methods that merit further exploration.
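The calibration idea can be sketched in one dimension. The code below builds a percentile bootstrap interval for a single proportion and then a double bootstrap version, in which a second level of resampling is used to adjust the nominal quantile levels. This is one common form of double bootstrap calibration and a deliberate simplification: the regions discussed above are multi-dimensional, and the exact variant used is not specified here. The sample, function names, and resampling counts are hypothetical.

```python
import random


def proportion(sample):
    return sum(sample) / len(sample)


def resample(sample, rng):
    """Draw a bootstrap sample of the same size, with replacement."""
    return rng.choices(sample, k=len(sample))


def quantile(values, q):
    """Empirical quantile by rank, with q clamped to [0, 1]."""
    ordered = sorted(values)
    q = min(max(q, 0.0), 1.0)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


def percentile_interval(sample, alpha, n_boot, rng):
    """Ordinary (single) percentile bootstrap interval."""
    estimates = [proportion(resample(sample, rng)) for _ in range(n_boot)]
    return quantile(estimates, alpha / 2), quantile(estimates, 1 - alpha / 2)


def double_bootstrap_interval(sample, alpha, n_outer, n_inner, rng):
    """Percentile interval with quantile levels calibrated by a second bootstrap."""
    theta_hat = proportion(sample)
    outer_estimates, u_values = [], []
    for _ in range(n_outer):
        outer = resample(sample, rng)
        outer_estimates.append(proportion(outer))
        inner = [proportion(resample(outer, rng)) for _ in range(n_inner)]
        # Position of the original estimate within the inner distribution.
        u_values.append(sum(e <= theta_hat for e in inner) / n_inner)
    # Calibrated levels replace the nominal alpha/2 and 1 - alpha/2.
    lo_level = quantile(u_values, alpha / 2)
    hi_level = quantile(u_values, 1 - alpha / 2)
    return quantile(outer_estimates, lo_level), quantile(outer_estimates, hi_level)


# Hypothetical sample: 100 binary observations, 30 of them equal to 1.
sample = [1] * 30 + [0] * 70
rng = random.Random(0)
single = percentile_interval(sample, 0.05, 200, rng)
double = double_bootstrap_interval(sample, 0.05, 200, 100, rng)
print(single, double)
```

The nested loop makes the double bootstrap far more expensive than the single bootstrap, since each outer resample requires its own set of inner resamples.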