Some Issues Related to the Use of Randomized Trials in the Field of Program Evaluation
This dissertation explores three topics related to the use of randomized trials in the field of program evaluation. It is organized as three papers, each of which is intended to stand alone. The first paper addresses the problem that evaluators face when they must analyze data from imperfectly implemented randomized trials. The paper focuses on the two-group pretest-posttest design, which evaluators widely use to estimate the causal effects of programs. When this design is used, evaluators must choose between analyzing their data with the covariance adjustment approach and the gain score approach. A great deal has been written about how this choice should be made, but that guidance has largely been intended for researchers conducting quasi-experimental or observational studies. This paper proposes a simple, objective way for evaluators to make this choice--the mixed approach--when, as is often the case, they find that an experiment has been weakened by real-world circumstances. When this occurs, evaluators may not be aware of the nature or extent of the flaws in their realized research designs. The mixed approach allows evaluators to choose between the gain score and covariance adjustment approaches more advantageously when (1) ceiling and floor effects are minimal, (2) it is reasonable to estimate an average treatment effect, and (3) the principal uncertainty is whether randomization ensured the long-term comparability of groups or failed in ways that left the groups nonequivalent at the time of the pretest.
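To make the two competing analyses concrete, the following is a minimal sketch of the gain score and covariance adjustment estimators applied to simulated data from a well-implemented randomized trial. It does not implement the mixed approach, whose details are not given here. The function names, sample size, and data-generating values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gain_score_estimate(pre, post, treat):
    """Difference between groups in mean gain (posttest minus pretest)."""
    gain = post - pre
    return gain[treat == 1].mean() - gain[treat == 0].mean()


def covariance_adjustment_estimate(pre, post, treat):
    """Treatment coefficient from an OLS regression of posttest on
    an intercept, the treatment indicator, and the pretest."""
    X = np.column_stack([np.ones_like(pre), treat, pre])
    coef, *_ = np.linalg.lstsq(X, post, rcond=None)
    return coef[1]


# Simulated two-group pretest-posttest data (all values are assumptions).
rng = np.random.default_rng(0)
n = 2000
treat = rng.integers(0, 2, size=n)        # random assignment to groups
pre = rng.normal(50.0, 10.0, size=n)      # pretest scores
true_effect = 5.0                         # hypothetical treatment effect
post = 10.0 + 0.8 * pre + true_effect * treat + rng.normal(0.0, 5.0, size=n)

gs = gain_score_estimate(pre, post, treat)
ca = covariance_adjustment_estimate(pre, post, treat)
print(f"gain score estimate:            {gs:.2f}")
print(f"covariance adjustment estimate: {ca:.2f}")
```

Under successful randomization, as simulated here, both estimators target the same average treatment effect. The choice between them becomes consequential when the groups are nonequivalent at the pretest, which is the situation the first paper addresses.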
The second paper describes a reparameterized Rasch model (RRM). Unlike a traditionally parameterized Rasch model, the RRM produces estimates of item group difficulties and associated tests of significance. Because the RRM is a Rasch model, the item difficulty estimates produced by a traditionally parameterized Rasch model can be calculated using the RRM. The RRM is compared to the traditionally parameterized Rasch model and the multidimensional random coefficients multinomial logit model. An example is provided in which all three models are fit as hierarchical generalized linear models, the models are formally compared, and the RRM is used to answer two research questions. The first question relates to the validity of reverse-scored items. Two groups of items were considered--items with original wording (left) and parallel items with presumed opposite wording (right). The RRM was used to determine whether the left item group and the reverse-scored right item group had similar average item difficulties. The second question relates to whether differences between the item groups were associated with respondents' years of professional experience. To answer this question, the RRM was expanded to include a person-level main effect and a cross-level (experience by item group) interaction. For evaluators implementing studies with experimental designs, the RRM provides a method of validating a measure while simultaneously answering substantive research questions.
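As an illustration of the idea, the traditional Rasch model can be written as below, followed by one possible reparameterization in terms of item group difficulties. The first equation is the standard dichotomous Rasch model. The second is only an assumed form, since the abstract does not give the RRM's actual parameterization, and the symbols \gamma, \eta, and g(i) are hypothetical notation introduced here.

```latex
% Standard Rasch model: person p with ability \theta_p, item i with difficulty \delta_i
\operatorname{logit} \Pr(X_{pi} = 1) = \theta_p - \delta_i

% Assumed reparameterization (illustrative only): item i belongs to item group g(i)
\delta_i = \gamma_{g(i)} + \eta_i, \qquad \sum_{i \,:\, g(i) = g} \eta_i = 0 \ \text{for each group } g
```

Under this assumed form, \gamma_g is the average difficulty of the items in group g, so comparing item groups reduces to a test on the \gamma parameters, and each traditional item difficulty can be recovered as \gamma_{g(i)} + \eta_i.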
The third paper describes a concrete process that stakeholders can use to make predictions about the future performance of programs in local contexts. Within the field of evaluation, the discussion of validity as it relates to outcome evaluation has focused on questions of internal validity (Did it work?) to the exclusion of external validity (Will it work?). However, recent debates about the credibility of evaluation evidence have called attention to how evaluations can inform predictions about future performance. Taking these debates as a starting point, I expand upon the traditional framework for external validity that is closely associated with Donald Campbell. I argue that while there are three broad strategies for strengthening predictions of future program performance--design strategies, reporting strategies, and use strategies--only use strategies have the potential to substantially improve evaluation practice in the foreseeable future. Accordingly, the process I describe is one possible use strategy that is collaborative, systematic, feasible, and transparent.