## Three Essays in Counterfactual Econometrics

- Author(s): Pereda Fernandez, Santiago
- Advisor(s): Graham, Bryan S; et al.

## Abstract

In the first chapter of this dissertation I present a new method to identify and estimate the strength of social spillovers in the classroom and the distribution of teacher and student effects. Identification relies on two assumptions: double randomization of teachers and students to classrooms, and a linear-in-means equation for test scores. The linear independent factor representation of test scores allows the parameters of interest to be estimated by combining joint moments of different orders. I also present a theoretical model of social interactions in the classroom that yields the linear-in-means equation for test scores. In this model, the teacher and students play a game in which they choose how much effort to exert. The method allows the estimation of Rth-order moments, recovering more features of the distribution of teacher and student effects than the mean and variance alone. Teacher and student effects that are heteroskedastic in class size can be easily accommodated. For the estimation, I use a minimum distance procedure that combines the information coming from different moments. Using the Tennessee Project STAR dataset, I find sizeable spillovers in the classroom. Moreover, the distributions of teacher and student abilities seem to depart from the usual normality assumption, and the student distribution exhibits a high degree of heteroskedasticity in class size. Based on these estimates, I perform several counterfactual social planning experiments, identifying the winners and losers under different assignment rules. Assigning good teachers to large classrooms increases average test scores, with students in the left tail of the distribution benefiting more than the rest. Assigning good students to small classrooms increases the test scores of students in the right tail of the distribution and decreases those of students in the left tail, with an overall increase in mean test scores. Mixing good and bad students together has a small effect on mean test scores, but reduces inequality.
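
To illustrate how second moments can reveal a spillover parameter in a linear-in-means setting, the following is a minimal sketch and not the dissertation's estimator. It assumes a hypothetical score equation `y_ic = t_c + e_ic + gamma * mean_c(e)`, homoskedastic and independent teacher and student effects, and exactly two class sizes; it uses a simple exactly identified method of moments in place of the minimum distance procedure over higher-order moments described above. All names and parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def simulate_classes(n_classes, class_size, gamma, sd_teacher, sd_student, rng):
    """Simulate scores under the assumed form y_ic = t_c + e_ic + gamma * mean_c(e)."""
    classes = []
    for _ in range(n_classes):
        teacher = rng.gauss(0.0, sd_teacher)  # teacher effect t_c
        students = [rng.gauss(0.0, sd_student) for _ in range(class_size)]
        class_mean = sum(students) / class_size  # peer term (includes own effect)
        classes.append([teacher + e + gamma * class_mean for e in students])
    return classes


def within_and_between(classes):
    """Return the pooled within-class variance and the variance of class means."""
    means = [statistics.fmean(c) for c in classes]
    ss_within = sum((y - m) ** 2 for c, m in zip(classes, means) for y in c)
    dof = sum(len(c) - 1 for c in classes)
    return ss_within / dof, statistics.variance(means)


def estimate_spillover(small, large):
    """Method-of-moments estimates of (gamma, var_teacher, var_student).

    Under the assumed model:
      within-class variance        = var_student
      variance of class means (n)  = var_teacher + (1 + gamma)^2 * var_student / n
    so two class sizes separate var_teacher from (1 + gamma)^2 * var_student.
    The positive root is taken, which assumes 1 + gamma > 0.
    """
    n_s, n_l = len(small[0]), len(large[0])
    within_s, between_s = within_and_between(small)
    within_l, between_l = within_and_between(large)
    var_student = 0.5 * (within_s + within_l)
    scaled = (between_s - between_l) / (1.0 / n_s - 1.0 / n_l)
    gamma_hat = math.sqrt(scaled / var_student) - 1.0
    var_teacher = between_s - scaled / n_s
    return gamma_hat, var_teacher, var_student


if __name__ == "__main__":
    rng = random.Random(0)
    small = simulate_classes(2000, 5, 0.5, 0.5, 1.0, rng)
    large = simulate_classes(2000, 40, 0.5, 0.5, 1.0, rng)
    print(estimate_spillover(small, large))
```

The sketch relies on random assignment only implicitly, through the independence of the simulated teacher and student effects; relaxing homoskedasticity in class size, as the chapter does, would require additional moments.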

In the second chapter I propose an estimator of the conditional distribution of an outcome variable in the presence of heterogeneous effects and a continuous endogenous treatment. The model is triangular, with both the first- and second-stage equations being linear-in-covariates quantile processes. The endogeneity of the model is captured by the quantile copula of both equations, which is identified by inverting the quantile processes conditional on a vector of covariates. Using quantile regression techniques, I estimate both conditional quantile processes; the copula distribution can then be estimated either nonparametrically or parametrically. Integrating the copula for a given vector of the instruments yields an estimate of the conditional distribution of the outcome variable. This in turn makes it possible to estimate the effect of the covariates on the unconditional distribution of the outcome variable, or on any function of it, such as the unconditional quantile function or the Gini index. Similarly, to estimate the effect of a policy on the unconditional distribution of the outcome variable, one simply needs to integrate the conditional distribution over the marginal distribution of the covariates under the counterfactual policy. I provide the uniform asymptotic distribution of these estimators, which allows inference and the construction of the usual confidence sets. I use data on twins to estimate the unconditional quantile treatment effect of increasing education by one year for all individuals in the dataset. The results show an increase in wages across the distribution that ranges between 8% and 20%, with those at the upper quantiles of the distribution benefiting the most.
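
The step of integrating a conditional distribution over the marginal of the covariates can be sketched as follows. This is a simplified illustration under strong assumptions, not the chapter's estimator: it treats the treatment as exogenous (so the copula step is omitted entirely), takes the quantile coefficient functions `beta0` and `beta1` as known rather than estimated, and uses a made-up schooling variable. All names and numbers are hypothetical.

```python
import bisect
import math


def outcome_draws(xs, beta0, beta1, grid_size=200):
    """Sorted outcomes q(u, x) = beta0(u) + beta1(u) * x over a rank grid and all x.

    Averaging over the rank grid integrates the conditional quantile process;
    pooling over xs integrates over the empirical marginal of the covariate.
    """
    us = [(k + 0.5) / grid_size for k in range(grid_size)]  # midpoints in (0, 1)
    return sorted(beta0(u) + beta1(u) * x for x in xs for u in us)


def cdf(draws, y):
    """Unconditional distribution function implied by the pooled draws."""
    return bisect.bisect_right(draws, y) / len(draws)


def quantile(draws, tau):
    """Smallest y whose implied distribution function is at least tau."""
    idx = min(len(draws) - 1, max(0, math.ceil(tau * len(draws)) - 1))
    return draws[idx]


def beta0(u):
    return u  # hypothetical intercept process


def beta1(u):
    return 0.10 + 0.10 * u  # hypothetical return to schooling, increasing in rank


# Hypothetical covariate sample: years of schooling between 8 and 18.
schooling = [8 + (i % 11) for i in range(220)]

baseline = outcome_draws(schooling, beta0, beta1)
# Counterfactual policy: one more year of schooling for everyone.
counterfactual = outcome_draws([x + 1 for x in schooling], beta0, beta1)


def unconditional_qte(tau):
    """Difference between counterfactual and baseline unconditional quantiles."""
    return quantile(counterfactual, tau) - quantile(baseline, tau)


if __name__ == "__main__":
    for tau in (0.1, 0.5, 0.9):
        print(tau, unconditional_qte(tau))
```

Because each simulated outcome rises by `beta1(u)` under the counterfactual, every unconditional quantile effect in this sketch lies between the smallest and largest value of `beta1`. With an endogenous treatment, the rank grid would instead be weighted according to the estimated copula.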

In the third chapter I propose an estimator of the unconditional distribution of an outcome variable, when this variable depends on a binary treatment that is endogenous to the unobservables, and the effect of the treatment and other exogenous variables on the outcome variable is heterogeneous. The estimator is based on a triangular model consisting of the probability of being treated and a quantile process that determines the outcome variable. Using a parametric assumption about the copula distribution and the exclusion restriction, I identify the copula distribution. The estimation is a multi-step procedure that involves the estimation of the quantile process of the second-stage equation, the probability of being treated by maximum likelihood, and the copula distribution. These estimators are then used to estimate the distribution of the outcome variable conditional on a set of instruments. Finally, I show the finite sample performance of the estimator with a Monte Carlo experiment.
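
A data-generating process of the kind such a Monte Carlo experiment might use can be sketched as follows. It is a hypothetical design, not the one in the dissertation: it assumes a Gaussian copula with correlation `rho` between the outcome rank and the treatment unobservable, a probit-style propensity driven by a single instrument `z`, and an arbitrary heterogeneous effect `1 + 0.5 * u`. The sketch only simulates data and shows why ignoring the copula biases a naive comparison of treated and untreated units; it does not implement the multi-step estimator.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def simulate(n, rho, rng):
    """Draw (outcome, treatment, individual effect) triples from the assumed design."""
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)    # instrument, excluded from the outcome equation
        e_v = rng.gauss(0.0, 1.0)  # first-stage unobservable
        # Outcome unobservable, correlated with e_v (Gaussian copula, parameter rho).
        e_u = rho * e_v + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        u = STD_NORMAL.cdf(e_u)    # outcome rank in (0, 1)
        v = STD_NORMAL.cdf(e_v)    # treatment rank in (0, 1)
        propensity = STD_NORMAL.cdf(0.0 + 1.0 * z)  # assumed probit index
        d = 1 if v <= propensity else 0
        effect = 1.0 + 0.5 * u     # heterogeneous treatment effect at rank u
        # Untreated outcome is the standard normal quantile at rank u, i.e. e_u.
        y = e_u + effect * d
        rows.append((y, d, effect))
    return rows


def naive_difference(rows):
    """Difference in mean outcomes between treated and untreated units."""
    treated = [y for y, d, _ in rows if d == 1]
    untreated = [y for y, d, _ in rows if d == 0]
    return fmean(treated) - fmean(untreated)


def average_effect(rows):
    """Average of the individual effects (known here only because data are simulated)."""
    return fmean(effect for _, _, effect in rows)


if __name__ == "__main__":
    rows = simulate(20000, 0.6, random.Random(1))
    print(naive_difference(rows), average_effect(rows))
```

With positive `rho`, units with low treatment rank are both more likely to be treated and more likely to have low untreated outcomes, so the naive difference understates the average effect; with `rho = 0` the two quantities agree up to sampling noise.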