eScholarship
Open Access Publications from the University of California

UCLA Electronic Theses and Dissertations

Balance Tests as a Learning Problem: Assessing 3,000 Lotteries with Machine Learning

Abstract

This thesis proposes a way to look beyond mean balance tests. Balance tests are a key component of plausible causal identification. In both experimental designs and natural experiments, the standard practice is to use statistical tests whose null hypothesis is that the pre-treatment covariates do not differ between treatment and control groups. The underlying idea is that the distributions of pre-treatment variables should be roughly balanced across the two groups. In recent decades, presenting evidence for the quality of causal research designs has become standard in the social sciences. While observational researchers normally focus on evidence of balance for the covariates included in their models, experimental researchers provide randomization tests for balance on pre-treatment covariates.
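For reference, the conventional mean balance test described above can be sketched as a difference-in-means test on a single pre-treatment covariate. This is a minimal illustration, not the thesis's own procedure: the function name `balance_test` is hypothetical, and it assumes Welch's t statistic with a two-sided p-value from the standard normal approximation, which is reasonable only when both groups are large.

```python
import math
from statistics import mean, variance


def balance_test(treated, control):
    """Difference-in-means balance test for one pre-treatment covariate.

    Hypothetical helper: Welch's t statistic with a two-sided p-value from
    the standard normal approximation (assumes large groups).
    """
    diff = mean(treated) - mean(control)
    # Welch standard error: group sample variances are not pooled
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    t = diff / se
    # Two-sided p-value: P(|Z| >= |t|) = erfc(|t| / sqrt(2))
    p = math.erfc(abs(t) / math.sqrt(2))
    return diff, t, p


# Usage: a covariate (e.g. household income) for lottery winners vs. losers
diff, t, p = balance_test([2, 4, 6, 8], [1, 3, 5, 7])
```

A test like this checks one moment of one covariate at a time, which is the limitation that motivates asking whether any function of the covariates predicts treatment.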

The thesis asks whether any function of the observed covariates, linear or otherwise, can predict who wins the lotteries used to distribute houses to low-income citizens in Brazil. My final dataset contains 1,777,385 observations (winners and losers of the lotteries), distributed among 3,012 lotteries. Using a random forest algorithm that attempts to predict who will win each lottery, I extract from each lottery out-of-sample estimates of model fit, including the AUC, the prediction R² (the squared correlation between the estimated probability of winning and the actual indicator for winning), and the corresponding p-value. The thesis reports the distribution of these estimates across lotteries.
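The two fit measures named above can be computed from each lottery's out-of-sample predicted probabilities. The sketch below is a minimal illustration under stated assumptions: it takes the predictions as given (the random forest and the out-of-sample splitting scheme are not reproduced here), the helper names `auc` and `prediction_r2` are hypothetical, and the p-value is omitted because the abstract does not say how it is derived. AUC is computed with the rank-based Mann-Whitney formula, giving tied scores their average rank.

```python
def auc(labels, scores):
    """Rank-based (Mann-Whitney) AUC: P(random winner scores above random loser)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the block of tied scores starting at position i
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def prediction_r2(labels, probs):
    """Squared Pearson correlation between predicted probability and the win indicator.

    Undefined (division by zero) if all predictions or all labels are identical.
    """
    n = len(labels)
    mean_y = sum(labels) / n
    mean_p = sum(probs) / n
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(labels, probs))
    var_y = sum((y - mean_y) ** 2 for y in labels)
    var_p = sum((p - mean_p) ** 2 for p in probs)
    return cov * cov / (var_y * var_p)


# Usage for one hypothetical lottery: 1 = winner, 0 = loser
won = [0, 0, 1, 1]
p_hat = [0.1, 0.2, 0.3, 0.4]  # out-of-sample predicted probabilities
lottery_auc = auc(won, p_hat)
lottery_r2 = prediction_r2(won, p_hat)
```

If assignment is truly random, covariates carry no information about winning, so the out-of-sample AUC should hover around 0.5 and the prediction R² around 0; running these functions once per lottery yields the cross-lottery distributions the thesis examines.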

A large-scale housing program in Brazil is used to assess machine learning as a balance-test tool. The Minha Casa Minha Vida program (MCMV) awards heavily subsidized mortgages to low-income citizens through lotteries held at the municipal level. The thesis focuses on a bracket of the program in which housing assignments are made by lottery. Between 2009 and 2017, the MCMV Faixa 1 program resulted in contracts for 1.4 million housing units in almost 3,000 municipalities. With over R$74 billion (USD 18.5 billion) spent on Faixa 1 benefits, it is one of the largest lottery-based housing projects in the world (according to the Brazilian government) and the largest housing program ever implemented in Latin America.
