The field of transportation and travel behavior research has long been interested in answering causal questions. Take the recent COVID-19 pandemic as an example: transportation was one of the most heavily impacted sectors, and transportation researchers found themselves with a plethora of questions to answer regarding the current and future impacts of the pandemic on the transportation system. Yet, out of roughly 250 pandemic-related transportation research papers we reviewed, only about 10 explicitly reference the causal inference literature and are explicit about their causal designs, even though a much larger share of those papers attempt to answer causal queries. This disconnect is the motivation behind this dissertation.
It is important to acknowledge that transportation researchers have indeed used and contributed to several areas of the causal inference literature. Notable contributions include addressing self-selection bias between residential choice and travel behavior, addressing omitted variable bias through integrated choice and latent variable models (ICLVs), and addressing endogeneity between multiple travel outcomes using joint discrete choice models.
Despite those contributions, many advances in the causal inference literature have yet to enter the field of travel demand modeling. There are many reasons for this disconnect, some of which stem from deep-rooted beliefs and practices within the travel behavior and demand modeling literature on model development, selection, and validation. First, there is a tendency for discrete choice modelers in transportation to assume their models allow for causal interpretations because they rely on a behavioral theory of human decision making, like variations of random utility theory. While this criterion typically entails endogeneity checks between the outcome and the regressors, it overlooks important nuances about the data generating process and sources of variation in the exogenous variables. Second, the field of transportation demand modeling relies heavily on goodness-of-fit statistics and statistical tests of significance when finalizing model specifications, which makes those specifications prone to the issue of “bad controls”. For instance, adding post-treatment or mediator variables that are exogenous to the outcome but endogenous to the treatment variable of interest undermines the causal interpretation of the model coefficients, even if adding those variables improves predictive performance and goodness-of-fit statistics. Finally, causal identification strategies rarely appear in the transportation literature. In the causal inference literature, the analyst states the assumptions, explicitly specifying the source of variation in the treatment of interest, before drawing any causal conclusions from the model. Such assumptions are referred to as identification strategies: they are assumptions about the data generating process that, only if true, allow the modeler to interpret the model parameters as causal. Those strategies are rarely stated explicitly in transportation demand models, even though those models are often used to evaluate the impact of policy interventions.
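The “bad controls” issue described above can be illustrated with a small simulation. The following is a minimal sketch on synthetic data, not an analysis from this dissertation: the variable names, the coefficient values (a direct effect of 1.0 and an effect of 3.0 that runs through a mediator) and the helper functions `ols` and `r_squared` are all assumptions made for illustration. It shows that adding a mediator to a regression can raise the goodness of fit while the treatment coefficient stops measuring the total causal effect.

```python
import random

def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    n, k = len(y), len(columns) + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    # Augmented matrix [X'X | X'y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot_row = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][k] for i in range(k)]

def r_squared(columns, y, beta):
    """Share of the variance of y explained by the fitted values."""
    mean_y = sum(y) / len(y)
    fitted = [beta[0] + sum(b * col[i] for b, col in zip(beta[1:], columns))
              for i in range(len(y))]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data (illustrative assumption): a randomized treatment shifts a
# mediator, and both the treatment and the mediator affect the outcome.
rng = random.Random(0)
n = 5000
treat = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
mediator = [2.0 * t + rng.gauss(0, 1) for t in treat]
outcome = [1.0 * t + 1.5 * m + rng.gauss(0, 1) for t, m in zip(treat, mediator)]
TOTAL_EFFECT = 1.0 + 2.0 * 1.5  # direct effect plus the effect via the mediator

short_beta = ols([treat], outcome)            # outcome ~ treatment
long_beta = ols([treat, mediator], outcome)   # outcome ~ treatment + mediator
r2_short = r_squared([treat], outcome, short_beta)
r2_long = r_squared([treat, mediator], outcome, long_beta)

print("total effect by construction:", TOTAL_EFFECT)
print("treatment coefficient without mediator:", round(short_beta[1], 2), "R2:", round(r2_short, 2))
print("treatment coefficient with mediator:   ", round(long_beta[1], 2), "R2:", round(r2_long, 2))
```

In this setup the specification with the mediator fits better, yet its treatment coefficient only captures the direct effect, so a model selected on fit alone would understate the total effect of the treatment.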
To address this gap, this dissertation comprises two parts: 1) a conceptual part, and 2) an applied part.
The conceptual part consists of Chapter 1, where I elaborate in detail on the disconnect described above and point to specific examples from the transportation literature where such misconceptions about causality are most evident. Next, I provide a review of the key concepts in causal inference, and give an overview of the main causal identification strategies used in the empirical sections of this dissertation. I also give an overview of causal graphical models, a framework that has been gaining popularity in recent years as an alternative to the better-known potential outcomes (PO) framework. I focus the overview on parts where I believe this framework, and Directed Acyclic Graphs (DAGs) more specifically, are most useful to transportation researchers.
The applied part consists of Chapters 2, 3, and 4, where I apply some of the causal identification strategies presented in Chapter 1 to answer three different empirical causal research questions in transportation, two of which rely on observational data, while one involves randomized experiments. Each chapter results in domain-specific empirical contributions in its respective area. The common theme across all three chapters is the explicit focus on estimating causal parameters, the clear statement of the causal identification strategies used to estimate those parameters, and the transparency about the source of variation in the treatment. This is in contrast to the common practice in travel behavior modeling, where models are specified and estimated without explicitly stating the assumptions under which the estimated parameters can have causal interpretations, a practice that can often lead to erroneous conclusions and misapplications of those models.
In Chapter 2, I quantify the causal effect of telecommuting on travel frequency and distance traveled. This question is motivated by the unprecedented rise of telecommuting in the past two years and its likely persistence in a post-pandemic world. To answer this question, I collected and used five waves of original U.S.-based survey data combined with passive smartphone Point-of-Interest (POI) data collected over the course of the pandemic. I quantify the effects of changes in the frequency of telecommuting on the total number of daily and weekly trips that a telecommuter makes, as well as their total daily and weekly distance traveled. Crucially, I control for unobserved individual confounders, overcoming an important limitation of existing research, which predominantly relies on cross-sectional observational data. I do this by leveraging the longitudinal aspect of the data and implementing causal quasi-experimental designs like first-differences and two-way fixed effects regressions with individual, time, and individual-by-time controls. I find strong evidence that telecommuting causes the generation of new non-commute trips. Specifically, I find that individuals make an average of 1 additional non-commute trip on telecommuting days relative to commute days. This trip is on average shorter than the two-way commute trips, meaning that the net effect of telecommuting on total distance traveled is negative. Importantly, I find that this relationship persists at the weekly level, where I estimate that 1 additional day of telecommuting per week causes an increase of about 1 additional non-commute trip. This means that the additional travel on telecommuting days is the result of a newly generated trip, not a trip that has been shifted from other days of the week. The results are more robust than those in the existing literature and suggest that the trip reduction effects of telecommuting could be overestimated if telecommuting-induced new trip generation is not properly accounted for.
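The logic of using longitudinal data to remove unobserved individual confounders can be sketched on synthetic data. The example below is an illustration only and does not use the dissertation's data or reproduce its estimates: the two-period panel, the variable names, the `slope` helper and the true effect of 0.5 are all assumptions. Each simulated individual has a time-invariant trait that raises both telecommuting and trip-making, so a pooled cross-sectional regression is biased, while a first-differences regression cancels the trait. With only two periods, first differences and individual fixed effects give the same estimate.

```python
import random

def slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
TRUE_EFFECT = 0.5   # arbitrary illustrative value, not an estimate from the text
n_people = 4000

telework_1, telework_2, trips_1, trips_2 = [], [], [], []
for _ in range(n_people):
    trait = rng.gauss(0, 1)  # unobserved, time-invariant individual confounder
    for telework, trips in ((telework_1, trips_1), (telework_2, trips_2)):
        days = 2.0 + 1.0 * trait + rng.gauss(0, 1)          # trait shifts telecommuting
        made = 5.0 + TRUE_EFFECT * days + 2.0 * trait + rng.gauss(0, 1)  # and trips
        telework.append(days)
        trips.append(made)

# Pooled regression ignores the individual trait and absorbs its influence.
pooled_estimate = slope(telework_1 + telework_2, trips_1 + trips_2)

# First differences: the time-invariant trait cancels within each individual.
d_telework = [b - a for a, b in zip(telework_1, telework_2)]
d_trips = [b - a for a, b in zip(trips_1, trips_2)]
fd_estimate = slope(d_telework, d_trips)

print("true effect:", TRUE_EFFECT)
print("pooled cross-sectional estimate:", round(pooled_estimate, 2))
print("first-differences estimate:", round(fd_estimate, 2))
```

The first-differences estimate is only valid here because the confounder is constant over time by construction; time-varying confounders would require additional assumptions or controls.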
In Chapter 3, I quantify the causal effect of vaccines on reversing pandemic-induced mobility trends. The question is motivated by the extensive literature analyzing and forecasting the stickiness of pandemic-induced behavioral and mobility trends in a post-pandemic world. Using the same data as in Chapter 2, I show how the pandemic behavioral response of people in the U.S. was heterogeneous: individuals with low levels of concern about being infected with COVID-19 engaged in riskier behaviors than those with higher levels of concern, including traveling more, attending large gatherings, and using public transportation. Then, using difference-in-differences designs, I show how getting vaccinated affected those behavioral differences. Specifically, I find that getting vaccinated caused an increase in mobility, with vaccinated individuals increasing their number of weekly trips by 4.8 trips per week after getting vaccinated, compared to 1.8 trips for the unvaccinated during the same time period. The difference-in-differences estimate is 3 trips per week, or roughly 170% of the increase in trips for the unvaccinated. The collective results provide important insights on human and travel behavior during the pandemic impact and recovery periods, and how vaccines affect those behaviors.
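The arithmetic behind the difference-in-differences estimate reported above can be written out as follows. The notation is an assumption introduced here for illustration: $\Delta \bar{Y}_{\text{vacc}}$ and $\Delta \bar{Y}_{\text{unvacc}}$ denote the before-to-after change in mean weekly trips for the vaccinated and unvaccinated groups, and the inputs are the rounded figures quoted in the text.

```latex
\[
\hat{\delta}_{\text{DiD}}
  = \Delta \bar{Y}_{\text{vacc}} - \Delta \bar{Y}_{\text{unvacc}}
  = 4.8 - 1.8
  = 3.0 \ \text{trips per week},
\qquad
\frac{\hat{\delta}_{\text{DiD}}}{\Delta \bar{Y}_{\text{unvacc}}}
  = \frac{3.0}{1.8} \approx 1.67 .
\]
```

With these rounded inputs the ratio is about 167%, which is consistent with the roughly 170% stated in the text once rounding is taken into account.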
The third and final empirical question is answered in Chapter 4, and is on learning and optimizing user behavior at plug-in electric vehicle (PEV) charging stations. Research on consumer behavior in the PEV charging context is limited, data is lacking, and consumer preferences, especially at workplace charging stations, remain poorly understood. I address this gap by designing and implementing randomized pricing experiments, the gold standard of causal inference, and quantify key behavioral quantities like the willingness and required incentives to delay charging, and the relationship between plug-in duration and hourly prices. I then propose a novel optimization framework that incorporates those learned behaviors into the optimization objective and significantly improves the operational efficiency of PEV charging stations. My analysis shows that incorporating behavioral theory in the optimization framework results in significantly lower operational costs (up to 17%) and higher net revenues (up to 50%) for the charging station operator compared to the uncontrolled baseline, without sacrificing the user experience.
Aside from the conceptual contributions (Chapter 1) and domain-specific empirical contributions (Chapters 2, 3, and 4), the dissertation also makes data contributions through the collection of two original datasets that will soon be made publicly available to the transportation research community. I participated in those data collection efforts, playing a primary role in designing the surveys, defining the sample, and implementing randomized pricing experiments. The two datasets are:
1) An extensive dataset on a panel of U.S. participants with a comprehensive set of behavioral questions over the course of the pandemic. The data comprise passively collected POI data, as well as five waves of surveys which included questions regarding respondents’ employment status, travel and telecommuting behavior, vaccination status, demographic characteristics, and ideological beliefs. An anonymized and de-identified version of this dataset, along with aggregated mobility metrics from the POI data, will be made publicly available for transportation researchers.
2) An experimental dataset of user behavior at PEV charging stations that includes exogenous variation in the prices and the resulting real-life behavioral responses of study participants. This dataset fills a critical gap in the human behavior literature in PEV charging research, where most pricing data are observational and do not allow for price perturbations. The data will be made publicly available for academic researchers along with an accompanying data manuscript that describes it.
Collectively, this dissertation emphasizes the importance of being rigorous about the design and assumptions under which researchers and modelers can draw causal conclusions from models, an important task in transportation research. After all, causal inferences are only valid if their underlying assumptions are correct, yet those assumptions are rarely discussed or formally stated in transportation models. This dissertation fills this gap.