Smartphones are becoming an increasingly interesting survey medium for behavioral research due to their value for collecting long-term panel observations and supplementary data on the choice environment. Sensor data make it possible to survey participants based on whether or not a certain activity has been carried out. By fusing the phone-generated sensor data and survey responses with data from outside sources, substantial data sets can be generated and used to investigate choices in complex environments. Computational systems for behavior research take advantage of automation and scalability opportunities, thereby also building on pertinent bodies of literature regarding machine learning on large data sets and crowdsourcing. The importance of comprehensive, long-term data sets in understanding behavior has been highlighted in the choice theory literature, specifically with respect to capturing an individual decision-maker’s history of choices and personal experiences with those choices. To date, however, relatively few studies have capitalized on emerging technologies to create or analyze such data sets.
Rich data sets that combine panel information on the decision-maker with information on the choice environment can support the study of dynamic phenomena, which is especially important in a rapidly changing world where behavioral adaptation can take place on a relatively small time scale and, once habits are formed, have long-lasting effects. Some examples of pressing questions in the field of transportation involve understanding how travelers are responding to the emerging sharing economy, to new ride sharing services and new information systems, how time use and travel patterns will change due to automated vehicles, and how more sustainable travel behavior can be promoted through incentive or pricing strategies. This dissertation aims to support the adoption of smartphone-based survey technology in travel behavior research in order to lay the groundwork for research aimed at answering the above questions. It describes the design and implementation of a smartphone-based study, presents a system for fusing smartphone data with externally acquired data, and demonstrates how these ample data sets can be leveraged to generate new behavioral insights. The problem chosen for study is the link between transit service quality, rider satisfaction and ridership retention on public transit. This is motivated by the fact that many transit agencies in the United States continue to see high rates of ridership turnover, and that to date, very little is known about what drives transit use cessation.
The six-week San Francisco Travel Quality Study (SFTQS) was conducted in autumn 2013. It collected a data set that included high-resolution phone locations, a number of daily mobile surveys on specific trip experiences, responses to online entry and exit surveys, and transit vehicle locations. By fusing the phone location data with transit vehicle locations, individual-level automatic transit travel diaries could be created without the need to ask participants. The reduced respondent burden, in turn, facilitated longer-term data collection. Initial recruitment proved to be challenging, with response rates to some of the email and direct mailing lists around 1%, and response rates to in-person recruiting between 8% and 15%. On the other hand, attrition was lower than expected, considering the length of the study: The initial enrollment was 856 participants, of whom 555 (65%) completed all required surveys and 637 (74%) completed the entry and exit surveys as well as at least one daily mobile survey. Interestingly, 36% of participants later stated they would have preferred to fill out mobile surveys more frequently (e.g., one per trip rather than one per day) than what was required in the study.
A central part of the computational infrastructure used to collect the data was the system of integrated methods to reconstruct and track travelers’ usage of transit at a detailed level by matching location data from smartphones to automatic transit vehicle location (AVL) data and by identifying all out-of-vehicle and in-vehicle portions of the passengers’ trips. This system is presented in detail in this dissertation, where it is shown how high-resolution travel times and their relationships with the timetable are derived. Approaches are presented for processing relatively sparse smartphone location data in dense transit networks with many overlapping bus routes, distinguishing waits and transfers from non-travel related activities, and tracking underground travel in a metro network. While transit agencies have increasingly adopted systems for collecting data on passengers and vehicles, the ability to derive high-resolution passenger trajectories and directly associate them with vehicles has remained a challenge. The system presented in this dissertation is intended to remedy this situation, and it enables a range of different analyses and applications. Results are presented from an implementation and deployment of the system during the SFTQS. An analysis of out-of-vehicle travel times shows that (a) longer overall travel times in trips involving a transfer are strongly driven by transfer times, and (b) median wait times at the origin stops are consistently low regardless of the headway. The latter can be seen as an effect of real-time information, as it appears that wait times are increasingly spent at locations other than the stop and that passengers time their arrivals at the stop. Given these shifts, the traditional assumption that the average wait time at a transit stop of a high-frequency route is half the headway due to random arrivals may need to be revisited.
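The half-headway assumption mentioned above can be made concrete with a small sketch. It uses the standard result for passengers who arrive uniformly at random in time, E[W] = E[H^2] / (2 E[H]), which reduces to half the headway when headways are perfectly regular. The function name and the sample headways are illustrative assumptions; they are not part of the SFTQS system or its data.

```python
def expected_random_arrival_wait(headways_min):
    """Mean wait (minutes) for passengers arriving uniformly at random in time.

    Uses E[W] = E[H^2] / (2 * E[H]), where H is the headway between
    consecutive vehicles. This is the benchmark the text refers to,
    not a model of passengers who time their arrivals.
    """
    headways = list(headways_min)
    if not headways or any(h <= 0 for h in headways):
        raise ValueError("headways must be a non-empty list of positive values")
    mean_headway = sum(headways) / len(headways)
    mean_squared_headway = sum(h * h for h in headways) / len(headways)
    return mean_squared_headway / (2 * mean_headway)


# Perfectly regular 10-minute service: the wait is half the headway.
regular_wait = expected_random_arrival_wait([10, 10, 10])

# Irregular service with the same 10-minute mean headway: the wait is longer.
irregular_wait = expected_random_arrival_wait([5, 15])
```

Observed waits that fall well below this benchmark, as reported for the origin stops, are consistent with passengers timing their arrivals rather than arriving at random.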
This dissertation presents two applications to derive new behavioral insights from the SFTQS data set and to demonstrate the power and value of these new types of data. The analyses were based on participants’ individual history of transit usage and experiences with service quality. The first analysis used the data from the daily mobile surveys to model the link between participants' reported satisfaction with travel times on specific trips (i.e., their subjective assessment) and objective measures of those travel times. Thanks to the tracking data, it was possible to decompose observed travel times into their in-vehicle and out-of-vehicle components, and to compare the observed in-vehicle travel times to scheduled in-vehicle travel times to identify delays suffered while the participant was on board. The estimation results show that on average, a minute of delay on board a vehicle contributed more to passenger dissatisfaction than a minute of waiting time either at the origin stop or at a transfer stop, and that delays on board metro trains are perceived as more onerous than delays on board buses. Furthermore, the models included participants' baseline satisfaction levels as reported in the entry survey and a daily measure of their subjective well-being. Both variables are relatively new elements in travel surveys, and both are seen to be significant in the estimation results. These results indicate that satisfaction with travel times may be composed of a baseline satisfaction level and a variable component that depends on daily experiences, and that there may be non-negligible interactions between subjective well-being and travel satisfaction. Therefore, it is recommended that future survey designs include measures of both variables.
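The travel time decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions: the field names are hypothetical, all times are in minutes, and clipping early arrivals to zero delay is a choice made here for simplicity, not necessarily the treatment used in the dissertation.

```python
from dataclasses import dataclass


@dataclass
class TrackedTrip:
    # Hypothetical field names; all values are in minutes.
    origin_wait: float           # wait at the first stop
    transfer_wait: float         # total wait at transfer stops
    in_vehicle_observed: float   # observed time on board
    in_vehicle_scheduled: float  # timetable time for the same segments


def decompose(trip: TrackedTrip) -> dict:
    """Split a trip into out-of-vehicle time, in-vehicle time, and on-board delay."""
    out_of_vehicle = trip.origin_wait + trip.transfer_wait
    # Assumption: running ahead of schedule counts as zero delay.
    on_board_delay = max(0.0, trip.in_vehicle_observed - trip.in_vehicle_scheduled)
    return {
        "out_of_vehicle": out_of_vehicle,
        "in_vehicle": trip.in_vehicle_observed,
        "on_board_delay": on_board_delay,
        "total": out_of_vehicle + trip.in_vehicle_observed,
    }


# Made-up example trip, for illustration only.
parts = decompose(TrackedTrip(4.0, 6.0, 25.0, 20.0))
```

Keeping the components separate is what allows a model to assign a different weight to a minute of on-board delay than to a minute of waiting.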
The second application builds on the results of the first to empirically investigate the causes of cessation of transit use, with a specific focus on the influence of personal experiences that users have had in the past, on resulting levels of satisfaction, and subsequent behavioral intentions. A latent variable choice model is developed to explain the influence of satisfaction with travel times, including wait times at the origin stop, in-vehicle travel times, transfer times and overall reliability, and satisfaction with the travel environment on behavioral intentions. The group of variables summarized as ``travel environment'' includes crowding, cleanliness, the pleasantness of other passengers, and safety. Satisfaction is modeled as a latent variable, and the choice consists of participants’ stated desire and intention to continue using public transportation in the future. In addition to the delay types captured in the first analysis, a set of negative critical incidents is included, namely being left behind at stops and arriving late to work, school or a leisure activity. The results of the model and descriptive analysis show that operational problems resulting in delays and crowding are much stronger drivers of overall dissatisfaction and cessation than variables related to the travel environment. The importance of baseline satisfaction, mood and the relatively larger impact of in-vehicle delays are confirmed by this model. Thanks to the framework, the critical incidents can be expressed in terms of equivalent delay minutes. For instance, being left behind at a bus stop is found to cause the same amount of dissatisfaction as approximately 18 minutes of wait time. Furthermore, the effect of delays or incidents on ridership can be quantified, as is demonstrated in a set of simulations using the San Francisco transit network (Muni) as a basis. It is shown that if all passengers were subjected to one hypothetical on-board delay of 10 minutes per person, the resulting loss of riders would account for approximately 9.5% of Muni's yearly ridership turnover.
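As an illustration of how a critical incident can be expressed in equivalent delay minutes, the sketch below assumes a satisfaction function that is linear in its parameters, so that the equivalence is the ratio of the incident coefficient to the per-minute coefficient. The coefficient values and the function name are hypothetical placeholders chosen for easy arithmetic. They are not the estimates from this dissertation, and the actual model specification may differ.

```python
def equivalent_minutes(beta_incident: float, beta_per_minute: float) -> float:
    """Minutes of delay that produce the same dissatisfaction as one incident.

    Assumes satisfaction is linear in both terms, so the trade-off is a
    simple ratio of coefficients.
    """
    if beta_per_minute == 0:
        raise ValueError("per-minute coefficient must be non-zero")
    return beta_incident / beta_per_minute


# Hypothetical coefficients, NOT estimates from the study.
BETA_LEFT_BEHIND = -1.0    # effect of one "left behind at stop" incident
BETA_WAIT_PER_MIN = -0.25  # effect of one minute of wait time

minutes = equivalent_minutes(BETA_LEFT_BEHIND, BETA_WAIT_PER_MIN)
print(minutes)  # 4.0 with these placeholder values
```

Expressing incidents on a common scale of minutes makes them directly comparable with the delay types already in the model.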
In summary, the contributions and impact of this dissertation are as follows: It presents a framework and system that allows the researcher to gather detailed information on an individual and on the decision environment through phone-based survey apps in combination with sensor data from the phone and from external sources. In the public transit context, an innovative system is presented to match AVL data with smartphone location data in order to measure the personal experiences of travelers with respect to travel times. With repeated measurements, these data can be used to calculate personalized reliability metrics for individual travelers, reflecting the sum of their travel experiences, or they can be used to derive aggregate travel time distributions across all travelers by time of day, origin-destination pair or location on the network. These metrics can capture the true door-to-door travel times experienced by travelers and can serve as the basis for user-centric performance metrics to supplement system-level performance metrics commonly used by transit agencies. The long-term nature of this data collection and low respondent burden facilitate the observation of behavioral dynamics such as habit formation or lifestyle adjustments in response to changes of the choice set. The low cost and scalability of these data collection methods permit relatively short lead times for studies and frequent data collection. Thus, these systems can be deployed at the early stages of a new product's or technology's emergence (e.g., new ride sharing services or traveler information systems) to gain insights into how they affect consumer choices in a real-world setting, both on short-term and longer-term time scales. Moreover, personalized interactions between the researcher and the participant via the smartphone facilitate behavioral interventions and allow for targeted incentives to change behavior.
By elaborating on the researchers’ experiences in designing and implementing the SFTQS, this dissertation provides a powerful example of how these new types of data sets can be harnessed and supports the development of future studies. The application of the methodology and framework to the transit context was motivated by a desire to learn more about the drivers of satisfaction among transit riders and the causes of transit use cessation. The model estimation results underscore the importance of investments in run time stability measures such as transit signal priority systems and dedicated rights of way. Furthermore, thanks to the modeling framework, the cost of holding vehicles with passengers on board and the cost of vehicles not stopping at stops due to overcrowding can be quantified; this can directly impact operating and control policies. On the other hand, the model results provide evidence that investments in stop amenities may be becoming less important as passengers spend less time at stops (except in the case of transfers), but that conversely, the benefits of transit-oriented development may be increasing as passengers can choose to spend their wait times elsewhere thanks to real-time information. The data and modeling framework allow transit agencies to create service quality metrics that can appropriately capture individual users' experiences in real time, and improve researchers' ability to compare the level of service on public transportation with the level of service of private modes of transportation. Together, this set of methods and results is an important step toward aligning the service provided by transit agencies more closely with the needs of customers.
The framework and methodologies described in this dissertation are useful beyond the specific transit application presented as a case study. Thanks to the flexibility and scalability of the smartphone-based data collection and automated post-processing, they can be used by researchers to quickly gather insights on emerging trends and on travelers' adaptation to new services and new technologies, and to study the dynamics of behavior change over longer time periods. In particular, they can be applied to understand how flexible, shared-ride transportation systems and future automated mobility on demand systems will shape travel demand and how users will interact with those systems. This, in turn, will be a critical input to policy-making and system design.