Inferring and Replicating Activity Selection and Scheduling Behavior of Individuals
- Author(s): Allahviranloo, Mahdieh
- Advisor(s): Recker, Will
- et al.
Understanding the choices that each individual in the population makes regarding daily plans and activity participation is crucial to forecasting spatial-temporal travel demand in the region. In this dissertation, we develop a comprehensive mathematical/statistical framework to infer and replicate the travel behavior of individuals in terms of their socio-demographic profiles. The framework comprises a series of distinct modules that employ statistical segmentation, Bayesian econometrics, data mining, and optimization techniques to predict individuals' activity types, activity frequencies, and the travel linkages that make them possible. The key advantages of the model are: first, it provides the likely content of the activity agenda as part of the inference procedure; second, it integrates transportation network topology within the activity scheduling step; and third, it is capable of integrating modal components. The data used for the analysis come from the 2000-2001 California Household Travel Survey (Caltrans, 2002). After preprocessing (which includes queries to match, clean, and prepare the data), the final cleaned data set consists of the activity patterns of 26,269 individuals.
In the model-building process, we initially cluster individuals in the sample based on their reported (one-day) activity patterns. We then argue and demonstrate that clustering activity/travel patterns in terms of such activity characteristics as type, duration, scheduling, and location can be an effective tool to capture preferential distributions of arrival time, departure time, and duration, which are unobservable inputs to activity-based travel models. Representative patterns are found based on two measures of dissimilarity between activity patterns, the Sequence Alignment Method and agenda dissimilarity, resulting in 8 clusters. A decision tree based on the socio-demographics of individuals is fitted to infer the cluster to which each individual belongs.
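To make the idea of a sequence-based dissimilarity concrete, the sketch below computes a unit-cost edit distance between two daily activity sequences, which is the basic notion underlying sequence alignment. It is only an illustration under stated assumptions: the function name `sam_distance`, the unit costs, and the example patterns are hypothetical, and the dissertation's actual cost weights and its agenda dissimilarity measure are not given in this abstract.

```python
def sam_distance(a, b):
    """Unit-cost edit distance between two activity sequences.

    Assumption: insertions, deletions, and substitutions all cost 1.
    The weights used in the dissertation may differ.
    """
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x from a
                            curr[j - 1] + 1,      # insert y into a
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical one-day activity patterns (activity types only).
p1 = ["home", "work", "shop", "home"]
p2 = ["home", "work", "home"]
p3 = ["home", "school", "leisure", "home"]

d12 = sam_distance(p1, p2)
d13 = sam_distance(p1, p3)
```

A pairwise matrix of such distances could then be passed to any standard clustering routine; the choice of clustering algorithm is likewise not specified here.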
Inference on agenda formation in each cluster is based on an ensemble of three different modules ("multivariate probit model," "Markov chains with conditional random fields," and "adaptive boosting") applied to individuals within each cluster. In each of these modules, the inputs are socio-demographic attributes of individuals, and the outputs are discrete outcomes indicating participation in each activity type.
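One simple way to combine three modules that each output a participation indicator per activity type is a majority vote, sketched below. This is an assumption made for illustration: the abstract does not say how the ensemble is combined, and the function name `majority_vote` and the example outputs are hypothetical.

```python
def majority_vote(predictions):
    """Combine binary participation predictions from several modules.

    predictions: list of dicts mapping activity type -> 0 or 1,
    one dict per module. An activity is kept in the agenda when more
    than half of the modules predict participation.
    Assumption: simple majority voting, not the dissertation's rule.
    """
    n_modules = len(predictions)
    activities = predictions[0].keys()
    return {
        act: int(2 * sum(p[act] for p in predictions) > n_modules)
        for act in activities
    }


# Hypothetical outputs of the three modules for one individual.
probit = {"work": 1, "shop": 0, "leisure": 1}
crf = {"work": 1, "shop": 1, "leisure": 0}
boosting = {"work": 1, "shop": 0, "leisure": 0}

agenda = majority_vote([probit, crf, boosting])
```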
Arrival time and activity duration for each activity type in each cluster are inferred using the adaptive boosting algorithm. Once the activity types, arrival times, and durations have been identified, activities are scheduled in the agenda using two approaches: decision rules and the Household Activity Pattern Problem (HAPP), a variation of the pickup and delivery problem with time windows (Recker, 1995).
Testing the entire modeling system on an out-of-sample population (15% of the entire sample) shows that the model predicts, on average, 80.3% of individuals' daily activities; that is, the correct activity was predicted for 867 of the 1,080 waking minutes in a day.
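The reported accuracy can be read as the share of waking minutes in which the predicted activity matches the observed one. The sketch below illustrates that reading and checks the arithmetic behind the headline figure. The function name `minute_accuracy` and the example day are hypothetical; the exact evaluation procedure used in the dissertation is not described in this abstract.

```python
def minute_accuracy(observed, predicted):
    """Share of minutes in which predicted and observed activities agree.

    Both arguments are equal-length sequences with one activity label
    per minute. Assumption: a per-minute comparison of activity labels.
    """
    if len(observed) != len(predicted):
        raise ValueError("sequences must cover the same number of minutes")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


# Hypothetical 1080-minute day, one label per minute.
observed = ["home"] * 300 + ["work"] * 480 + ["shop"] * 60 + ["home"] * 240
predicted = ["home"] * 330 + ["work"] * 420 + ["shop"] * 90 + ["home"] * 240

example_share = minute_accuracy(observed, predicted)

# Arithmetic behind the reported result: 867 of 1080 minutes.
reported_share = round(867 / 1080 * 100, 1)
```

Here `reported_share` reproduces the 80.3% stated above from the 867 and 1,080 minute figures.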