Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Activity-Based Urban Mobility Modeling from Cellular Data


Transportation has been one of the defining challenges of our age, and transportation decision makers face difficult questions when trying to make informed decisions. Activity-based travel demand models are becoming essential tools in transportation planning and in the evaluation of regional development scenarios. They describe the travel itineraries of individual travelers: which activities they participate in, when they perform these activities, and how they choose to travel to the activity locations. However, data for activity-based models are collected through travel surveys that are infrequent and expensive, and that reflect changes in transportation only after significant delays. Thanks to ubiquitous cell phone data, we see an opportunity to substantially complement these surveys with data extracted from network carriers' mobile phone usage logs, such as call detail records (CDRs). Large-scale cellular data also open up opportunities for researchers to study urban mobility, population estimation, disaster response, and social events. However, most urban mobility models built from cellular data focus on only one aspect of urban mobility (such as location, duration, or travel mode), or model several aspects separately. Moreover, most urban mobility studies ignore activity types (trip purposes), since this information is not directly available from raw cellular traces. Trip purposes carry important information for activity-based travel demand modeling because many travel decisions, such as travel mode and destination location, depend on the activity type.

In this dissertation, we explore a framework for developing state-of-the-art generative activity-based urban mobility models from raw cellular data. The framework is able to infer activity types, which allows it to complement activity-based travel demand modeling.

To do so, we first present a method for extracting user stay locations from raw, noisy cellular data without over-filtering short-term travel. Significant locations such as home and work places are inferred. Along this pre-processing pipeline, we also produce meaningful aggregated statistics about how people structure their daily lives and participate in activities. These statistics used to be available only from traditional travel surveys and were therefore updated very infrequently.
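The idea of a stay location can be illustrated with a generic distance-and-duration rule. The sketch below is not the method developed in the dissertation; it is a minimal, commonly used stay-point heuristic, and the function names, the record layout `(timestamp_s, lat, lon)`, and both thresholds are assumptions chosen for illustration. It also shows the trade-off mentioned above: raising the minimum duration removes noise but can filter out short activities.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


def extract_stays(records, max_dist_m=1000.0, min_duration_s=600):
    """Group time-sorted (timestamp_s, lat, lon) records into stays.

    A stay is a run of consecutive records that all lie within
    max_dist_m of the first record of the run and that spans at
    least min_duration_s. Both thresholds are illustrative values.
    Returns a list of (start_s, end_s, mean_lat, mean_lon).
    """
    stays = []
    i, n = 0, len(records)
    while i < n:
        j = i + 1
        # Extend the run while records stay close to the anchor record i.
        while j < n and haversine_m(records[i][1], records[i][2],
                                    records[j][1], records[j][2]) <= max_dist_m:
            j += 1
        group = records[i:j]
        duration_s = group[-1][0] - group[0][0]
        if duration_s >= min_duration_s:
            mean_lat = sum(r[1] for r in group) / len(group)
            mean_lon = sum(r[2] for r in group) / len(group)
            stays.append((group[0][0], group[-1][0], mean_lat, mean_lon))
            i = j  # continue after the stay
        else:
            i += 1  # too short to be a stay; treat record i as travel
    return stays


# Hypothetical trace: 15 minutes at one place, then brief sightings elsewhere.
trace = [
    (0, 37.7749, -122.4194),
    (300, 37.7749, -122.4194),
    (900, 37.7749, -122.4194),
    (1000, 37.8000, -122.2700),
    (2000, 37.9000, -122.3000),
    (2100, 37.9000, -122.3000),
]
stays = extract_stays(trace)
```

With these thresholds, only the first three records form a stay; the later sightings are too brief and are treated as travel.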

With the processed yet unlabeled activity sequences, we improve state-of-the-art generative activity-based urban mobility models step by step. First, we designed a method of collecting ground truth activities with the help of a short-range distributed antenna system (DAS), which has high spatial resolution. As a baseline model, we developed Input-Output Hidden Markov Models (IOHMMs) to infer travelers' activity patterns. The activity patterns include the spatial and temporal profiles of primary and secondary activities, as well as heterogeneous activity transitions that depend on the context. To guide the learning process with the labeled samples, we explored several semi-supervised approaches, including self-training and co-training. The co-training model combines the generative power of the IOHMM with the discriminative nature of a decision tree model.
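What distinguishes an IOHMM from a standard HMM is that its transition (and emission) probabilities are conditioned on an observed input, which is how context-dependent activity transitions can be represented. The sketch below illustrates only that one idea, under stated assumptions: the transition model is taken to be a multinomial logistic (softmax) function of the context, and the state names, context features, and all weights are hypothetical values, not parameters from the dissertation.

```python
from math import exp


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transition_probs(prev_state, context, weights, biases):
    """Return P(next state | previous state, context).

    weights[prev_state][k] holds one coefficient per context feature
    for next state k; biases[prev_state][k] is the matching intercept.
    In a standard HMM this would be a fixed row of a transition matrix.
    """
    scores = [
        sum(w * x for w, x in zip(weights[prev_state][k], context))
        + biases[prev_state][k]
        for k in range(len(biases[prev_state]))
    ]
    return softmax(scores)


# Hypothetical setup: 3 hidden activities, 2 context features
# (is_weekday, is_morning). All numbers are made up for illustration.
STATES = ["home", "work", "other"]
weights = [
    # previous state = home
    [[0.0, 0.0], [2.0, 1.5], [-0.5, 0.0]],
    # previous state = work
    [[0.5, -1.0], [0.0, 0.0], [0.3, 0.0]],
    # previous state = other
    [[0.2, 0.0], [0.1, 0.5], [0.0, 0.0]],
]
biases = [[0.0, -1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

weekday_morning = transition_probs(0, [1.0, 1.0], weights, biases)
weekend_evening = transition_probs(0, [0.0, 0.0], weights, biases)
```

With these made-up weights, leaving home for work is more likely on a weekday morning than on a weekend evening, even though the previous state is the same. A full IOHMM would estimate such weights with an EM procedure, which is omitted here.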

We apply the models to data collected by a major network carrier serving millions of users in the San Francisco Bay Area. Our activity-based urban mobility model is experimentally validated against three independent data sources: aggregated statistics from travel surveys, a set of collected ground truth activities, and the results of a traffic micro-simulation informed by travel plans synthesized from the developed generative model. On the classification task, we found that the full IOHMM outperforms the partial IOHMM, which in turn outperforms the standard HMM, since the IOHMM can incorporate more contextual information. We also found that co-training outperforms self-training, which outperforms the unsupervised IOHMM, thanks to the guidance of the ground truth samples. This work is our first effort toward an end-to-end, actionable solution for practitioners in the form of modular and interpretable activity-based urban mobility models.

One direct application of the urban mobility model is travel demand forecasting. Predictive models of urban mobility can help alleviate traffic congestion in future cities. The state of the art in travel demand forecasting is mainly concerned with long-term (months to years ahead) and very short-term (seconds to minutes ahead) models. Long-term forecasts are aimed at urban infrastructure planning, while short-term predictions typically use high-resolution freeway detector or camera data to project traffic conditions in the near future. In this dissertation, we present a medium-term (hours to days ahead) travel demand forecast system. Our approach is designed to use cellular data that are collected passively, continuously, and in real time to predict the intended travel plans of anonymized and aggregated individual travelers. The traffic conditions derived through traffic simulation can overcome the data sparsity that limits short-term prediction. The data resolution, prediction tolerance, and accuracy of the medium-term travel demand forecast are compromises between those of long-term forecasts and short-term predictions.

We further improved our urban mobility models in two directions. First, we split the home and work activities into smaller sub-activities, expecting to obtain better activity transition probabilities. Second, we made our IOHMM deeper and gave it a continuous hidden state space with the help of long short-term memory (LSTM) units. Experimental results show that IOHMMs used in a semi-supervised manner perform well for location prediction, while LSTMs are better at predicting temporal day structure patterns thanks to their continuous hidden state space and their ability to learn long-term dependencies. We validated our predictions by comparing predicted versus observed (1) individual activity sequences; (2) aggregated activity and travel demand; and (3) resulting traffic flows on road networks, obtained via a hyper-realistic microsimulation of the predicted travel itineraries. Results show that prediction accuracy improves as more of the data observed by the time of prediction are incorporated. We reach a mean absolute percentage error (MAPE) of less than 5% one hour ahead and less than 10% three hours ahead.
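For readers unfamiliar with the error metric, MAPE is the average of the absolute errors expressed as a percentage of the observed values. The sketch below computes it under the standard definition; the function name and the example flow counts are hypothetical and are not results from the dissertation.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    MAPE = 100 / n * sum(|observed_i - predicted_i| / |observed_i|)
    Observed values must be non-zero, since each error is divided
    by the corresponding observation.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((o - p) / o) for o, p in zip(observed, predicted))
    return 100.0 * total / len(observed)


# Hypothetical hourly vehicle counts on three road links.
observed_flows = [100, 200, 400]
predicted_flows = [95, 210, 380]
error_pct = mape(observed_flows, predicted_flows)
```

In this made-up example every link is off by 5% of its observed flow, so the MAPE is approximately 5%. Because each error is scaled by the observation, links with low observed flow can dominate the metric.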
