Increasing traffic congestion, vehicle emissions, and commuter delays have been major challenges for urban transportation systems for years. The economic cost of traffic congestion in the US is projected to increase from 200 billion dollars in 2013 to 293 billion dollars in 2030. There is a growing need for better solutions to long-term transportation demand forecasting for urban infrastructure planning and to short-term traffic prediction for managing existing urban infrastructure. Accordingly, understanding how urban systems operate and evolve by modeling individuals' daily urban activities has been a major focus of transportation planners, urban planners, and geographers. Traffic data (loop sensors, surveillance cameras, and GPS in taxis and buses), survey data (ACS, CHTS), mobile phone signals (CDR and GPS), and Location Based Social Network (LBSN) data (Facebook, Twitter, Yelp, and Foursquare) have enabled data-driven research on transportation behavior. This data-driven research, urban data analytics, is an interdisciplinary field in which machine learning and deep learning methods from computer science and optimization and simulation methods from operations research are applied to conventional city-related fields using spatial-temporal data. In this dissertation, we aim to add a third dimension, the social dimension, to urban data analytics research by using social-spatial-temporal data; the key topic is understanding how friendship influences human behavior over time and space.
In this era of transformative mobility, such understanding can help design better policies and investment strategies for managing existing urban infrastructure and planning future infrastructure. In this dissertation, we explore two research directions in social-enabled urban data analytics. First, we develop new machine learning models for social discrete choice modeling, bridging the gap between discrete choice modeling research and computer science research. Second, we develop a methodological framework for population synthesis using both small data and big data.
The first part of the dissertation focuses on modeling social influence on human behavior from a graph modeling perspective, while conforming to the discrete choice modeling framework. The proposed models can be used to model how friends influence an individual's travel mode choice and other transportation-related choices, which is important for transportation demand forecasting. We propose two novel models with scalable training algorithms: the local logistics graph regularization (LLGR) model and the latent class graph regularization (LCGR) model. We add social regularization to represent similarity between friends, and we introduce latent classes to account for possible preference discrepancies between different social groups. The LLGR model is trained using the alternating direction method of multipliers (ADMM), and the LCGR model is trained using a specialized Monte Carlo expectation maximization (MCEM) algorithm. Scalability to large graphs is achieved by parallelizing computation in both the expectation and the maximization steps. The LCGR model is the first latent class classification model that incorporates social relationships among individuals represented by a given graph. To evaluate our two models, we consider three classes of data: small synthetic data to illustrate the knobs of the method, small real data to illustrate one social science use case, and large real data to illustrate a typical large-scale use case in internet and social media applications. We experiment on synthetic datasets to explain empirically when the proposed models perform better than vanilla classification models that do not exploit graph structure. We illustrate how the graph structure and the labels assigned to each node of the graph need to satisfy certain reasonable properties.
We also experiment on both small-scale and large-scale real-world datasets to demonstrate on which types of datasets our models can be expected to outperform state-of-the-art models.
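To make the idea of social regularization concrete, the following is a minimal sketch and not the LLGR or LCGR formulation itself. It assumes a binary choice, one parameter vector per individual, and a squared-difference penalty between the parameters of friends, which is one common way to encode similarity across graph edges. For brevity it uses plain gradient descent rather than ADMM or MCEM. All names (`fit`, `objective`, `lam`, the toy data) are hypothetical and chosen for illustration only.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def objective(W, X, y, edges, lam):
    # Per-individual logistic loss plus a penalty on parameter
    # differences between friends (the assumed social regularizer).
    total = 0.0
    for w, x, label in zip(W, X, y):
        p = min(max(sigmoid(dot(w, x)), 1e-12), 1.0 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    for i, j in edges:
        total += 0.5 * lam * sum((a - b) ** 2 for a, b in zip(W[i], W[j]))
    return total


def fit(X, y, edges, lam=1.0, lr=0.1, steps=500):
    # Plain gradient descent on the objective above (an assumption made
    # for brevity; it is not the ADMM or MCEM training described in the text).
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n)]
    for _ in range(steps):
        # Gradient of the logistic loss for each individual.
        G = [[(sigmoid(dot(w, x)) - label) * xk for xk in x]
             for w, x, label in zip(W, X, y)]
        # Gradient of the social penalty: pulls friends' parameters together.
        for i, j in edges:
            for k in range(d):
                pull = lam * (W[i][k] - W[j][k])
                G[i][k] += pull
                G[j][k] -= pull
        W = [[wk - lr * gk for wk, gk in zip(w, g)] for w, g in zip(W, G)]
    return W


# Hypothetical toy data: four individuals, two friendship ties.
X = [[1.0, 0.5], [1.0, -0.3], [1.0, 0.8], [1.0, -0.6]]  # intercept + one feature
y = [1, 1, 0, 0]                                        # observed binary choices
edges = [(0, 1), (2, 3)]                                # undirected friendships
W = fit(X, y, edges, lam=2.0)
```

In this sketch, setting `lam` to zero recovers independent per-individual logistic models, while larger values force friends toward shared preferences; that trade-off is the kind of knob the synthetic experiments are meant to illustrate.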
This dissertation also develops an algorithmic procedure for incorporating social information into a population synthesizer, which is an essential step toward incorporating social information into the transportation simulation framework. Agent-based modeling of transportation problems requires detailed information on each of the agents that represent the population of the study region. To extend agent-based transportation modeling with social influence, a connected synthetic population, with both synthetic features and a social network, needs to be simulated. However, neither the traditional manually collected household survey data (ACS) nor the more recent large-scale, passively collected Call Detail Records (CDR) alone provides all the required features. This work proposes an algorithmic procedure that uses both traditional survey data and digital records of networking and human behavior to generate connected synthetic populations. The proposed framework for connected population synthesis is applicable to cities or metropolitan regions where data availability allows the component models to be estimated. The generated populations, coupled with recent advances in graph (social network) algorithms, can be used to test transportation simulation scenarios with different social factors.
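As a toy illustration of what a connected synthetic population looks like as a data structure, the sketch below is not the procedure developed in this dissertation. It assumes that agent features are drawn by resampling survey-style records and that social ties are drawn independently, with a higher probability for agents in the same zone. The record fields (`zone`, `age_group`) and the parameters `p_same` and `p_diff` are hypothetical.

```python
import random


def synthesize_connected_population(records, n_agents, p_same, p_diff, seed=0):
    rng = random.Random(seed)
    # Step 1: draw agent features by resampling survey-style records
    # with replacement (a stand-in for a fitted feature model).
    agents = [dict(rng.choice(records)) for _ in range(n_agents)]
    # Step 2: draw undirected social ties; agents in the same zone are
    # linked with probability p_same, all other pairs with p_diff.
    edges = []
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            same_zone = agents[i]["zone"] == agents[j]["zone"]
            if rng.random() < (p_same if same_zone else p_diff):
                edges.append((i, j))
    return agents, edges


# Hypothetical survey-style records.
records = [
    {"zone": "A", "age_group": "18-34"},
    {"zone": "A", "age_group": "35-64"},
    {"zone": "B", "age_group": "18-34"},
    {"zone": "B", "age_group": "65+"},
]
agents, edges = synthesize_connected_population(
    records, n_agents=20, p_same=0.3, p_diff=0.02
)
```

The output pairs a list of agents with an edge list, which is the form a graph algorithm or an agent-based simulator would consume. In practice the tie probabilities would be estimated from network data such as CDR rather than fixed by hand.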