The rapid development of shared mobility and connected and automated vehicles (CAVs) has not only brought new intelligent transportation system (ITS) challenges associated with these new types of mobility, but has also created a huge opportunity to accelerate the connectivity and informatization of transportation systems, particularly given all the new forms of data that are becoming available. The primary challenge is how to take advantage of this enormous amount of data to discover knowledge, build effective models, and develop impactful applications. With the theoretical and experimental progress made over the last two decades, data mining and machine learning technologies have become key approaches for parsing data, understanding information, and making informed decisions, especially as the rise of deep learning algorithms brings new levels of performance to the analysis of large datasets. The combination of data mining and ITS can greatly benefit research and advances in shared mobility and CAVs.
This dissertation focuses on knowledge discovery and data mining for shared mobility and CAV applications. When considering the big data associated with shared mobility operations and CAV research, data mining techniques can first be customized with transportation knowledge to parse the data. Machine learning methods can then be used to model the parsed data and elicit hidden knowledge. Finally, the discovered knowledge and extracted information can help in the development of effective shared mobility and CAV applications to achieve the goal of a safer, faster, and more eco-friendly transportation system.
This dissertation addresses four main topics. First, new methodologies are introduced for extracting lane-level road features from rough crowdsourced GPS trajectories via data mining; these features are subsequently used as the fundamental information for CAV applications. The proposed method achieves decimeter-level accuracy, which satisfies the positioning needs of many macroscopic and microscopic shared mobility and CAV applications. Second, macroscopic ride-hailing service big data is analyzed for demand prediction, vehicle operation, and system efficiency monitoring. The proposed deep learning algorithms increase the ride-hailing demand prediction accuracy to 80% and can help the fleet dispatching system reduce vacant travel distance by 30%. Third, microscopic automated vehicle perception data is analyzed for a real-time computer vision system that can be used for lane change behavior detection. The proposed deep learning design combines residual neural network image input with time-series control data and reaches 95% accuracy in lane change behavior prediction. Last but not least, new ride sharing and CAV applications are simulated in a behavior modeling framework to analyze their impact on mobility and energy consumption, which addresses key barriers by quantifying the transportation system-wide mobility, energy, and behavior impacts of new mobility technologies using real-world data.