In this thesis, we present data-driven methods and algorithms to address emerging challenges in performance assessment and predictive analysis for the national airspace system (NAS). Among these challenges, routing efficiency has received particular attention in the research community because of its central role in aviation economics and environmental studies. This motivates us to seek answers to fundamental questions such as "What are the mechanisms and causes of inefficient en route operations?" and, more importantly, "What can we do to improve these operations, preferably in a predictive way?"
Chapter 2 provides a macroscopic comparison of flight en route inefficiency across departure airports, arrival airports, seasons, and flight lengths through a series of fixed-effects models. We show that flights to or from airports in the east coast corridor and on the south coast, especially in the New York metropolitan area and Florida, are generally more inefficient than others. Long-haul flights are more efficient, mainly because one component of excess distance (the part that results from flights using fixed entry and exit points that are not on the great circle route) is roughly independent of flight distance. In addition, flights operated during the summer season, when convective weather is more frequent, are less efficient than those in other seasons.
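The distance argument above can be illustrated with a short sketch. It assumes that en route inefficiency is measured as excess flown distance relative to the great circle distance; the exact metric used in Chapter 2 may differ. The function names, the 20 nmi fixed excess, and the two flight lengths are hypothetical values chosen only for illustration.

```python
import math

EARTH_RADIUS_NMI = 3440.065  # mean Earth radius in nautical miles


def great_circle_nmi(lat1, lon1, lat2, lon2):
    """Great circle distance between two points (degrees in, nautical miles out)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_NMI * math.asin(math.sqrt(a))


def en_route_inefficiency(flown_nmi, great_circle):
    """Assumed metric: excess distance as a fraction of the great circle distance."""
    return (flown_nmi - great_circle) / great_circle


# Hypothetical fixed excess caused by off-route entry and exit points.
FIXED_EXCESS_NMI = 20.0

short_haul_gc = 300.0   # hypothetical short-haul great circle distance (nmi)
long_haul_gc = 2000.0   # hypothetical long-haul great circle distance (nmi)

short_haul_ineff = en_route_inefficiency(short_haul_gc + FIXED_EXCESS_NMI, short_haul_gc)
long_haul_ineff = en_route_inefficiency(long_haul_gc + FIXED_EXCESS_NMI, long_haul_gc)

print(f"short haul: {short_haul_ineff:.2%}, long haul: {long_haul_ineff:.2%}")
```

With the same fixed excess on both flights, the ratio shrinks as the great circle distance grows, which is the effect described for long-haul flights.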
In Chapter 3, we first gather information from the Federal Aviation Administration (FAA) database system, including convective weather, wind, miles-in-trail (MIT) restrictions, airspace flow programs (AFP), and special activity airspace (SAA), through data mining techniques. We then propose two mechanisms to ascribe flight en route inefficiency to these factors. Our methodological framework employs trajectory clustering, attribute matching, statistical modeling, and counterfactual analysis. Using this approach, we estimate the contributions of wind, convective weather, MIT, AFP, and SAA to flight inefficiency. Results vary across airport pairs, but in general, if we systematically clear the values of these causal factors in the whole airspace, convective weather and wind make the greatest, yet not dramatic, contribution. However, we also show that these small "marginal contribution estimates" might be caused by the largely homogeneous distribution of the attributes across routes. Therefore, if we "clear the sky" solely for one alternative route, flight en route inefficiency may substantially improve or deteriorate.
Finally, in Chapter 4 we develop a novel approach to predict actual 4D (latitude, longitude, altitude, time) aircraft trajectories in real time. Specifically, our framework consists of an efficient tree-based matching algorithm that constructs feature maps from high-fidelity meteorological datasets, and an end-to-end convolutional recurrent neural network that includes a long short-term memory (LSTM) encoder network and a mixture density LSTM decoder network. The encoder network embeds the last-filed flight plan information into fixed-size hidden state variables and feeds them to the decoder network, which further learns spatiotemporal correlations from historical flight tracks. Convolutional layers are integrated into the pipeline to learn representations from the high-dimensional weather features. During real-time inference, beam search, an adaptive Kalman filter, and a Rauch-Tung-Striebel smoother are used to reduce the variance of the generated trajectories. Our approach, which is trained on historical flights from George Bush Intercontinental Airport (IAH) to Boston Logan International Airport (BOS) in 2013, achieves an average horizontal error of 50 nautical miles and an average vertical error of 2,800 ft, which sets a new state of the art for trajectory prediction tasks.
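To give a sense of how filtering and smoothing reduce the variance of a generated track, the following is a minimal sketch of a Kalman filter followed by a Rauch-Tung-Striebel smoother. It is not the implementation from Chapter 4: it assumes a scalar random-walk state, fixed (non-adaptive) noise variances, and made-up observations, and every name and value in it is hypothetical.

```python
def kalman_filter(observations, q, r, x0, p0):
    """Scalar Kalman filter for a random-walk state.

    Assumed model: x_k = x_{k-1} + w_k, w_k ~ N(0, q); z_k = x_k + v_k, v_k ~ N(0, r).
    Returns predicted and filtered means and variances for every step.
    """
    xs_pred, ps_pred, xs_filt, ps_filt = [], [], [], []
    x, p = x0, p0
    for z in observations:
        # Predict: the mean is unchanged, the variance grows by q.
        x_pred, p_pred = x, p + q
        # Update with the new observation.
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)
        p = (1.0 - gain) * p_pred
        xs_pred.append(x_pred)
        ps_pred.append(p_pred)
        xs_filt.append(x)
        ps_filt.append(p)
    return xs_pred, ps_pred, xs_filt, ps_filt


def rts_smoother(xs_pred, ps_pred, xs_filt, ps_filt):
    """Rauch-Tung-Striebel backward pass for the same scalar random-walk model."""
    xs_smooth, ps_smooth = list(xs_filt), list(ps_filt)
    for k in range(len(xs_filt) - 2, -1, -1):
        # Smoother gain (the state transition is 1 for a random walk).
        c = ps_filt[k] / ps_pred[k + 1]
        xs_smooth[k] = xs_filt[k] + c * (xs_smooth[k + 1] - xs_pred[k + 1])
        ps_smooth[k] = ps_filt[k] + c * c * (ps_smooth[k + 1] - ps_pred[k + 1])
    return xs_smooth, ps_smooth


# Hypothetical noisy samples of one trajectory coordinate.
noisy_track = [0.9, 1.2, 0.8, 1.1, 1.0]

xs_pred, ps_pred, xs_filt, ps_filt = kalman_filter(noisy_track, q=0.01, r=0.25, x0=0.0, p0=1.0)
xs_smooth, ps_smooth = rts_smoother(xs_pred, ps_pred, xs_filt, ps_filt)

for k, (pf, psm) in enumerate(zip(ps_filt, ps_smooth)):
    print(f"step {k}: filtered variance {pf:.4f}, smoothed variance {psm:.4f}")
```

The backward pass uses later observations to refine earlier estimates, so the smoothed variance at each step is no larger than the filtered variance. An adaptive filter, as named in the text, would additionally update the noise variances online; that step is omitted here.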