Finding Similar Days for Air Traffic Management
Airports are central to economic growth and development around the world. As demand for air travel and freight transport increases, pressure mounts on existing airport infrastructure. The resulting imbalance between capacity and demand at airports costs airlines, passengers, and the economy at large billions of dollars. A cost-effective response to this growing imbalance is to use the existing infrastructure more efficiently. Information from historical days that are similar to a given day of operations can provide insights that support traffic management decisions on that day. Recent advances in machine learning and computing power have made it possible to mine and analyze sizable historical archives of the variables that characterize and influence airport operations. Finding similar historical days can help to better understand the impact of different traffic management initiatives (TMIs) and to identify areas of capacity underutilization. Reducing capacity underutilization can, in turn, reduce airport delays.
Decision support tools that identify similar past days, the TMIs implemented on those days, and the resulting outcomes can augment controller experience and guide decision-making on the reference day at an airport. This information can allow air traffic managers to make less conservative decisions and thus improve airport capacity utilization and reduce delays. This dissertation develops similarity measures between days using airport capacity and demand data. We find that dimensionality reduction is feasible for capacity data, and we base capacity similarity on the principal components. Dimensionality reduction cannot be performed efficiently on demand data; consequently, demand similarity is based on the original data. We find that both capacity and demand data lack natural clusters and thus propose that similarity be viewed as a continuous measure. Finally, we estimate measures of overall distance based on both capacity and demand similarity. The estimated distances are visualized using Metric Multidimensional Scaling plots and indicate that most days with significant air traffic management activity are similar to certain other days, supporting the potential of this approach for decision support.
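Treating similarity as a continuous measure means ranking historical days by their distance to a reference day instead of assigning days to clusters. The following is a minimal sketch of that idea, assuming each day is already summarized as a numeric feature vector (for example, capacity principal-component scores followed by hourly demand counts). The function name, the dates, the feature values, and the choice of unweighted Euclidean distance are illustrative assumptions, not the measures estimated in this dissertation.

```python
import math

def rank_similar_days(reference, history):
    """Sort historical days by Euclidean distance to the reference day.

    reference: feature vector of the reference day (hypothetical features).
    history: dict mapping a day label to its feature vector.
    Returns a list of (day, distance) pairs, closest day first.
    """
    distances = [(math.dist(reference, features), day)
                 for day, features in history.items()]
    return [(day, dist) for dist, day in sorted(distances)]

# Hypothetical two-dimensional feature vectors for three past days.
history = {
    "2015-06-01": [3.0, 4.0],
    "2015-06-02": [0.0, 1.0],
    "2015-06-03": [6.0, 8.0],
}
ranked = rank_similar_days([0.0, 0.0], history)
print(ranked)
```

Because the output is an ordered list of distances, a decision-support tool could show the closest few days without ever committing to a cluster boundary.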
Accurate demand and capacity estimates are necessary to generate meaningful similarity measures that can be used in decision-support tools. Predicting airport capacity accurately can also help make better tradeoffs between allowing more flights to operate at the airport and minimizing expensive airborne delay. We develop accurate demand estimates from the Aggregate Demand List (ADL), which contains fine-grained schedule data for all flights operating at an airport. Airport capacity can be observed only when demand is sufficiently large. When throughput is instead limited by demand, we can conclude only that capacity is greater than or equal to the observed throughput. The inability to observe capacity directly makes capacity prediction a challenging and less explored problem. This dissertation applies machine-learning methods that incorporate observations censored by insufficient demand to develop an airport capacity prediction model. Specifically, we explore the Kaplan-Meier (KM) estimator, the Cox Proportional Hazards model, and the Random Survival Forest (RSF) model to predict airport capacity. These models predict a capacity distribution, rather than a single capacity value, for an hour of interest at an airport using its weather, fleet mix, and scheduled demand data. The model results also indicate the influence of different variables on airport capacity. Model performance is compared using several validation measures that account for the presence of censored observations, including the Integrated Brier Score (IBS), the Concordance Index (C-index), and the R² and RMSE of predicted throughput. The RSF model consistently outperforms the KM estimator and the Cox model across all validation measures.
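To illustrate how demand-censored observations enter a capacity estimate, the sketch below implements the standard Kaplan-Meier product-limit estimator in plain Python. Capacity plays the role of the survival time: an hour with sufficient demand yields an observed capacity (an event), while an hour limited by demand yields only a lower bound (a right-censored observation). The function name, the flag `capacity_observed`, and the throughput values are hypothetical and are not taken from the dissertation's data.

```python
from collections import Counter

def kaplan_meier(throughputs, capacity_observed):
    """Kaplan-Meier estimate of S(c) = P(capacity > c).

    throughputs: observed hourly throughput values.
    capacity_observed: True where demand was high enough that throughput
        equals capacity; False where the hour is censored by low demand.
    Returns a list of (value, survival) pairs at each observed capacity.
    """
    events = Counter(t for t, seen in zip(throughputs, capacity_observed) if seen)
    curve = []
    survival = 1.0
    for value in sorted(events):
        # Hours still "at risk": capacity known to be at least this value.
        at_risk = sum(1 for t in throughputs if t >= value)
        survival *= 1.0 - events[value] / at_risk
        curve.append((value, survival))
    return curve

# Hypothetical hourly arrival counts; False marks demand-limited hours.
throughputs = [30, 32, 32, 35, 36, 38]
capacity_observed = [False, True, True, False, True, True]
curve = kaplan_meier(throughputs, capacity_observed)
print(curve)
```

The censored hours (30 and 35 here) never trigger a drop in the curve, but they still count toward the at-risk set for smaller capacity values, which is how the estimator uses the information that capacity was at least the observed throughput.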
This dissertation also develops capacity-based similarity metrics between two days using the predicted hourly capacity distributions. Evaluating the estimated similarity metric is challenging owing to the lack of ground-truth similarity measures. We propose a framework to validate the estimated similarity metric between two days using predicted capacity CDFs, demand, TMI, and operational outcomes data. The framework assumes that days that are similar in their capacity, demand, and TMI features should also be similar in their operational outcomes. We use a Random Forest model to combine the capacity-, demand-, and TMI-based similarity metrics, supervised by the operational outcomes similarity metric. The combined similarity matrix is evaluated by measuring its correlation with the operational outcomes similarity matrix on test data. The methodology developed in this dissertation to identify similar days can be extended to any airport around the world using its weather, demand, TMI, and operational outcome data. Because the framework uses data that can be forecast, it can be applied on the day of operations to guide decision-making, as well as in a post-operations setting to compare decisions and outcomes on similar historical days.
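One way to turn predicted hourly capacity CDFs into a distance between two days is sketched below. It assumes that both days' CDFs are sampled on the same unit-spaced capacity grid, measures the area between the two CDFs for each hour, and averages over hours. The metric, the function names, and the sample values are illustrative assumptions; the abstract does not specify which CDF-based metric the dissertation uses.

```python
def cdf_area_distance(cdf_a, cdf_b):
    """Area between two CDFs sampled on the same unit-spaced capacity grid."""
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))

def day_distance(day_a, day_b):
    """Mean hourly CDF distance between two days.

    Each day is a list of hourly CDFs, aligned hour by hour.
    """
    hourly = [cdf_area_distance(a, b) for a, b in zip(day_a, day_b)]
    return sum(hourly) / len(hourly)

# Hypothetical CDF values at capacities 30, 31, 32, 33 for two hours.
day_1 = [[0.0, 0.5, 1.0, 1.0], [0.25, 0.5, 0.75, 1.0]]
day_2 = [[0.0, 0.0, 0.5, 1.0], [0.25, 0.5, 0.75, 1.0]]
d = day_distance(day_1, day_2)
print(d)
```

Computing this distance for every pair of days would give a capacity-based distance matrix of the kind that could then be combined with demand- and TMI-based matrices.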