Data-Driven Real-Time Risk Predictive Intelligence – A Use Case of Go-Arounds
eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Abstract

Although the National Airspace System is one of the safest and most efficient transportation infrastructures, growing air traffic demand and the implementation of autonomous technologies place strain on its safety and efficiency. Recent advances in computing and artificial intelligence (AI) offer an unprecedented capability to incorporate intelligent decision-making into many spheres of life, most notably for large, practical engineering problems. In this dissertation, we develop data-driven real-time risk predictive intelligence to provide decision support for the air transportation system, particularly for the critical air traffic control process during flight approach and landing. Anomalous aircraft behaviors and states are of high interest to the aviation community and hold the keys to ensuring safe, efficient, and environmentally clean flight operations. Through one type of flight anomaly – go-arounds (the aborted landing of an aircraft on final approach) – this dissertation demonstrates the complex interplay between transportation engineering and AI: from theoretical study and algorithmic development, to computer and software systems, to eventual deployment. We investigate the concrete technical and operational challenges of building risk predictive intelligence by integrating advances from data science, machine learning, software systems, domain-specific sciences and engineering knowledge. While this dissertation focuses on the aviation domain, the established methodological framework has the potential to assess the risk of non-nominal events in many other contexts. We first design a trajectory-based anomaly detection algorithm for identifying go-around events from raw and noisy surveillance data. Current go-around detection relies mainly on voluntary self-reports from controllers or pilots, unrepresentative survey/interview data, or a limited sample of simulation/training data.
We therefore propose a rigorous way of detecting go-around occurrence by analyzing historical four-dimensional flight trajectories. This algorithm not only labels each flight with a binary response but also annotates when and where the go-around occurred. We further validate the detection results against an independent data source and find that our detection algorithm identifies more true positive events, since it captures go-arounds initiated farther away and applies more robust criteria. To capture the heterogeneous interacting components that may affect go-around occurrence, we carry out feature engineering to derive a wide variety of operational and environmental variables based on literature search, theoretical studies, interviews with domain experts, and data mining. We derive seven categories of features (aircraft characteristics, approach stability, in-trail separation, weather, airport conditions, go-around clustering effects, and runway incursion risk) and, among them, propose a new metric termed runway occupancy buffer (ROB) to better reflect the interplay between air and surface operations during flight approach. We train machine learning models to predict this metric conditioned on other categories of features. The predicted value not only serves as a feature input for modeling go-arounds, but may also directly assist air traffic control in maintaining safe, efficient buffers between successive arrivals. With the labeled events and derived features, we then investigate the traffic and environmental conditions that affect go-around occurrence by quantifying their underlying contributions through principal component logistic regression and counterfactual analysis. While previous studies have investigated various causes of go-around occurrence, none has developed a comprehensive, quantitative assessment of the relative importance of a wide range of factors.
Our method overcomes the high dimensionality and multi-collinearity of the original data set while preserving the ability to assess the contribution of the original features to go-around occurrence. We find that factors in the top tier of importance include the approach stability of the subject aircraft, its separation and speed difference from the aircraft in front, and factors related to visibility and cloud ceiling. While these post-event, observation-driven insights help decision-making at a strategic level, predicting go-around probabilities could provide tactical guidance to foresee and perhaps prevent go-arounds. Existing go-around prediction models are based on a single snapshot of features in the time series process. We fill this gap by developing machine-learning-based engines for multivariate sequential predictions of go-around probabilities over the entire approach. The sequential models exhibit consistent, monotonically increasing performance as more information is preserved in the internal state when the flight gets closer to the airport. The long short-term memory (LSTM) network, in general, performs better in predicting go-around occurrence thanks to its continuous hidden state space and ability to learn dependencies. To address the class imbalance inherent in go-around prediction, and in rare-event prediction generally, we explore data augmentation to generate high-fidelity synthetic go-around sequences for improved model training. In particular, we synthesize domain-specific insights with concurrent advances in the generative adversarial network (GAN) literature to design a GAN architecture for the go-around use case, capable of generating multivariate sequences with variable length and mixed data types. Empirically, we find that this architecture improves the fidelity of the generated go-around sequences in terms of sequence length, feature distribution, and serial correlation.
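The principal component logistic regression mentioned above can keep the original features interpretable because a coefficient vector fitted on component scores maps back to the original feature space through the loading matrix. The following is a minimal sketch of that back-projection only, not the dissertation's implementation; the feature names, loadings, coefficients, and intercept are all hypothetical values chosen for illustration.

```python
from math import exp, isclose

# All numbers below are hypothetical and for illustration only.
FEATURES = ["approach_stability", "in_trail_separation", "visibility"]

# Loading matrix: one row per (standardized) original feature,
# one column per retained principal component.
LOADINGS = [
    [0.6, 0.2],
    [0.6, -0.5],
    [0.5, 0.8],
]
PC_COEFS = [1.2, -0.4]  # logistic regression coefficients on component scores
INTERCEPT = -3.0


def pc_scores(x, loadings):
    """Project a standardized feature vector onto the retained components."""
    n_components = len(loadings[0])
    return [
        sum(x[i] * loadings[i][j] for i in range(len(x)))
        for j in range(n_components)
    ]


def back_project(loadings, pc_coefs):
    """Map component-space coefficients to original-feature coefficients."""
    return [
        sum(row[j] * pc_coefs[j] for j in range(len(pc_coefs)))
        for row in loadings
    ]


def sigmoid(t):
    return 1.0 / (1.0 + exp(-t))


x = [1.5, -0.7, 0.3]  # one standardized observation (hypothetical)

# Logit computed in component space.
z = pc_scores(x, LOADINGS)
logit_pc = INTERCEPT + sum(g * s for g, s in zip(PC_COEFS, z))

# Same logit computed with back-projected per-feature coefficients.
beta = back_project(LOADINGS, PC_COEFS)
logit_orig = INTERCEPT + sum(b * v for b, v in zip(beta, x))

for name, b in zip(FEATURES, beta):
    print(f"{name}: {b:+.2f}")
print("go-around probability:", round(sigmoid(logit_orig), 4))
```

Both logits agree because the projection and the regression are linear, so each original feature receives a single effective coefficient even though the model is fitted on uncorrelated components.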
The performance of the go-around prediction model is compared across different amounts of synthetic go-arounds added to the training set. Experimental results show that models trained with 30% go-around samples perform better. Further efforts on model development and generalization are required for researchers to confidently use such workflows. We additionally present the Go-Around Prediction (GAP) software service, which encapsulates all these pieces of work into a practical application system to provide real-time guidance to air traffic control and ease the future design of risk predictive intelligence. To enable the GAP capabilities, we build the real-time data injection pipeline atop Apache Software Service, ensure pre-trained models can be promptly executed in response to real-time messages, identify suitable test scenarios for the real-time emulation demonstration, and develop a web-based user interface to display the real-time representation of the go-around prediction results. We demonstrate the feasibility and practicality of the GAP service by applying it to a real-world test scenario, with end-to-end real-time data input and go-around detection output. The GAP software system provides a foundation for designing, developing, and deploying a progression of capabilities that expedites the discovery, prognosis, and mitigation of safety-related threats in transportation systems. Together, the various components of this dissertation are closely interconnected to enable data-driven real-time risk predictive intelligence, while each component also offers its own contribution.
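As a worked illustration of the augmentation experiment mentioned above, the sketch below computes how many synthetic go-around sequences would have to be added to reach a target minority share. It assumes that "30% go-around samples" means go-arounds make up 30% of the augmented training set, which the abstract does not state explicitly; the function name and the sample counts are hypothetical.

```python
def synthetic_needed(n_go_around: int, n_nominal: int, target_pct: int) -> int:
    """Smallest number of synthetic go-arounds so that go-arounds make up
    at least target_pct percent of the augmented training set.

    Solves (n_go_around + s) / (n_go_around + n_nominal + s) >= target_pct / 100
    for the smallest non-negative integer s, using integer arithmetic only.
    """
    if not 0 < target_pct < 100:
        raise ValueError("target_pct must be strictly between 0 and 100")
    total = n_go_around + n_nominal
    numerator = target_pct * total - 100 * n_go_around
    if numerator <= 0:
        return 0  # the minority share already meets the target
    denominator = 100 - target_pct
    return -(-numerator // denominator)  # ceiling division


# Hypothetical training set: 30 real go-arounds among 1,000 approaches.
extra = synthetic_needed(n_go_around=30, n_nominal=970, target_pct=30)
share = (30 + extra) / (1000 + extra)
print(extra, round(share, 4))
```

With these hypothetical counts, 386 synthetic sequences bring the go-around share to just over 30%, which shows how heavily a rare-event training set must be augmented to reach such a ratio.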
The methodological framework includes the anomaly detection algorithm to identify risky events from unlabeled data, statistical models to uncover and quantify the factor contributions to the event occurrence, generative adversarial networks to augment the minority class, sequential learners to continuously monitor developing risks, and a data streaming pipeline for real-time deployment. It advances the state of the art and is the first effort to realize a multi-domain situational awareness, predictive, and alerting tool for go-around occurrences, and it therefore offers practitioners an end-to-end actionable solution. In the spirit of near-term practicality, we offer low-cost building blocks that can be used for other real-world applications with data of similar structure, such as risk mitigation in future transportation systems where complexity is expected to be greater with the introduction of autonomous vehicles and urban air mobility into the legacy infrastructure. In view of long-term applicability, the dissertation work holds initial promise to inspire further research by theoreticians and practitioners to develop data-driven real-time solutions to predictive intelligence in a broader domain.
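The trajectory-based go-around detection described in the abstract labels each flight and annotates when and where the go-around occurred. The sketch below illustrates that idea with a simple rule: flag an approach whose altitude profile descends and then climbs by a sustained margin, and report the time and distance of the lowest point before the climb. This rule, the thresholds `min_climb_ft` and `max_start_alt_ft`, and the sample tracks are assumptions made for illustration; the abstract does not specify the dissertation's actual detection criteria.

```python
from typing import List, Optional, Tuple

# One surveillance sample: (time_s, distance_to_runway_nm, altitude_ft).
Point = Tuple[float, float, float]


def detect_go_around(
    track: List[Point],
    min_climb_ft: float = 400.0,       # hypothetical climb margin
    max_start_alt_ft: float = 3000.0,  # hypothetical ceiling for the low point
) -> Optional[Tuple[float, float]]:
    """Return (time_s, distance_nm) of the lowest point preceding a sustained
    climb during approach, or None if no such climb is found."""
    low_idx = 0
    for i, (_, _, alt) in enumerate(track):
        low_alt = track[low_idx][2]
        if alt < low_alt:
            low_idx = i  # still descending: update the running minimum
        elif (
            low_idx > 0                        # a descent was observed first
            and low_alt <= max_start_alt_ft    # low point is within approach altitudes
            and alt - low_alt >= min_climb_ft  # climb exceeds the noise margin
        ):
            t, d, _ = track[low_idx]
            return (t, d)
    return None


# Hypothetical tracks sampled every 30 seconds.
landing = [(0.0, 10.0, 3000.0), (30.0, 7.0, 2000.0), (60.0, 4.0, 1000.0),
           (90.0, 2.0, 500.0), (120.0, 0.2, 50.0)]
go_around = [(0.0, 10.0, 3000.0), (30.0, 7.0, 2000.0), (60.0, 4.0, 1200.0),
             (90.0, 2.5, 800.0), (120.0, 1.5, 1000.0), (150.0, 0.5, 1500.0),
             (180.0, 1.0, 2500.0)]

print(detect_go_around(landing))
print(detect_go_around(go_around))
```

A margin on the climb keeps small altitude fluctuations in noisy surveillance data from being flagged, while the returned pair gives the "when and where" annotation for each detected event.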
