Ensemble machine learning models for aviation incident risk prediction

Highlights:

• A hybrid model blending SVM and DNN ensemble predictions is developed to quantify the risk level of abnormal aviation events.

• An innovative probabilistic fusion rule is proposed to blend the predictions of the two models.

• Cross-validation and statistical tests are used to demonstrate the prediction performance of the developed hybrid model.

Abstract

With the substantial growth in air traffic demand expected over the next two decades, the safety of the air transportation system is of increasing concern. In this paper, we adopt the “proactive safety” paradigm to improve system safety, focusing on predicting the severity of abnormal aviation events in terms of their risk levels. To accomplish this goal, a predictive model is needed that can examine a wide variety of possible cases and quantify the risk associated with each possible outcome. Using the incident reports available in the Aviation Safety Reporting System (ASRS), we build a hybrid model consisting of a support vector machine (SVM) and an ensemble of deep neural networks (DNNs) to quantify the risk associated with the consequence of each hazardous cause. The proposed methodology is developed in four steps. First, we categorize all events into five groups based on the level of risk associated with the event consequence: high risk, moderately high risk, medium risk, moderately medium risk, and low risk. Second, an SVM model is used to discover the relationships between the event synopsis, in text format, and the event consequence. In parallel, an ensemble of DNNs is trained to model the intricate associations between event contextual features and event outcomes. Third, an innovative fusion rule is developed to blend the prediction results from the two types of trained machine learning models, thereby improving prediction performance. Finally, the risk-level prediction is extended to event-level outcomes through a probabilistic decision tree. By comparing the performance of the developed hybrid model against three individual models using ten-fold cross-validation and statistical tests, we demonstrate the effectiveness of the hybrid model in quantifying the risk related to the consequences of hazardous events.
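The abstract does not specify the fusion rule itself, so the following is only a minimal sketch of how class-probability outputs from a text-based SVM and a DNN ensemble could be combined over the five risk levels. It assumes each model emits a probability vector over the five classes (an SVM needs a calibration step such as Platt scaling to do this). The DNN ensemble is averaged, and the two distributions are then fused with a normalized product rule, a common stand-in that treats the models as independent evidence; it is not the authors' proposed rule. The names `RISK_LEVELS`, `average_ensemble`, `fuse_product`, and `predict_risk_level` are hypothetical.

```python
from typing import List, Sequence

# The five risk groups named in the abstract, ordered from highest to lowest risk.
RISK_LEVELS = [
    "high risk",
    "moderately high risk",
    "medium risk",
    "moderately medium risk",
    "low risk",
]


def average_ensemble(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by each DNN ensemble member."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / n_members for k in range(n_classes)]


def fuse_product(svm_probs: Sequence[float], dnn_probs: Sequence[float]) -> List[float]:
    """Hypothetical fusion rule: elementwise product of two distributions, renormalized.

    This is an illustrative stand-in, NOT the fusion rule proposed in the paper.
    """
    joint = [s * d for s, d in zip(svm_probs, dnn_probs)]
    total = sum(joint)
    if total == 0.0:
        # No class has nonzero probability under both models; fall back to the plain mean.
        return [(s + d) / 2 for s, d in zip(svm_probs, dnn_probs)]
    return [j / total for j in joint]


def predict_risk_level(
    svm_probs: Sequence[float], dnn_member_probs: Sequence[Sequence[float]]
) -> str:
    """Fuse the SVM output with the averaged DNN ensemble and return the top risk level."""
    fused = fuse_product(svm_probs, average_ensemble(dnn_member_probs))
    best = max(range(len(fused)), key=fused.__getitem__)
    return RISK_LEVELS[best]


if __name__ == "__main__":
    # Made-up probabilities for one incident report, purely for illustration.
    svm = [0.1, 0.5, 0.2, 0.1, 0.1]
    dnn_members = [
        [0.2, 0.3, 0.3, 0.1, 0.1],
        [0.1, 0.4, 0.3, 0.1, 0.1],
        [0.1, 0.2, 0.5, 0.1, 0.1],
    ]
    print(predict_risk_level(svm, dnn_members))  # prints "moderately high risk"
```

In this toy case the averaged DNN ensemble alone would favor "medium risk", while the SVM favors "moderately high risk"; the product rule sides with the SVM because its confidence in that class is higher. A learned or weighted rule could shift that balance, which is presumably where a purpose-built fusion rule such as the one in the paper adds value.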