Apply transfer learning to cybersecurity: Predicting exploitability of vulnerabilities by description

Authors:

Highlights:

Abstract

Thousands of software vulnerabilities are archived and disclosed to the public each year, posing severe cybersecurity threats to society as a whole. Predicting the exploitability of vulnerabilities is crucial for decision-makers to prioritize their efforts and patch the most critical vulnerabilities first. Software vulnerability descriptions are accessible at an early stage and contain rich semantic information; therefore, descriptions are widely used for exploitability prediction in both industry and academia. However, compared with other corpora, the vulnerability description corpus is too small to train a comprehensive Natural Language Processing (NLP) model. To achieve better performance, this paper proposes a framework named ExBERT to accurately predict whether a vulnerability will be exploited. ExBERT is essentially an improved Bidirectional Encoder Representations from Transformers (BERT) model for exploitability prediction. First, we fine-tune a pre-trained BERT model on a collected domain-specific corpus. Then, we design a Pooling Layer and a Classification Layer on top of the fine-tuned BERT model to extract sentence-level semantic features and predict the exploitability of vulnerabilities. Results on 46,176 real-world vulnerabilities demonstrate that the proposed ExBERT framework achieves 91.12% accuracy and 91.82% precision, outperforming the state-of-the-art approach with 89.0% accuracy and 81.8% precision.
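The abstract describes a BERT encoder with a pooling layer and a classification layer stacked on top. The sketch below is a minimal illustrative implementation of that kind of architecture, not the authors' exact code: it assumes the Hugging Face `transformers` library with `bert-base-uncased` as the pre-trained checkpoint, and uses mask-aware mean pooling plus a single linear head as placeholder choices, since the paper's precise layer definitions are not given here.

```python
# Illustrative ExBERT-style classifier (assumptions: Hugging Face transformers,
# "bert-base-uncased" checkpoint, mean pooling, single linear classification head).
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer


class ExBERTStyleClassifier(nn.Module):
    """BERT encoder + pooling layer + binary classification layer (exploited / not exploited)."""

    def __init__(self, pretrained: str = "bert-base-uncased", hidden_size: int = 768):
        super().__init__()
        # In the paper's workflow, this encoder would first be fine-tuned on a
        # domain-specific vulnerability-description corpus.
        self.bert = BertModel.from_pretrained(pretrained)
        self.classifier = nn.Linear(hidden_size, 2)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        token_states = outputs.last_hidden_state            # (batch, seq_len, hidden)
        # Pooling layer: mask-aware mean over tokens -> sentence-level feature vector.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (token_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.classifier(pooled)                       # logits for exploitability


if __name__ == "__main__":
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = ExBERTStyleClassifier()
    encoded = tokenizer(
        "Buffer overflow in the FTP service allows remote attackers to execute arbitrary code.",
        return_tensors="pt", truncation=True, max_length=128,
    )
    logits = model(encoded["input_ids"], encoded["attention_mask"])
    prob_exploited = torch.softmax(logits, dim=-1)[0, 1].item()
    print(f"Predicted probability of exploitation: {prob_exploited:.3f}")
```

In practice the whole stack (encoder, pooling, and classification head) would be trained end-to-end on labeled vulnerability descriptions; the pooling strategy and head design shown here are only stand-ins for the layers the paper describes.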

Keywords: Transfer learning, Exploitability prediction, ExBERT, Software vulnerability

Article history: Received 25 May 2020, Revised 13 September 2020, Accepted 12 October 2020, Available online 15 October 2020, Version of Record 17 October 2020.

DOI: https://doi.org/10.1016/j.knosys.2020.106529