SFEM: Structural feature extraction methodology for the detection of malicious office documents using machine learning methods

Authors:

Highlights:

• SFEM is a novel structural feature extraction methodology for XML-based documents.

• SFEM is static, lightweight, and fast: 37 ms for an average file (250 KB).

• SFEM is leveraged with machine learning for effective malicious document detection.

• Best configuration: Fisher score feature selection, TF-IDF representation, top 200 features, and a Random Forest classifier.

• The best configuration achieved TPR = 0.97, FPR = 0.049, and AUC = 0.9912.
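
The highlights describe the pipeline only in outline, so the following is a minimal sketch of the two ideas they name, not the authors' implementation. It assumes that a "structural feature" is a path inside the document's ZIP container, extended by the element hierarchy of each XML part; the source states only that the features are structural. The function names `structural_paths` and `fisher_score` are hypothetical, and the Fisher score uses a common two-class definition, (mean1 - mean0)^2 / (var1 + var0), which may differ from the exact formula used in the paper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def structural_paths(data):
    """Return the set of structural paths in an OOXML file given as bytes.

    Assumption: a path is the ZIP member name, optionally followed by the
    chain of XML element names leading to a node inside that member.
    """
    paths = set()
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            paths.add(name)  # container-level path, e.g. word/vbaProject.bin
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                # For untrusted input a hardened XML parser would be preferable.
                root = ET.fromstring(archive.read(name))
            except ET.ParseError:
                continue  # malformed part: keep only the container path
            stack = [(root, name + "/" + local_name(root.tag))]
            while stack:  # iterative walk avoids recursion limits on deep trees
                element, path = stack.pop()
                paths.add(path)
                for child in element:
                    stack.append((child, path + "/" + local_name(child.tag)))
    return paths


def fisher_score(values, labels):
    """Two-class Fisher score of one feature; labels are 0 (benign) or 1 (malicious).

    Assumes both classes have at least one sample.
    """
    groups = {0: [], 1: []}
    for value, label in zip(values, labels):
        groups[label].append(value)
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    variances = {
        y: sum((v - means[y]) ** 2 for v in g) / len(g) for y, g in groups.items()
    }
    spread = variances[0] + variances[1]
    gap = (means[1] - means[0]) ** 2
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread
```

In a full pipeline along the lines of the best configuration, each document's path set would be turned into a TF-IDF vector, the 200 paths with the highest Fisher score would be kept, and a Random Forest would be trained on the result; those steps are omitted here to keep the sketch dependency-free.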


Keywords: Machine learning, Malware detection, Static analysis, Structural features, Microsoft Office Open XML, Document

Review history: Received 25 April 2016, Revised 16 June 2016, Accepted 5 July 2016, Available online 9 July 2016, Version of Record 18 July 2016.

Official URL: https://doi.org/10.1016/j.eswa.2016.07.010