SFEM: Structural feature extraction methodology for the detection of malicious office documents using machine learning methods
Authors:
Highlights:
• SFEM is a novel structural feature extraction methodology for XML-based documents.
• SFEM is static, lightweight, and fast, requiring 37 ms to process an average-sized (250 KB) file.
• SFEM is combined with machine learning methods for effective detection of malicious documents.
• Best configuration: Fisher score feature selection, TF-IDF representation, the top 200 features, and a Random Forest classifier.
• The best configuration achieved TPR = 0.97, FPR = 0.049, and AUC = 0.9912.
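The structural-feature idea in the highlights can be illustrated with a short sketch. An Office Open XML file is a ZIP package of XML parts, so one plausible notion of a "structural path" is the ZIP entry name followed by the chain of XML element names beneath it. The sketch assumes exactly that definition. It is not the authors' SFEM implementation, and the helper names (`extract_structural_paths`, `tfidf`, `_make_doc`) and the toy documents are hypothetical. It counts paths per document and weights them with a plain TF-IDF (tf = count / total paths, idf = log(N / df)). Fisher-score selection of the top 200 features and the Random Forest classifier are not shown.

```python
import io
import math
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def _local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on qualified tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_structural_paths(data: bytes) -> Counter:
    """Count hierarchical paths in an OOXML package (a ZIP of XML parts).

    ASSUMPTION (illustrative, not the paper's exact definition): each ZIP entry
    name is a path, and each XML element adds "<entry>/<root>/<child>/...".
    """
    paths: Counter = Counter()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            paths[name] += 1  # container-level path, also kept for binary parts
            if not name.endswith((".xml", ".rels")):
                continue  # e.g. images or word/vbaProject.bin are not parsed
            try:
                root = ET.fromstring(zf.read(name))
            except ET.ParseError:
                continue  # malformed XML: keep only the container-level path
            # Iterative depth-first walk, carrying each element's path prefix.
            stack = [(root, f"{name}/{_local_name(root.tag)}")]
            while stack:
                elem, prefix = stack.pop()
                paths[prefix] += 1
                for child in elem:
                    stack.append((child, f"{prefix}/{_local_name(child.tag)}"))
    return paths


def tfidf(docs: list) -> list:
    """Weight path counts: tf = count / total paths, idf = log(N / df)."""
    n = len(docs)
    df = Counter(path for doc in docs for path in doc)  # documents per path
    weighted = []
    for doc in docs:
        total = sum(doc.values())
        weighted.append({p: (c / total) * math.log(n / df[p]) for p, c in doc.items()})
    return weighted


def _make_doc(parts: dict) -> bytes:
    """Build a tiny in-memory ZIP standing in for a real .docx/.docm file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, text in parts.items():
            zf.writestr(name, text)
    return buf.getvalue()


W = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
benign = _make_doc({"word/document.xml": f"<w:document {W}><w:body><w:p/></w:body></w:document>"})
macro = _make_doc({
    "word/document.xml": f"<w:document {W}><w:body><w:p/><w:p/></w:body></w:document>",
    "word/vbaProject.bin": "placeholder for binary VBA project data",
})

weights = tfidf([extract_structural_paths(benign), extract_structural_paths(macro)])
print({p: round(w, 4) for p, w in weights[1].items() if w > 0})
# → {'word/vbaProject.bin': 0.1155}
```

Because every path shared by both toy documents gets idf = log(1) = 0, only the `vbaProject.bin` part (which in a real macro-enabled file holds the VBA project) keeps a nonzero weight. In a full pipeline, weighted vectors like these would then feed feature selection and a classifier.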
Keywords: Machine learning, Malware detection, Static analysis, Structural features, Microsoft Office Open XML, Document
Article history: Received 25 April 2016, Revised 16 June 2016, Accepted 5 July 2016, Available online 9 July 2016, Version of Record 18 July 2016.
Official page (DOI): https://doi.org/10.1016/j.eswa.2016.07.010