Probabilistic suffix models for API sequence analysis of Windows XP applications

Abstract

Given the pervasive nature of malicious mobile code (viruses, worms, etc.), developing statistical/structural models of code execution is of considerable importance. We investigate using probabilistic suffix trees (PSTs) and associated suffix automata (PSAs) to build models of benign application behavior with the goal of subsequently being able to detect malicious applications as anything that deviates therefrom. We describe these probabilistic suffix models and present new generic analysis and manipulation algorithms. The models and the algorithms are then used in the context of API (i.e., system call) sequences realized by Windows XP applications. The analysis algorithms, when applied to traces (i.e., sequences of API calls) of benign and malicious applications, aid in choosing an appropriate modeling strategy in terms of distance metrics and consequently provide classification measures in terms of sequence-to-model matching. We give experimental results based on classification of unobserved traces of benign and malicious applications against a suffix model trained solely from traces generated by a small set of benign applications.
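
To make the idea of sequence-to-model matching concrete, the following is a minimal sketch and not the authors' algorithm. It substitutes a simplified variable-order Markov model with longest-suffix backoff for a true probabilistic suffix tree, and scores a trace by its average log-likelihood under a model trained only on benign traces. The class name `SuffixModel`, the parameters `max_depth` and `alpha`, the additive smoothing, and the API call names in the usage note are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


class SuffixModel:
    """Simplified variable-order Markov model (an illustrative stand-in for a PST)."""

    def __init__(self, max_depth=3, alpha=1.0):
        self.max_depth = max_depth  # longest context (suffix) length kept; assumed value
        self.alpha = alpha          # additive smoothing constant; assumed value
        # counts[context][symbol] = times `symbol` followed `context` in training
        self.counts = defaultdict(lambda: defaultdict(int))
        self.alphabet = set()

    def train(self, traces):
        """Count next-symbol occurrences for every context of length 0..max_depth."""
        for trace in traces:
            self.alphabet.update(trace)
            for i, symbol in enumerate(trace):
                for d in range(min(self.max_depth, i) + 1):
                    context = tuple(trace[i - d:i])
                    self.counts[context][symbol] += 1

    def prob(self, history, symbol):
        """Smoothed P(symbol | longest suffix of history seen in training)."""
        context = tuple(history[-self.max_depth:]) if self.max_depth else ()
        # Back off by dropping the oldest call until the context is known.
        while context and context not in self.counts:
            context = context[1:]
        nexts = self.counts.get(context, {})
        total = sum(nexts.values())
        vocab = len(self.alphabet) + 1  # one extra slot for unseen symbols
        return (nexts.get(symbol, 0) + self.alpha) / (total + self.alpha * vocab)

    def score(self, trace):
        """Average log-likelihood per call; lower means further from the model."""
        if not trace:
            return 0.0
        logp = sum(math.log(self.prob(trace[:i], s)) for i, s in enumerate(trace))
        return logp / len(trace)
```

A hypothetical usage: train with `model.train(benign_traces)`, where each trace is a list of API call names, then compare `model.score(trace)` against a threshold chosen on held-out benign traces. A trace containing call patterns never seen in training, such as `["CreateFile", "WriteProcessMemory", "CreateRemoteThread"]` against a model trained on file read/write loops, receives a lower score than a trace resembling the training data.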

Keywords: Probabilistic suffix model, API sequence classification, Anomaly detection, Agglomerative clustering, Windows XP, Malicious mobile code, Virus, Worm

Article history: Received 20 March 2006, Revised 9 March 2007, Accepted 13 April 2007, Available online 3 May 2007.

Official paper URL: https://doi.org/10.1016/j.patcog.2007.04.006