Web attack detection based on traps | 数据学习(DataLearner)

摘要

Every website on the Internet is somewhat vulnerable to security attacks. These attacks are constantly changing, and it is challenging to detect the latest, not known attacks. Our goal is automation of attack detection by incremental learning of the latest types of attacks. We have placed web traps around the Internet in a way that regular users cannot find and interact with them, while they are visible to standard hacker tools and methods. Consequently, we obtain continuous information about new types of attacks, contrary to most datasets from the literature created in artificial settings. In this paper, for the purpose of effective web attack detection without many false positives, we propose an efficient way to create a dataset by combining malicious requests from the traps and benign requests from a regular website. Since our goal is automation, we tested a significant number of shallow and deep machine learning models to separate regular from malicious HTTP requests, using only simple features, such as n-grams of characters. Additionally to our dataset, we have evaluated all the models on the large publicly available FWAF dataset. We also conducted model testing on zero-day attacks, in which training and validation requests were collected in separate time intervals. One of the biggest problems in machine learning is catastrophic forgetting. When training on new data, the model forgets the knowledge learned from previous examples. To mitigate that problem, we have implemented three incremental learning approaches for web attack detection and obtained good results during testing.