Decoupled appearance and motion learning for efficient anomaly detection in surveillance video

Authors:

Highlights:

Abstract

Automating the analysis of surveillance video footage is of great interest when urban environments or industrial sites are monitored by a large number of cameras. As anomalies are often context-specific, it is hard to predefine events of interest and collect labeled training data. A purely unsupervised approach to automated anomaly detection is much more suitable. For every camera, a separate algorithm could then be deployed that learns, over time, a baseline model of appearance- and motion-related features of the objects within the camera viewport. Anything that deviates from this baseline is flagged as an anomaly for further analysis downstream. We propose a new neural network architecture that learns the normal behavior in a purely unsupervised fashion. In contrast to previous work, we use latent code predictions as our anomaly metric. We show that this outperforms frame reconstruction-based and prediction-based methods on different benchmark datasets, both in terms of accuracy and robustness against changing lighting and weather conditions. By decoupling an appearance model and a motion model, our model can also process 16 to 45 times more frames per second than related approaches, which makes it suitable for deployment on the camera itself or on other edge devices.

Keywords:

Review history: Received 30 March 2020, Revised 10 May 2021, Accepted 15 July 2021, Available online 18 July 2021, Version of Record 30 July 2021.

Official paper URL: https://doi.org/10.1016/j.cviu.2021.103249