Head detection using motion features and a multi-level pyramid architecture

Abstract

Monitoring large crowds with video cameras is a challenging task, and detecting humans in video is becoming essential for monitoring crowd behavior. However, occlusion and the low resolution of the region of interest hinder accurate crowd segmentation. In such scenarios, often only the head is visible, and it is frequently very small. Most existing people-detection systems rely on low-level visual appearance features such as the Histogram of Oriented Gradients (HOG), which are unsuitable for detecting human heads at low resolution. In this paper, we present a novel head detector based on motion histogram features. Shape and motion information, including crowd direction and magnitude, are learned and used to detect humans in occluded crowds. We introduce novel features based on a multi-level pyramid architecture for the Motion Boundary Histogram (MBH) and the Histogram of Oriented Optical Flow (HOOF), both derived from TV-L1 optical flow. In addition, we propose a new feature, Relative Motion Distance (RMD), to efficiently capture correlation statistics. To distinguish human heads from regions with similar features, a two-stage Support Vector Machine (SVM) is used, with an explicit kernel mapping of our motion histogram features based on the Bhattacharyya-distance kernel. The second classification stage is needed to reduce the number of false positives. The proposed features and system were evaluated on videos from the PETS 2009 dataset and compared with state-of-the-art features, against which our system achieved excellent results.
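To make the feature pipeline concrete, the following is a minimal NumPy sketch of how HOOF and MBH histograms could be pooled over a multi-level spatial pyramid and then passed through an explicit Bhattacharyya kernel map. It is an illustration under stated assumptions, not the authors' implementation. The optical-flow components `u` and `v` are assumed to be precomputed (for example by a TV-L1 solver, which is not shown). Other assumptions: 8 orientation bins over [0, 2π), magnitude-weighted voting, three pyramid levels (1×1, 2×2 and 4×4 grids), and a uniform histogram for cells with no motion. All function names are hypothetical, and the RMD feature and the two SVM stages are not reproduced.

```python
import numpy as np


def orientation_histogram(dx, dy, n_bins=8):
    """Magnitude-weighted orientation histogram, L1-normalised.

    Assumption: angles in [0, 2*pi) are split into n_bins equal bins;
    the paper's exact binning scheme may differ.
    """
    angles = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    mags = np.hypot(dx, dy)
    # Clamp so that angles that round up to 2*pi fall into the last bin.
    bins = np.minimum((angles / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mags.ravel(), minlength=n_bins)
    total = hist.sum()
    # Assumption: a cell with no motion gets a uniform histogram.
    return hist / total if total > 0 else np.full(n_bins, 1.0 / n_bins)


def pyramid_histograms(dx, dy, levels=3, n_bins=8):
    """Concatenate per-cell histograms; level l uses a 2^l x 2^l grid."""
    h, w = dx.shape
    feats = []
    for level in range(levels):
        cells = 2 ** level
        for rows in np.array_split(np.arange(h), cells):
            for cols in np.array_split(np.arange(w), cells):
                idx = np.ix_(rows, cols)
                feats.append(orientation_histogram(dx[idx], dy[idx], n_bins))
    return np.concatenate(feats)


def hoof(u, v, levels=3, n_bins=8):
    # HOOF: orientations of the flow vectors themselves.
    return pyramid_histograms(u, v, levels, n_bins)


def mbh(u, v, levels=3, n_bins=8):
    # MBH: orientations of the spatial gradients of each flow component.
    uy, ux = np.gradient(u)  # np.gradient returns (d/drow, d/dcol)
    vy, vx = np.gradient(v)
    return np.concatenate([pyramid_histograms(ux, uy, levels, n_bins),
                           pyramid_histograms(vx, vy, levels, n_bins)])


def bhattacharyya_map(feature):
    # For L1-normalised histograms, k(h, g) = sum_i sqrt(h_i * g_i)
    # = <sqrt(h), sqrt(g)>, so the element-wise square root is an exact
    # explicit feature map for the Bhattacharyya kernel.
    return np.sqrt(feature)


rng = np.random.default_rng(0)
u, v = rng.normal(size=(2, 32, 32))  # stand-in for a TV-L1 flow patch
x = np.concatenate([bhattacharyya_map(hoof(u, v)),
                    bhattacharyya_map(mbh(u, v))])
print(x.shape)  # (504,): 21 cells x 8 bins for HOOF, twice that for MBH
```

Because each pyramid cell is normalised separately, the dot product of two mapped vectors equals the sum of the per-cell Bhattacharyya coefficients. A linear classifier trained on `x` therefore behaves like a kernel SVM with a Bhattacharyya kernel, while avoiding the cost of evaluating the kernel matrix.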