Structured learning of metric ensembles with application to person re-identification

Abstract

Matching individuals across non-overlapping camera networks, known as person re-identification, is a fundamentally challenging problem because of the large changes in visual appearance caused by variations in viewpoint, lighting, and occlusion. Approaches in the literature fall into two streams: the first develops features that are reliable under realistic conditions by combining several visual features in a pre-defined way; the second learns a metric from training data to ensure strong inter-class differences and intra-class similarities. However, finding an optimal combination of visual features that is generic yet adaptive to different benchmarks remains an unsolved problem, and metric learning models are prone to over-fitting because training data in person re-identification is scarce. In this paper, we propose two effective structured learning based approaches which explore the adaptive effects of visual features in recognizing persons in different benchmark data sets. Our framework is built on multiple low-level visual features with an optimal ensemble of their metrics. We formulate two optimization algorithms, CMC triplet and CMC top, which directly optimize the evaluation measure commonly used in person re-identification, the Cumulative Matching Characteristic (CMC) curve. The more standard CMC triplet formulation operates on triplets, maximizing the relative distance between a matched pair and a mismatched pair in each triplet. The CMC top formulation, posed as a structured learning problem that maximizes correct identification among the top candidates, is shown to be more beneficial to person re-identification because it directly optimizes an objective closer to the actual testing criterion. The combination of these factors leads to a person re-identification system which outperforms most existing algorithms.
More importantly, we advance state-of-the-art results by improving the rank-1 recognition rates from 40% to 61% on the iLIDS benchmark, 16% to 22% on the PRID2011 benchmark, 43% to 50% on the VIPeR benchmark, 34% to 55% on the CUHK01 benchmark and 21% to 68% on the CUHK03 benchmark.
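
To make the evaluation measure concrete, the following is a minimal sketch of how a weighted ensemble of per-feature distances can be scored with a CMC curve. It is an illustration only and not the paper's learning algorithm: the function names, the toy distance matrices, and the hand-picked weights are all assumptions, whereas in the proposed approaches the weights would be learned by the CMC triplet or CMC top formulation.

```python
def ensemble_distance(dist_per_feature, weights):
    """Weighted sum of per-feature distance matrices (probe x gallery).

    Hypothetical helper: the weights here are supplied by hand, not learned.
    """
    n_probe = len(dist_per_feature[0])
    n_gallery = len(dist_per_feature[0][0])
    return [
        [sum(w * d[i][j] for w, d in zip(weights, dist_per_feature))
         for j in range(n_gallery)]
        for i in range(n_probe)
    ]


def cmc_curve(dist, probe_ids, gallery_ids):
    """Fraction of probes whose true match appears within the top k, k = 1..n_gallery.

    Assumes every probe identity appears exactly once in the gallery.
    """
    n_gallery = len(gallery_ids)
    hits = [0] * n_gallery
    for i, pid in enumerate(probe_ids):
        # Gallery indices sorted from nearest to farthest.
        order = sorted(range(n_gallery), key=lambda j: dist[i][j])
        # 0-based position of the correct match in the ranking.
        rank = next(r for r, j in enumerate(order) if gallery_ids[j] == pid)
        for k in range(rank, n_gallery):
            hits[k] += 1
    return [h / len(probe_ids) for h in hits]


# Toy data (made up): two features, three probes, three gallery identities.
probe_ids = [1, 2, 3]
gallery_ids = [1, 2, 3]
dist_color = [[0.1, 0.5, 0.9],
              [0.4, 0.6, 0.3],
              [0.8, 0.7, 0.2]]
dist_texture = [[0.3, 0.6, 0.7],
                [0.9, 0.2, 0.8],
                [0.5, 0.4, 0.6]]

cmc_color = cmc_curve(dist_color, probe_ids, gallery_ids)
combined = ensemble_distance([dist_color, dist_texture], [0.5, 0.5])
cmc_combined = cmc_curve(combined, probe_ids, gallery_ids)
```

In this toy case the color feature alone ranks the correct match first for two of the three probes, while the equally weighted ensemble ranks it first for all three. The first entry of each curve is the rank-1 recognition rate, the quantity reported in the benchmark comparisons above.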