A Critique of Structure-from-Motion Algorithms

Abstract

I review current approaches to structure from motion (SFM) and suggest a framework for designing new algorithms. The discussion focuses on reconstruction rather than on correspondence and on algorithms reconstructing from many images. I argue that it is important to base experiments and algorithm design on theoretical analyses of algorithm behavior and on an understanding of the intrinsic, algorithm-independent properties of SFM optimal estimation. I propose new theoretical analyses as examples, which suggest a range of experimental questions about current algorithms as well as new types of algorithms. The paper begins with a review of several of the important multi-image-based approaches to SFM, including optimization, fusing (e.g., Kalman filtering), projective methods, and invariant-based algorithms. I suggest that optimization by means of general minimization techniques needs to be supplemented by a theoretical understanding of the SFM least-squares error surface. I argue that fusing approaches are essentially no more robust than algorithms reconstructing from a small number of images and advocate experiments to determine the limitations of fusing. I also propose that fusing may be one of the best reconstruction strategies in situations where few-image algorithms give reasonable results, and suggest that an experimental understanding of the properties of few-image algorithms is important for designing good fusing methods. I emphasize the advantages of an approach based on fusing image-pair reconstructions. With regard to the projective approach, I argue that its trade-off of simplicity versus accuracy/robustness needs more careful experimental examination, and I advocate more research on the effects of calibration error on Euclidean reconstruction. I point out the relative lack of research on adapting Euclidean approaches to deal with incomplete knowledge of the calibration. 
I argue that invariant-based algorithms could be less robust and less accurate, and not necessarily much faster, than an approach fusing two-image optimizations. Based on recent results showing that two-image reconstructions are nearly as accurate as multi-image ones, I suggest that the authors of invariant-based methods conduct careful comparisons of their algorithms to two-image-based results. The remainder of the paper discusses the issues involved in designing a generally applicable SFM algorithm. I argue that current SFM algorithms perform well only in restricted domains, and that different types of algorithms do well on quite different types of sequences. I present examples of three domains that are important in applications and describe three types of algorithms, each of which performs well in just one of the three domains. I advocate testing current algorithms on a wider variety of sequences to determine their limits of applicability. More generally, I propose that SFM is a messy problem and that it could require a flexible “intermediate-level” system incorporating a variety of different algorithms and sophisticated decision rules for combining them. The paper concludes with a general discussion of experiments, pointing out that real-image experiments are not always the best means of evaluating reconstruction algorithms.