Warped softmax regression for time series classification

Author: Brijnesh Jain

Abstract

Linear models are a mainstay of statistical pattern recognition but play no role in time series classification because they fail to account for temporal variations. To overcome this limitation, we combine linear models with dynamic time warping (dtw). We analyze the resulting warped-linear models theoretically and empirically. The three main theoretical results are (i) the Representation Theorem, (ii) the Matrix Complexity Lemma, and (iii) local Lipschitz continuity of the warped softmax function. The Representation Theorem roughly states that warped-linear models correspond to polytope classifiers in Euclidean spaces. This key result is useful because it simplifies the analysis of warped-linear models. For example, it provides a geometric interpretation, points to the label dependency problem, and justifies the application of warped-linear models not only to temporal but also to multivariate data. The Representation Theorem together with the Matrix Complexity Lemma reveals that warped-linear models implement a weight trick via weight selection and massive weight sharing. Local Lipschitz continuity of warped softmax functions admits principled training of warped-linear models by stochastic subgradient methods. Empirical results show that replacing the inner product of linear models with a dtw-score substantially improves their predictive performance. The theoretical and empirical contributions of this article provide a simple and efficient first-trial alternative to nearest-neighbor methods and open up new perspectives for more sophisticated classifiers such as warped deep learning.
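The abstract describes replacing the inner product of a linear model with a dtw-score, but does not spell out the score's exact definition. Below is a minimal sketch, assuming the dtw-score of a weight series w and an input series x is the maximum over all warping paths (with unit diagonal, horizontal, and vertical steps) of the summed products w[i]*x[j], and that class scores are passed through a standard softmax. The names dtw_score and warped_softmax are illustrative, not taken from the paper.

```python
import numpy as np

def dtw_score(w, x):
    """Max-sum alignment score between weight series w and input series x.

    Assumed form of the dtw-score: the maximum over all warping paths of
    the sum of products w[i] * x[j] along the path, computed by dynamic
    programming over an (n x m) grid.
    """
    n, m = len(w), len(x)
    S = np.full((n, m), -np.inf)
    S[0, 0] = w[0] * x[0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # best score over the three admissible predecessor cells
            best_prev = max(
                S[i - 1, j] if i > 0 else -np.inf,
                S[i, j - 1] if j > 0 else -np.inf,
                S[i - 1, j - 1] if i > 0 and j > 0 else -np.inf,
            )
            S[i, j] = w[i] * x[j] + best_prev
    return S[n - 1, m - 1]

def warped_softmax(weights, biases, x):
    """Class posteriors: softmax over per-class dtw-scores plus biases."""
    scores = np.array([dtw_score(w, x) + b for w, b in zip(weights, biases)])
    scores -= scores.max()  # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

# Toy usage: two classes, weight series of length 5, input series of length 8.
rng = np.random.default_rng(0)
weights = [rng.standard_normal(5) for _ in range(2)]
biases = np.zeros(2)
x = rng.standard_normal(8)
print(warped_softmax(weights, biases, x))
```

Note that the dtw-score is a pointwise maximum of linear functions of the weights, so it is piecewise linear and only locally Lipschitz rather than differentiable everywhere, which is consistent with the article's use of stochastic subgradient methods for training.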

Keywords: Time series, Dynamic time warping, Linear classifier, Softmax regression

Paper URL: https://doi.org/10.1007/s10115-020-01533-5