Embedding HMMs-based models in a Euclidean space: the topological hidden Markov models

Abstract

Current extensions of hidden Markov models (HMMs), such as structural, hierarchical, and coupled HMMs, can classify complex and highly organized patterns. However, one of their major limitations is an inability to cope with topology: when applied to a visible observation (VO) sequence, traditional HMM-based techniques have difficulty predicting the n-dimensional shape formed by the symbols of that sequence. To address this limitation, we propose a novel paradigm named “topological hidden Markov models” (THMMs) that classifies VO sequences by embedding the nodes of an HMM state transition graph in a Euclidean space. This is achieved by modeling the noise embedded in the shape generated by the VO sequence. We cover first- and second-level topological HMMs, and we describe five basic problems assigned to a second-level topological hidden Markov model: (1) sequence probability evaluation, (2) statistical decoding, (3) structural decoding, (4) topological decoding, and (5) learning. To show the significance of this research, we have applied THMMs to (i) predict the ASCII class assigned to a handwritten numeral, and (ii) map protein primary structures to their 3D folds. The results show that second-level THMMs significantly outperform structural HMMs (SHMMs) and multi-class SVM classifiers.

Keywords: Structural hidden Markov models, Structural decoding, Topological decoding, Object contour representation, Protein fold recognition, 5×2-fold cross validation paired t-test of hypothesis, Chain code representation, Handwritten numeral recognition