The consensus string problem and the complexity of comparing hidden Markov models

Authors:

Highlights:

Abstract

The basic theory of hidden Markov models was developed and applied to problems in speech recognition in the late 1960s, and has since been applied to numerous problems, e.g. biological sequence analysis. Most applications of hidden Markov models are based on efficient algorithms for computing the probability of generating a given string, or computing the most likely path generating a given string. In this paper we consider the problem of computing the most likely string, or consensus string, generated by a given model, and its implications for the complexity of comparing hidden Markov models. We show that computing the consensus string, and approximating its probability within any constant factor, is NP-hard, and that the same holds for the closely related labeling problem for class hidden Markov models. Furthermore, we establish the NP-hardness of comparing two hidden Markov models under the L∞- and L1-norms. We discuss the applicability of the technique used for proving the hardness of comparing two hidden Markov models under the L1-norm to other measures of distance between probability distributions. In particular, we show that it cannot be used for proving NP-hardness of determining the Kullback–Leibler distance between two hidden Markov models, or of comparing them under the Lk-norm for any fixed even integer k.
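The contrast the abstract draws can be made concrete with a tiny example. The sketch below (hypothetical parameters, not from the paper) computes the probability of a given string with the forward algorithm and the most likely path with Viterbi, both efficient, and then finds the consensus string of a fixed length by exhaustive search over all strings, which is the only obvious approach given the NP-hardness result:

```python
import itertools

# A toy two-state HMM over the alphabet {a, b} (parameters are illustrative).
states = [0, 1]
alphabet = ['a', 'b']
init = [0.6, 0.4]                      # initial state distribution
trans = [[0.7, 0.3], [0.4, 0.6]]       # trans[p][q] = P(next state q | state p)
emit = {0: {'a': 0.9, 'b': 0.1}, 1: {'a': 0.2, 'b': 0.8}}

def forward_prob(s):
    """P(model emits string s), summed over all state paths (forward algorithm)."""
    f = [init[q] * emit[q][s[0]] for q in states]
    for c in s[1:]:
        f = [sum(f[p] * trans[p][q] for p in states) * emit[q][c] for q in states]
    return sum(f)

def viterbi(s):
    """Most likely state path for s -- efficiently computable, unlike the consensus string."""
    v = [init[q] * emit[q][s[0]] for q in states]
    back = []
    for c in s[1:]:
        step, new_v = [], []
        for q in states:
            p_best = max(states, key=lambda p: v[p] * trans[p][q])
            step.append(p_best)
            new_v.append(v[p_best] * trans[p_best][q] * emit[q][c])
        back.append(step)
        v = new_v
    q = max(states, key=lambda q: v[q])
    path = [q]
    for step in reversed(back):
        q = step[q]
        path.append(q)
    return list(reversed(path))

def consensus_by_brute_force(length):
    """Most likely emitted string of a given length, by exponential enumeration."""
    return max((''.join(t) for t in itertools.product(alphabet, repeat=length)),
               key=forward_prob)
```

The point of the paper is that no algorithm essentially better than the exponential enumeration in `consensus_by_brute_force` is expected to exist: even approximating the consensus string's probability within a constant factor is NP-hard, while `forward_prob` and `viterbi` run in time linear in the string length.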

Keywords:

Review history: Received 30 April 2001, Revised 31 October 2001, Available online 17 January 2003.

Paper URL: https://doi.org/10.1016/S0022-0000(02)00009-0