The longest common subsequence problem for sequences with nested arc annotations

作者:

Highlights:

摘要

Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. The Longest Arc-Preserving Common Subsequence (LAPCS) Problem has been introduced in Evans (Algorithms and complexity for annotated sequence analysis, Ph.D. Thesis, University of Victoria, 1999) as a framework for studying the similarity of arc-annotated sequences. Several algorithmic and complexity results on the LAPCS problem have been presented in Evans (1999) and Jiang et al. (in: Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching (CPM 2000), Lecture Note in Computer Science, Vol. 1848, 2000, pp. 154–165). In this paper, we continue this line of research and present new algorithmic and complexity results on the LAPCS problem restricted to two nested arc-annotated sequences, denoted as LAPCS(nested, nested). The restricted problem is perhaps the most interesting variant of the LAPCS problem and has important applications in the comparison of RNA secondary structures. Particularly, we prove that LAPCS(nested, nested) is NP-hard, which answers an open question in Evans (1999). We then present a polynomial-time approximation scheme for LAPCS(nested, nested) with an additional c-diagonal restriction. An interesting special case, Unary LAPCS(nested, nested), is also investigated, for which we show the NP-hardness and present a better approximation algorithm than the one for general LAPCS(nested, nested).

论文关键词:Sequence annotation,Longest common subsequence,Book embedding,NP-hardness,Approximation algorithm,Polynomial-time approximation scheme

论文评审过程:Received 15 February 2001, Revised 4 September 2001, Available online 17 January 2003.

论文官网地址:https://doi.org/10.1016/S0022-0000(02)00004-1