A three-string approach to the closest string problem

Authors:

Highlights:

Abstract

Given a set of $n$ strings of length $L$ and a radius $d$, the closest string problem (CSP for short) asks for a string $t_{sol}$ that is within Hamming distance $d$ of each of the given strings. The problem is known to be NP-hard, and its optimization version admits a polynomial time approximation scheme (PTAS). Parameterized algorithms have since been developed to solve the problem when $d$ is small. In this paper, with a new approach (called the 3-string approach), we first design a parameterized algorithm for binary strings that runs in $O(nL + nd^3 \cdot 6.731^d)$ time, while the previous best runs in $O(nL + nd \cdot 8^d)$ time. We then extend the algorithm to arbitrary alphabet sizes, obtaining an algorithm that runs in time $O(nL + nd \cdot (1.612(|\Sigma| + \beta^2 + \beta - 2))^d)$, where $|\Sigma|$ is the alphabet size and $\beta = \alpha^2 + 1 - 2\alpha^{-1} + \alpha^{-2}$ with $\alpha = \sqrt[3]{\sqrt{|\Sigma| - 1} + 1}$. This new time bound is better than the previous best for small alphabets, including the very important case where $|\Sigma| = 4$ (i.e., the case of DNA strings).
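
To make the problem statement and the alphabet-dependent bound concrete, the following is a minimal Python sketch. It is not the 3-string algorithm of the paper: `brute_force_closest_string` is a hypothetical exhaustive search over all $|\Sigma|^L$ candidates, useful only for checking the definition on tiny inputs, and `bound_base` simply evaluates the base of the exponential term from the formula above. All function names are assumptions introduced for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(t: str, strings: list[str], d: int) -> bool:
    """True if t is within Hamming distance d of every given string."""
    return all(hamming(t, s) <= d for s in strings)


def brute_force_closest_string(strings: list[str], d: int, alphabet: str):
    """Exhaustive search (NOT the paper's method): try every string of length L.

    Returns the first center string found in lexicographic order, or None.
    """
    length = len(strings[0])
    for candidate in product(alphabet, repeat=length):
        t = "".join(candidate)
        if is_center(t, strings, d):
            return t
    return None


def bound_base(sigma: int) -> float:
    """Base c of the c^d term in the general-alphabet bound stated above."""
    alpha = ((sigma - 1) ** 0.5 + 1) ** (1 / 3)
    beta = alpha**2 + 1 - 2 / alpha + alpha**-2
    return 1.612 * (sigma + beta**2 + beta - 2)


if __name__ == "__main__":
    strings = ["0011", "0101", "1001"]
    print(brute_force_closest_string(strings, 1, "01"))
    print(round(bound_base(4), 3))  # base of the exponential term for DNA
```

For $|\Sigma| = 4$ the base evaluates to roughly 13.18, so under this formula the exponential factor for DNA strings grows like about $13.18^d$.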

Keywords: Computational biology, The closest string problem, Fixed-parameter algorithms

Review history: Received 18 May 2010, Revised 11 January 2011, Accepted 21 January 2011, Available online 27 January 2011.

Official paper URL: https://doi.org/10.1016/j.jcss.2011.01.003