LsHASHq: A string matching algorithm exploiting longer q-gram shifting

作者:

Highlights:

• A fast q-gram hash function based exact string matching algorithm for any alphabet.

• Our avg time complexity is O(n/(m−d)), where n,m are length of text and pattern.

• Our algorithm shift skips £m positions vs other algorithms skip £m−q+1.

• Achieved second best avg runtime on Human Chromosome 1 for patterns of size 10.

• Best avg runtime (vs 23 algorithms) for pattern size <= 60 on natural language text.

摘要

•A fast q-gram hash function based exact string matching algorithm for any alphabet.•Our avg time complexity is O(n/(m−d)), where n,m are length of text and pattern.•Our algorithm shift skips £m positions vs other algorithms skip £m−q+1.•Achieved second best avg runtime on Human Chromosome 1 for patterns of size 10.•Best avg runtime (vs 23 algorithms) for pattern size <= 60 on natural language text.

论文关键词:String matching algorithms,Pattern matching,q-gram hashing,Online search,Sequence analysis

论文评审过程:Received 1 March 2022, Revised 25 July 2022, Accepted 2 August 2022, Available online 24 August 2022, Version of Record 24 August 2022.

论文官网地址:https://doi.org/10.1016/j.ipm.2022.103057