An efficient and flexible scanning of databases of protein secondary structures

Authors: Dariusz Mrozek, Bartek Socha, Stanisław Kozielski, Bożena Małysiak-Mrozek

Abstract

Protein secondary structure describes protein construction in terms of regular spatial shapes, including alpha-helices, beta-strands, and loops, which the protein amino acid chain can adopt in some of its regions. This information supports protein classification, functional annotation, and 3D structure prediction. The relevance of this information and the scope of its practical applications create a need for its effective storage and processing. Relational databases, widely used in commercial systems in recent years, are one of the serious alternatives: they are honed by years of experience, enriched with developed technologies, equipped with the declarative SQL query language, and accepted by a large community of programmers. Unfortunately, relational database management systems are not designed for efficient storage and processing of biological data, such as protein secondary structures. In this paper, we present a new search method implemented in the search engine of the PSS-SQL language. PSS-SQL allows the formulation of queries against a relational database in order to find proteins whose secondary structures are similar to a structural pattern specified by the user. We show how the search process can be accelerated by multiple scanning of the Segment Index and by a parallel implementation of the alignment procedure using multiple threads working on multi-core CPUs.

Keywords: Bioinformatics, Proteins, Secondary structure, Query language, Information retrieval, Parallel programming, Alignment, SQL, Databases

Paper URL: https://doi.org/10.1007/s10844-014-0353-0