Data mining for grammatical inference with bioinformatics criteria

Authors:

Abstract

This work describes a novel data mining process that combines association analysis with classical sequencing algorithms from genomics to generate the grammatical structures of a specific language. These structures are then converted to context-free grammars. The method initially applies to context-free languages and may extend to other languages: structured programming languages, the language of the "book of life" expressed in the genome and proteome, and even natural languages. We used a compiler generator system to develop a practical application in the area of grammarware, where language analysis concepts are applied to other disciplines such as bioinformatics. The tool automatically measures the complexity of the grammar obtained from textual data.
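
The abstract does not spell out the mining procedure, so the following is only an illustrative sketch of the general idea of turning frequent sequential patterns into context-free rules. It substitutes a simple Re-Pair-style digram replacement for the authors' hybrid method, which is not reproduced here. The names `infer_grammar`, `replace_pair`, `expand`, `derive`, and `grammar_size`, the `min_support` threshold, and the `N1`, `N2`, ... nonterminal naming are all assumptions made for illustration, as is the use of total symbol count as a stand-in for grammar complexity.

```python
from collections import Counter


def replace_pair(seq, pair, nonterminal):
    """Replace non-overlapping occurrences of `pair` in `seq`, left to right."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(nonterminal)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def infer_grammar(sequences, min_support=2):
    """Hypothetical Re-Pair-style inference (not the paper's algorithm).

    Repeatedly finds the most frequent adjacent symbol pair across all
    sequences and replaces it with a fresh nonterminal, until no pair
    occurs at least `min_support` times. Assumes input symbols never
    look like the generated names N1, N2, ...
    """
    seqs = [list(s) for s in sequences]
    rules = {}
    while True:
        pairs = Counter()
        for seq in seqs:
            # Counts adjacent pairs; overlapping repeats (e.g. "AAA") are
            # counted twice, so the count is only an approximate support.
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_support:
            break
        nonterminal = f"N{len(rules) + 1}"
        rules[nonterminal] = pair
        seqs = [replace_pair(seq, pair, nonterminal) for seq in seqs]
    return rules, seqs


def expand(symbol, rules):
    """Expand a symbol back into the list of terminals it derives."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def derive(seq, rules):
    """Rebuild the original string from a compressed sequence."""
    return "".join(t for symbol in seq for t in expand(symbol, rules))


def grammar_size(rules, seqs):
    """Assumed complexity proxy: total symbols on all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values()) + sum(len(s) for s in seqs)


if __name__ == "__main__":
    dna = ["ACGTACGT", "ACGACG"]
    rules, seqs = infer_grammar(dna)
    for name, rhs in rules.items():
        print(name, "->", " ".join(rhs))
    print("start sequences:", seqs)
    print("size:", grammar_size(rules, seqs))
```

Each learned rule has exactly two symbols on its right-hand side, so the result is a context-free grammar in which the compressed sequences act as alternative right-hand sides of the start symbol. Expanding them with `derive` recovers the input strings exactly.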

Keywords: Grammatical inference, Bioinformatic, Free Context Grammar, DNA, Sequential patterns

Publication history: Available online 27 August 2011.

Official URL: https://doi.org/10.1016/j.eswa.2011.08.058