Weighted automata sequence kernel: Unification and generalization

作者:

Highlights:

摘要

Sequence kernels have been widely used for learning from sequential data. In recent years, a significant effort has been devoted to sequence kernels focusing on individual problems, and so devising several approaches. As a contribution in developing a unified theory of machine learning, in this paper, we complement our previous general framework that deals with sequence kernels, termed weighted automata sequence kernel. We provide a more formal presentation of the framework fully supported with proofs. In fact, the mapping of a string s to a high dimensional feature space can be modeled by a formal power series that can be realized by a weighted automaton (WA) As representing all subsequences of the string s. The computation of the kernel K(s,t) between two strings is the behavior of the intersection weighted automaton As,t=As∩At. Regarding kernel computation efficiency, we propose a forward lookup automaton intersection technique to prevent unsuccessful ε-paths while evaluating the WA computations. The experiments use the Reuters-21578 collection. The results reveal that the kernel evaluation using our proposed technique is faster than the one that uses standard intersection. In order To highlight the unification aspect of our model, we study the relationship between our general framework and a variety of common sequence kernels. Finally, we prove that the proposed formalization can be generalized to sequence multiset kernels showing the robustness of our approach. In our case, the new defined sequence multiset kernel can also be seen as a tree kernel.

论文关键词:Sequence kernel,Weighted automata,Formal power series,Sequence multiset kernel,Unification,Generalization

论文评审过程:Received 22 March 2020, Revised 1 December 2020, Accepted 3 December 2020, Available online 10 December 2020, Version of Record 15 December 2020.

论文官网地址:https://doi.org/10.1016/j.knosys.2020.106654