Mining free-structured information based on hidden Markov models

Authors:

Abstract

This study investigates the potential of hidden Markov models (HMMs) for mining free-structured information. The test samples are C4ISR-related information drawn from ‘Forecast International’, a web-based database containing a free-structured archive of forecast reports on aerospace systems, weapon systems, and military industries. The study focuses on three C4ISR-related target terms, namely ‘Company’, ‘System types’, and ‘cost’, for information mining analysis. The experiments are performed in two stages. In the first stage, each HMM is built to extract exactly one target term, so as to test the basic information extraction capability of the HMM. In the second stage, the experiment is extended to the more complex problem of multiple-term extraction. The results show that, with HMMs as the basis, accuracy exceeds 80% for every single-term extraction and averages 76% for the multi-term extraction case.

Keywords: Hidden Markov model, Information extraction, Forecast International, C4ISR

Review history: Available online 4 January 2006.

Official URL: https://doi.org/10.1016/j.eswa.2005.11.022