Probabilistic techniques for phrase extraction

Abstract

This study proposes a probabilistic model that automatically extracts English noun phrases without part-of-speech tagging or any other syntactic analysis. The technique is based on a Markov model. Its initial parameters are estimated by a phrase-lookup program that uses a phrase dictionary, and are then optimized with a set of maximum entropy (ME) parameters defined over morphological features. Applying the Viterbi algorithm to the trained Markov model, the program dynamically extracts noun phrases from input text. Experiments show that the technique is comparable in effectiveness to the best existing techniques.
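The decoding step can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions. Noun-phrase boundaries are encoded with three hypothetical states (B = first word of a phrase, I = later word inside a phrase, O = outside). A hypothetical `feature` function maps each word to a coarse morphological class instead of a part-of-speech tag. The toy probabilities `START`, `TRANS` and `EMIT` are hand-set placeholders for the parameters the paper estimates by phrase-dictionary lookup and then optimizes with maximum entropy; neither of those estimation steps is reproduced here. `viterbi` finds the most probable tag sequence in log space, and `extract_phrases` groups each B/I run into a phrase.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical BIO state set: B = first word of a noun phrase,
# I = later word inside a noun phrase, O = outside any noun phrase.
TAGS = ("B", "I", "O")

# Illustrative word lists (assumptions); a real system would use richer cues.
DETERMINERS = {"a", "an", "the", "this", "these"}
FUNCTION_WORDS = {"of", "in", "on", "from", "to", "and", "or", "is", "are", "was", "by", "with"}


def feature(word: str) -> str:
    """Map a word to a coarse morphological class; no POS tagger is used."""
    w = word.lower()
    if w in DETERMINERS:
        return "det"
    if w in FUNCTION_WORDS:
        return "func"
    if w.endswith(("ed", "ing", "ly")):
        return "verbal"
    return "other"


# Toy, hand-set probabilities standing in for trained parameters (assumption).
# Zero entries for start->I and O->I forbid ill-formed BIO sequences.
START = {"B": 0.6, "I": 0.0, "O": 0.4}
TRANS = {
    "B": {"B": 0.05, "I": 0.75, "O": 0.2},
    "I": {"B": 0.05, "I": 0.55, "O": 0.4},
    "O": {"B": 0.6, "I": 0.0, "O": 0.4},
}
EMIT = {
    "B": {"det": 0.5, "func": 0.0, "verbal": 0.05, "other": 0.45},
    "I": {"det": 0.0, "func": 0.0, "verbal": 0.05, "other": 0.95},
    "O": {"det": 0.0, "func": 0.6, "verbal": 0.3, "other": 0.1},
}


def lg(p: float) -> float:
    """Log-probability that maps zero to -inf instead of raising ValueError."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words: Sequence[str]) -> List[str]:
    """Return the most probable B/I/O tag sequence for `words`."""
    if not words:
        return []
    feats = [feature(w) for w in words]
    # score[i][t]: best log-probability of any tag path ending in tag t at word i.
    score: List[Dict[str, float]] = [
        {t: lg(START[t]) + lg(EMIT[t][feats[0]]) for t in TAGS}
    ]
    back: List[Dict[str, str]] = [{}]  # back[i][t]: best predecessor of t at i
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: score[i - 1][p] + lg(TRANS[p][t]))
            score[i][t] = score[i - 1][prev] + lg(TRANS[prev][t]) + lg(EMIT[t][feats[i]])
            back[i][t] = prev
    # Start from the best final tag and follow the back-pointers.
    path = [max(TAGS, key=lambda t: score[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def extract_phrases(words: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Join each B/I run into a phrase; an I with no open phrase is dropped."""
    phrases: List[str] = []
    current: List[str] = []
    for word, tag in zip(words, tags):
        if tag == "I" and current:
            current.append(word)
            continue
        if current:
            phrases.append(" ".join(current))
        current = [word] if tag == "B" else []
    if current:
        phrases.append(" ".join(current))
    return phrases


sentence = "The parser extracted noun phrases from the documents".split()
tags = viterbi(sentence)
print(tags)                             # ['B', 'I', 'O', 'B', 'I', 'O', 'B', 'I']
print(extract_phrases(sentence, tags))  # ['The parser', 'noun phrases', 'the documents']
```

Working in log space keeps long inputs from underflowing to zero. The zero start-to-I and O-to-I probabilities ensure that decoded sequences never open a phrase with I.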

Keywords: Indexing, Phrase extraction, Markov, Maximum entropy, Phrase dictionary, Tokenize, Smooth, Bias

Article history: Received 27 January 2000, Accepted 5 June 2000, Available online 5 February 2001.

DOI: https://doi.org/10.1016/S0306-4573(00)00029-7