An evaluation method of words tendency depending on time-series variation and its improvements

作者:

Highlights:

摘要

In every text, some words have frequency appearance and are considered as keywords because they have a strong relationship with the subjects of their texts, these words' frequencies change with time-series variation in a given period. However, in traditional text dealing methods and text search techniques, the importance of frequency change with time-series variation is not considered. Therefore, traditional methods could not correctly determine the index of word's popularity in a given period. In this paper, a new method is proposed to estimate automatically the stability classes (increasing, relatively constant, and decreasing) that indicate word's popularity with time-series variation based on the frequency change in past text data. At first, learning data were produced by defining five attributes to measure the frequency change of a word quantitatively. These five attributes were extracted automatically from electronic texts. These learning data were manual (human) classified into three stability classes. Then, these data were subjected to a decision tree to determine automatically stability classes of analysis data (test data). For learning data, we obtained the attribute values of 443 proper nouns that were extracted from 2216 articles of CNN newspapers (1997–1999) that discussed professional baseball. For testing data, 472 proper nouns that were extracted from 972 articles of CNN newspaper (1997–2000) then classified them automatically using decision tree. According to the comparison between the evaluation of the decision tree results and manually (human) results, F-measures of increasing, relatively constant and decreasing classes were 0.847, 0.851, and 0.768, respectively, and the effectiveness of this method is achieved.

论文关键词:Time-series variation,Words popularity,Decision tree,CNN newspaper

论文评审过程:Received 18 August 2000, Accepted 31 January 2001, Available online 27 November 2001.

论文官网地址:https://doi.org/10.1016/S0306-4573(01)00028-0