A novel trend surveillance system using the information from web search engines

Authors:

Highlights:

• Propose an adaptive trend surveillance framework and an effective feature selection algorithm, TF-LTR

• Investigate pair-wise learning to rank models to measure a term's discriminative power

• Support government officials and authorities in constructing effective and efficient trend surveillance systems

Abstract

Web search engines are becoming a major platform for the general public to access information. Because the search patterns of search engine users are correlated with emerging events, it has been suggested that search engine query logs have potential for trend surveillance, such as monitoring outbreaks of epidemics. Many trend surveillance studies have investigated the use of query logs and have sought to identify query terms suitable for trend surveillance. Most of these works select representative query terms by consulting domain experts or by preparing a large text corpus for feature selection. These approaches, however, are too costly for the resulting trend surveillance methods to be adapted to different topics. In this paper, we propose an adaptive trend surveillance method. We developed a simple and effective feature selection algorithm, called TF-LTR, which leverages the documents returned by search engines and the frequency of the terms in those documents to select representative query terms for trending topics. Specifically, we investigated pair-wise learning to rank models to measure a term's discriminative power in making a document rank higher in the returned document list. The discriminative power is combined with the term frequency, which denotes the on-topic degree of a term, to measure a term's representativeness for a trending topic. Representative terms and their query frequencies are applied to a state-of-the-art data mining model to enhance the effectiveness of trend surveillance. The experimental results, based on trending topics from different domains, show that our trend surveillance method performs well and that the ranking information of search engines is helpful for trend surveillance. In light of this, the proposed method can provide effective support for government officials and authorities, helping them respond to fast-changing events and topics and make appropriate decisions.
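
The abstract names the two ingredients of TF-LTR (a pair-wise learning to rank weight per term, and term frequency in the returned documents) but does not give the exact model or combination rule. The following is only a minimal sketch under assumptions that are ours, not the paper's: a linear model trained with a pair-wise logistic loss on bag-of-words differences, and a final score equal to relative term frequency multiplied by the positive part of the learned weight. The function names `pairwise_weights` and `tf_ltr_scores` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def term_counts(doc):
    """Bag-of-words counts for one document (whitespace tokenization)."""
    return Counter(doc.lower().split())


def pairwise_weights(ranked_docs, epochs=50, lr=0.1):
    """Learn one weight per term from a best-first ranked document list.

    Assumption: a linear scorer trained with a pair-wise logistic loss,
    so a term gets a positive weight when it tends to appear in
    documents that rank above others.
    """
    vecs = [term_counts(d) for d in ranked_docs]
    w = Counter()
    for _ in range(epochs):
        # combinations yields (i, j) with i < j, i.e. doc i outranks doc j
        for i, j in combinations(range(len(vecs)), 2):
            terms = set(vecs[i]) | set(vecs[j])
            diff = {t: vecs[i][t] - vecs[j][t] for t in terms}
            margin = sum(w[t] * v for t, v in diff.items())
            margin = max(-30.0, min(30.0, margin))  # avoid overflow in exp
            step = 1.0 / (1.0 + math.exp(margin))   # gradient factor of log(1 + e^-margin)
            for t, v in diff.items():
                w[t] += lr * step * v
    return w


def tf_ltr_scores(ranked_docs):
    """Combine discriminative power with term frequency.

    Assumption: score = relative term frequency * max(weight, 0).
    """
    w = pairwise_weights(ranked_docs)
    tf = Counter()
    for d in ranked_docs:
        tf.update(term_counts(d))
    total = sum(tf.values())
    return {t: (tf[t] / total) * max(w[t], 0.0) for t in tf}


# Toy ranked list (best first); the documents are made up for illustration.
docs = ["flu symptoms fever", "flu vaccine fever", "weather news", "sports news"]
scores = tf_ltr_scores(docs)
top_terms = sorted(scores, key=scores.get, reverse=True)[:3]
print(top_terms)
```

In this toy setup, terms concentrated in the higher-ranked documents receive positive weights, while terms that appear mainly in lower-ranked documents are clipped to a score of zero. In practice the selected terms' query frequencies would then feed a separate prediction model, which this sketch does not cover.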

Keywords: Trend surveillance, Learning to rank, Data mining, Feature selection

Review history: Received 2 November 2015, Revised 15 April 2016, Accepted 2 June 2016, Available online 11 June 2016, Version of Record 1 July 2016.

Official paper URL: https://doi.org/10.1016/j.dss.2016.06.001