Mining large samples of web-based corpora

Abstract

This paper presents a method to automatically mirror, process, and compare large samples of text corpora from Web-based information systems. Through textual analysis, the wealth of textual information contained in publicly available Web sites is converted into aggregated representations. A case study investigating the global coverage of solar power technologies in international media illustrates how word lists, keyword analysis, term clustering, and correspondence analysis can identify and represent semantic relationships, including their longitudinal patterns. The resulting graphs, indicators, and tables describe complex relationships and developments that are hard to capture with traditional methods. As such, they facilitate investigations into the nature and dynamics of Web content.
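
The abstract names word lists and keyword analysis as building blocks but does not say how keywords are scored. The sketch below is one plausible reading, not the authors' implementation. It builds a frequency word list per corpus and ranks terms that are over-represented in a target corpus relative to a reference corpus. Two assumptions are worth stating: Dunning's log-likelihood (G2) is used as the keyness statistic, a common choice in corpus linguistics, and the simple lowercase tokenizer is a stand-in. The function names (`word_list`, `log_likelihood`, `keywords`) are hypothetical, and the two example sentences are toy text, not data from the study.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: lowercase alphabetic runs only (a simplification).
TOKEN_RE = re.compile(r"[a-z]+")


def word_list(text: str) -> Counter:
    """Build a frequency word list (term -> count) for one corpus."""
    return Counter(TOKEN_RE.findall(text.lower()))


def log_likelihood(a: int, b: int, total_a: int, total_b: int) -> float:
    """Dunning's G2 for a term seen a times in corpus A and b times in corpus B."""
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def keywords(target: Counter, reference: Counter, top_n: int = 10):
    """Rank terms over-represented in target relative to reference by G2."""
    total_t = sum(target.values())
    total_r = sum(reference.values())
    if total_t == 0 or total_r == 0:
        return []
    scored = []
    for term, a in target.items():
        b = reference.get(term, 0)
        # Keep only terms whose relative frequency is higher in the target corpus.
        if a / total_t > b / total_r:
            scored.append((term, log_likelihood(a, b, total_t, total_r)))
    # Highest G2 first; ties broken alphabetically for a stable ordering.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Toy usage: two invented snippets standing in for mirrored Web corpora.
target = word_list("solar panels and solar power plants expand as solar energy grows")
reference = word_list("oil prices and gas supply dominate energy news as power demand grows")
for term, score in keywords(target, reference, top_n=3):
    print(f"{term}\t{score:.2f}")
```

In this toy run "solar" ranks first because it occurs three times in the target text and never in the reference. Comparing relative frequencies before scoring keeps terms that are under-represented in the target out of the keyword list, since G2 alone does not indicate direction.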

Keywords: Web mining, Content analysis, Renewable Energy, Online media

Article history: Received 26 August 2003, Accepted 6 April 2004, Available online 2 June 2004.

Official paper URL: https://doi.org/10.1016/j.knosys.2004.04.003