By the numbers: The magic of numerical intelligence in text analytic systems

Highlights:

• Numerical tokens are often underutilized in text analytic systems.

• We propose a procedure for finding and classifying numerical tokens, using postings in an online forum as a case study.

• We demonstrate that the numbers can be reliably classified and discretized using supervised machine learning.

• We further show that the number features can enhance product defect discovery.

• A Post-Market Quality Surveillance decision support system that leverages numerical tokens is designed and recommended.

Abstract:

MIS researchers and practitioners increasingly recognize social media as a valuable source of business intelligence. However, unearthing relevant and useful information from the sheer volume of postings remains a challenge. Automated text mining approaches have made significant progress in recent years, introducing a variety of new methods and features. This study contributes to this stream of research with a novel text mining procedure centered on the numerical expressions contained in text documents. In this method, numerical expressions are extracted, categorized, and binned, and their presence and magnitude are stored as document features. Using a case study from the automotive industry, we demonstrate that numerical expressions can be reliably identified and that the resulting numerical features improve document classification. As an extension of this case study, we contribute a decision support system for managing product quality using both textual and numerical attributes.
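The extract, categorize, bin, and store steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper classifies numerical tokens with supervised machine learning, whereas the hypothetical `categorize` function below is a simple rule-based stand-in keyed on the unit word that follows each number. The regular expression, the category names, the unit sets, and the `BIN_EDGES` values are all illustrative choices.

```python
import bisect
import re
from collections import Counter

# Hypothetical pattern: a number (optional thousands separators and decimals)
# followed by an optional unit word such as "mph", "miles", or "%".
NUMBER_RE = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*([A-Za-z%]*)")

# Hypothetical bin edges per category (not taken from the paper).
BIN_EDGES = {
    "speed": [30, 60],                     # bin0: <30, bin1: 30 to <60, bin2: >=60
    "mileage": [10_000, 50_000, 100_000],  # odometer readings
    "year": [2000, 2010],                  # model years
    "percent": [10, 50],
    "other": [10, 100],
}

SPEED_UNITS = {"mph", "kph"}
DISTANCE_UNITS = {"miles", "mile", "mi", "km"}


def categorize(value: float, unit: str) -> str:
    """Rule-based stand-in for a supervised number classifier (hypothetical)."""
    unit = unit.lower()
    if unit in SPEED_UNITS:
        return "speed"
    if unit in DISTANCE_UNITS:
        return "mileage"
    if unit == "%":
        return "percent"
    if value.is_integer() and 1950 <= value <= 2030:
        return "year"
    return "other"


def extract_numeric_features(text: str) -> Counter:
    """Extract numbers, categorize and bin them, and emit presence/bin features."""
    features = Counter()
    for match in NUMBER_RE.finditer(text):
        value = float(match.group(1).replace(",", ""))
        category = categorize(value, match.group(2))
        # bisect_right returns the index of the first edge greater than value.
        bin_index = bisect.bisect_right(BIN_EDGES[category], value)
        features[f"has_{category}"] = 1              # presence of the category
        features[f"{category}_bin{bin_index}"] += 1  # discretized magnitude
    return features


def document_features(text: str) -> Counter:
    """Combine bag-of-words counts with the numeric features for one document."""
    words = Counter(w.lower() for w in re.findall(r"[A-Za-z]+", text))
    return words + extract_numeric_features(text)


if __name__ == "__main__":
    post = "My 2008 Civic shudders at 60 mph and has 85,000 miles."
    print(extract_numeric_features(post))
```

Binning turns each magnitude into a sparse indicator feature, so a reading of 85,000 miles and one of 90,000 miles land in the same bin. These features can then sit alongside ordinary word counts in any bag-of-words document classifier.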

Keywords: Text analytics, Information retrieval, Numerical attributes, Defect discovery

Article history: Received 2 March 2018, Revised 30 June 2018, Accepted 31 July 2018, Available online 3 August 2018, Version of Record 11 August 2018.

Paper URL: https://doi.org/10.1016/j.dss.2018.07.004