A machine learning approach to product review disambiguation based on function, form and behavior classification

Highlights:

• Machine learning classification of product form, function and behavior

• Machine learning classification accuracies of over 82% for base model

• Correlation between product ratings and product form, function and behavior

• Correlation between products' form and products' ratings resulted in a value of 0.934

Abstract

Online product reviews have been shown to be a viable source of information for helping customers make informed purchasing decisions. In many cases, users of online shopping platforms can rate products on a numerical scale and also provide textual feedback about a purchased product. Beyond serving as customer decision support systems, this information-rich data source could also aid designers seeking to increase the chances of their products succeeding in the market through a deeper understanding of market needs. However, the increasing size and complexity of products on the market makes manual analysis of such data challenging. Information obtained from such sources, if not mined correctly, risks misrepresenting a product's true success or failure (e.g., a customer leaves a one-star rating because of slow shipping, not necessarily because the customer dislikes the product). The objective of this paper is threefold: i) to propose a machine learning approach that disambiguates online customer review feedback by classifying it into one of five categories, namely three direct product characteristics (form, function, and behavior) and two indirect product characteristics (service and other), ii) to identify the machine learning algorithm that yields the highest and most generalizable results in achieving objective i), and iii) to quantify the correlation between product ratings and the direct and indirect product characteristics. A case study involving review data for products mined from e-commerce websites is presented to demonstrate the validity of the proposed method. A multilayered (i.e., k-fold and leave-one-out) validation approach is presented to explore the generalizability of the proposed method.
The resulting machine learning model achieved classification accuracies of 82.44% for within-product classification, 80.84% for across-product classification, 79.03% for across-product-type classification, and 80.64% for across-product-domain classification. Furthermore, of the product characteristics considered, the form of a product had the highest Pearson correlation coefficient with the product's star rating, with a value of 0.934. The scientific contributions of this work have the potential to transform the manner in which both product designers and customers incorporate product reviews into their decision-making processes by quantifying the relationship between product reviews and product characteristics.
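
As an illustration of the correlation measure named above, the following minimal sketch computes a Pearson correlation coefficient between a per-product form score and the product's star rating. The abstract does not specify how the form variable is constructed, so the function name `pearson_r`, the variable `form_score` (assumed here to be the share of a product's reviews classified as favorable form feedback), and all numbers are hypothetical placeholders, not data or results from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-product values (assumption, not data from the paper):
# form_score  = share of reviews classified as favorable form feedback
# star_rating = mean star rating of the same product
form_score = [0.10, 0.25, 0.40, 0.55, 0.70]
star_rating = [2.1, 2.9, 3.4, 4.0, 4.6]

r = pearson_r(form_score, star_rating)
print(f"Pearson r between form score and star rating: {r:.3f}")
```

The same function could be applied to function, behavior, service, and other scores to compare how strongly each characteristic tracks the star rating, under the same assumption about how the scores are built.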