Scalable modelling and recommendation using wiki-based crowdsourced repositories

Authors:

Highlights:

Abstract

Wiki-based crowdsourced repositories have become an increasingly important source of information for users in multiple domains. However, as the amount of wiki-based data grows, so does the information overload for users. Wikis, and crowdsourcing platforms in general, raise trustworthiness questions because they do not generally store user background data, which makes page recommendations particularly hard to rely on. In this context, this work explores scalable multi-criteria profiling that uses side information to model the publishers and pages of wiki-based crowdsourced platforms. Based on streams of publisher-page-review triads, we model publishers and pages in terms of quality and popularity, using different criteria and user-page-view events collected via a wiki platform. Our modelling approach statistically classifies both page-review (quality) and page-view (popularity) events and attributes an appropriate rating to each. The quality-related information is then merged using Multiple Linear Regression as well as a weighted average. The resulting page profiles, based on quality and popularity, are then used to recommend the most interesting wiki pages per destination to viewers. This paper also explores the parallelisation of the profiling and recommendation algorithms, treating wiki-based crowdsourced distributed data repositories as data streams that are processed via incremental updating. The proposed method has been successfully evaluated using Wikivoyage, a tourism crowdsourced wiki-based repository.
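
To make the idea of incremental profile updating concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each review already carries numeric per-criterion ratings (the statistical classification step from the abstract is not reproduced), merges them with a weighted average, and keeps a running mean so that past events need not be stored. The names `PageProfile`, `add_review`, `add_view` and `recommend`, the criterion names, and the tie-breaking rule are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PageProfile:
    """Running quality/popularity profile of one wiki page (hypothetical structure)."""

    quality: float = 0.0   # running mean of merged review ratings
    n_reviews: int = 0     # number of review events seen so far
    views: int = 0         # number of view events seen so far (popularity)

    def add_review(self, ratings: dict[str, float], weights: dict[str, float]) -> None:
        # Merge the per-criterion ratings of one review with a weighted average.
        total_weight = sum(weights[c] for c in ratings)
        merged = sum(weights[c] * r for c, r in ratings.items()) / total_weight
        # Incremental mean: update the profile without storing past reviews.
        self.n_reviews += 1
        self.quality += (merged - self.quality) / self.n_reviews

    def add_view(self) -> None:
        # Each page-view event increases the popularity count.
        self.views += 1


def recommend(profiles: dict[str, PageProfile], k: int) -> list[str]:
    # Rank pages by quality, breaking ties by view count (an illustrative choice).
    ranked = sorted(
        profiles,
        key=lambda page: (profiles[page].quality, profiles[page].views),
        reverse=True,
    )
    return ranked[:k]
```

Because every event touches only one page's profile, events for different pages could in principle be processed by separate workers, which is the property a parallel, stream-based design would rely on.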

Keywords: Modelling, Scalable data mining, Wiki-based crowdsourcing, Parallel processing, Reputation, User profiling, Cloud computing, Recommender systems

Article history: Received 27 June 2018, Revised 16 November 2018, Accepted 26 November 2018, Available online 29 November 2018, Version of Record 29 December 2018.

Official paper URL: https://doi.org/10.1016/j.elerap.2018.11.004