Automatic labelling of clusters of discrete and continuous data with supervised machine learning

Authors:

Highlights:

• This study defines the labelling problem and proposes a solution based on supervised learning, unsupervised learning and a discretisation model.

• An unsupervised learning method is applied to form the clusters, and a supervised learning algorithm then detects the attributes that are relevant for defining each cluster.

• Several strategies are combined into a methodology that produces a label, expressed in terms of attributes and their values, for each cluster.

• Discretisation methods are used to determine the value ranges of the attributes that appear in the labels.

• The methodology is applied to three different databases and achieves acceptable results: on average, more than 92.89% of the elements are correctly labelled.
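The highlights describe a three-stage pipeline: form clusters, select the attributes that characterise each cluster, and discretise those attributes into value ranges that make up the label. The following is a minimal pure-Python sketch of that flow under explicit assumptions; it is not the authors' implementation. It uses k-means for the unsupervised step and equal-width bins for discretisation. In place of the supervised learner (the keywords point to artificial neural networks), it swaps in a simple purity score: an attribute enters a cluster's label when the bin holding most of the cluster's members contains mostly members of that cluster. The function names, the bin count, the 0.8 purity threshold and the definition of a "correctly labelled" element are all hypothetical.

```python
from collections import Counter


def _sq_dist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Unsupervised step: plain Lloyd's k-means with evenly spaced initial centroids."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points]
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def equal_width_bins(values, n_bins):
    """Discretisation step (assumed equal-width): bin index per value plus bin edges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant attribute
    idx = [min(int((v - lo) / width), n_bins - 1) for v in values]
    edges = [lo + i * width for i in range(n_bins + 1)]
    return idx, edges


def label_clusters(points, assign, n_bins=3, min_purity=0.8):
    """Build a label [(attribute, low, high), ...] for every cluster.

    Hypothetical stand-in for the paper's supervised step: an attribute is kept
    when the bin holding most of the cluster's members is at least
    `min_purity` pure (i.e. mostly contains members of that cluster).
    """
    n_attrs = len(points[0])
    binned = [equal_width_bins([p[j] for p in points], n_bins) for j in range(n_attrs)]
    labels = {}
    for c in sorted(set(assign)):
        label = []
        for j, (idx, edges) in enumerate(binned):
            member_bins = [b for b, a in zip(idx, assign) if a == c]
            best_bin = Counter(member_bins).most_common(1)[0][0]
            owners = [a for b, a in zip(idx, assign) if b == best_bin]
            if owners.count(c) / len(owners) >= min_purity:
                label.append((j, edges[best_bin], edges[best_bin + 1]))
        labels[c] = label
    return labels


def labelling_accuracy(points, assign, labels, tol=1e-9):
    """Assumed metric: share of elements whose values fall inside every range of their label."""
    correct = sum(
        1
        for p, c in zip(points, assign)
        if labels[c] and all(lo - tol <= p[j] <= hi + tol for j, lo, hi in labels[c])
    )
    return correct / len(points)


if __name__ == "__main__":
    # Toy data: two groups that differ only in attribute 0.
    data = [(1.0, 5.0), (1.2, 4.8), (0.8, 5.2), (9.0, 5.1), (9.2, 4.9), (8.8, 5.0)]
    assign = kmeans(data, k=2)
    labels = label_clusters(data, assign)
    print(assign)  # [0, 0, 0, 1, 1, 1]
    print(labels)
    print(labelling_accuracy(data, assign, labels))  # 1.0
```

On this toy data only attribute 0 separates the two groups, so each label holds a single range on that attribute; attribute 1 is dropped because every one of its bins mixes members of both clusters. A real implementation would replace the purity score with the trained supervised model and could use a different discretisation method.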


Keywords: Machine learning, clustering, labelling, artificial neural networks

Article history: Received 20 August 2015, Revised 21 May 2016, Accepted 23 May 2016, Available online 27 May 2016, Version of Record 18 June 2016.

Official URL: https://doi.org/10.1016/j.knosys.2016.05.044