Automatic labelling of clusters of discrete and continuous data with supervised machine learning

Highlights:

• This study defines the labelling problem and presents a solution that combines supervised learning, unsupervised learning and a discretisation model.

• An unsupervised learning method is applied to the clustering problem, and a supervised learning algorithm detects the attributes that are relevant for defining each cluster formed.

• Several strategies are combined into a methodology that produces a label (based on attributes and values) for each provided cluster.

• Discretisation methods are used to determine the ranges of values of the attributes presented in the labels.

• The methodology is applied to three different databases and achieves acceptable results, with an average of more than 92.89% of elements correctly labelled.
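
The labelling idea summarised in the highlights can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the clusters are already given, uses equal-width binning as the discretisation step, and substitutes a simple one-rule classifier ("an element belongs to the cluster if its attribute value falls in this range") for the supervised learning algorithm. The function names, the sample data and the attribute names `length` and `width` are hypothetical.

```python
from collections import Counter


def equal_width_bins(values, n_bins):
    """Split [min(values), max(values)] into n_bins ranges of equal width."""
    lo, hi = min(values), max(values)
    width = ((hi - lo) / n_bins) or 1.0  # avoid zero width for constant columns
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]


def bin_index(value, bins):
    """Return the index of the range containing value (last range is closed)."""
    for i, (_, high) in enumerate(bins):
        if value < high:
            return i
    return len(bins) - 1


def label_clusters(rows, cluster_ids, attribute_names, n_bins=2):
    """Return {cluster: (attribute, (low, high), accuracy)} for each cluster.

    Assumed scoring rule: for each attribute, take the range that holds most
    of the cluster's elements and measure how well "value in range" separates
    the cluster from all other elements. The best-scoring attribute and its
    range form the label.
    """
    columns = list(zip(*rows))
    all_bins = [equal_width_bins(col, n_bins) for col in columns]
    # Discretise every value once: binned[r][a] is the range index.
    binned = [[bin_index(v, all_bins[a]) for a, v in enumerate(row)]
              for row in rows]
    labels = {}
    for cluster in sorted(set(cluster_ids)):
        best = None
        for a, name in enumerate(attribute_names):
            inside = Counter(b[a] for b, c in zip(binned, cluster_ids)
                             if c == cluster)
            top_bin, _ = inside.most_common(1)[0]
            # An element is classified correctly when membership in the
            # range agrees with membership in the cluster.
            correct = sum((b[a] == top_bin) == (c == cluster)
                          for b, c in zip(binned, cluster_ids))
            accuracy = correct / len(rows)
            if best is None or accuracy > best[2]:
                best = (name, all_bins[a][top_bin], accuracy)
        labels[cluster] = best
    return labels


# Hypothetical data: two attributes, two clusters that differ in "length".
rows = [(1.0, 2.0), (1.2, 3.0), (1.1, 2.5),
        (5.0, 2.1), (5.2, 3.0), (4.9, 2.6)]
cluster_ids = [0, 0, 0, 1, 1, 1]
labels = label_clusters(rows, cluster_ids, ["length", "width"])
for cluster, (attribute, value_range, accuracy) in labels.items():
    print(cluster, attribute, value_range, accuracy)
```

In this toy data the `length` attribute separates the two clusters while `width` does not, so each cluster's label is a range of `length`. A real pipeline would obtain `cluster_ids` from a clustering algorithm and could replace the one-rule scorer with any supervised learner.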
