Early segmentation of students according to their academic performance: A predictive modelling approach

Authors:

Highlights:

• We propose a two-stage model for the early prediction of students' overall academic success.

• We introduce a new academic performance metric and use single and ensemble data mining techniques.

• We introduce a student segmentation approach based on evidence and predictions.

• A dataset of about 2500 students is used to validate the proposed methodology.

• The proposed model shows high predictive power (accuracy above 95%).

Abstract

The early classification of university students according to their potential academic performance can be a useful strategy to mitigate failure, promote the achievement of better results, and better manage resources in higher education institutions. This paper proposes a two-stage model, supported by data mining techniques, that uses the information available at the end of the first year of students' academic path to predict their overall academic performance. Unlike most of the literature on educational data mining, academic success is inferred from both the average grade achieved and the time taken to conclude the degree. Furthermore, this study proposes to segment students based on the dichotomy between the evidence of failure or high performance at the beginning of the degree program and the performance levels predicted by the model. A data set of 2459 students, spanning the years 2003 to 2015, from a European engineering school of a public research university, is used to validate the proposed methodology. The empirical results demonstrate the ability of the proposed model to predict students' performance levels with an accuracy above 95% at an early stage of their academic path. Random forests are found to be superior to the other classification techniques considered (decision trees, support vector machines, naive Bayes, bagged trees and boosted trees). Together with the prediction model, the suggested segmentation framework is a useful tool for delineating the strategies to apply in order to promote higher performance levels and mitigate academic failure, thereby increasing the overall quality of the academic experience provided by a higher education institution.
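
The segmentation idea described above, crossing the evidence observed at the beginning of the degree with the performance level predicted by the model, can be illustrated with a minimal sketch. It is not the authors' implementation: the two evidence values, the two predicted levels, the segment names, and the `Student`, `segment` and `group_by_segment` identifiers are all hypothetical, chosen only to show how such a two-way lookup might be organised.

```python
from dataclasses import dataclass

# Hypothetical category values; the abstract does not name the exact levels.
EVIDENCE_FAILURE = "failure"          # early evidence of failure
EVIDENCE_HIGH = "high_performance"    # early evidence of high performance
PREDICTED_LOW = "low"                 # predicted overall performance level
PREDICTED_HIGH = "high"


@dataclass(frozen=True)
class Student:
    student_id: str
    early_evidence: str    # observed at the beginning of the degree
    predicted_level: str   # produced by a prediction model (not shown here)


# Hypothetical segment names for each (evidence, prediction) combination.
SEGMENTS = {
    (EVIDENCE_HIGH, PREDICTED_HIGH): "sustained high performance",
    (EVIDENCE_HIGH, PREDICTED_LOW): "early success, predicted decline",
    (EVIDENCE_FAILURE, PREDICTED_HIGH): "early failure, predicted recovery",
    (EVIDENCE_FAILURE, PREDICTED_LOW): "sustained risk of failure",
}


def segment(student: Student) -> str:
    """Return the segment for one student, or raise on unknown categories."""
    key = (student.early_evidence, student.predicted_level)
    if key not in SEGMENTS:
        raise ValueError(f"unknown evidence/prediction pair: {key}")
    return SEGMENTS[key]


def group_by_segment(students):
    """Map each segment name to the list of student ids assigned to it."""
    groups = {}
    for s in students:
        groups.setdefault(segment(s), []).append(s.student_id)
    return groups


# Made-up records, for illustration only.
students = [
    Student("s1", EVIDENCE_HIGH, PREDICTED_HIGH),
    Student("s2", EVIDENCE_FAILURE, PREDICTED_HIGH),
    Student("s3", EVIDENCE_FAILURE, PREDICTED_LOW),
    Student("s4", EVIDENCE_FAILURE, PREDICTED_LOW),
]
groups = group_by_segment(students)
```

In this sketch each segment could then be paired with a different support strategy; which strategies suit which segment is a matter for the institution and is not specified here.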

Keywords: Educational data mining, Predictive modelling, Data mining, Academic performance, Engineering education

Review history: Received 29 July 2017, Revised 1 August 2018, Accepted 3 September 2018, Available online 18 September 2018, Version of Record 22 September 2018.

Official paper URL: https://doi.org/10.1016/j.dss.2018.09.001