Machine learning classification of entrepreneurs in British historical census data

Authors:

Highlights:

• We improve upon different machine learning algorithms in the field of information science.

• Boosting and ensemble methods improve upon the benchmark logistic regression.

• Algorithms with text-data describing occupations (OccString) achieve accuracies over 99%.

• The best-performing model for our census data classification is an RNN deep learning model.

• Machine learning can be of fundamental aid in the classification problem of British censuses.

Keywords: Machine learning, Deep learning, Logistic regression, Classification, Big data, Census

Review history: Received 2 August 2019, Revised 16 November 2019, Accepted 20 January 2020, Available online 31 January 2020, Version of Record 31 January 2020.

Paper URL: https://doi.org/10.1016/j.ipm.2020.102210