Minority oversampling for imbalanced time series classification

Authors:

Highlights:

Abstract

Many vital real-world applications involve time-series data with skewed class distributions. Compared to traditional imbalanced learning problems, classifying imbalanced time-series data is more challenging because of its high dimensionality and high inter-variable correlation. This paper proposes a structure-preserving Oversampling method for High-dimensional Imbalanced Time-series classification (OHIT). OHIT uses a density-ratio-based shared nearest neighbor clustering algorithm to capture the modes of the minority class in high-dimensional space. For each mode, it applies a shrinkage technique for large-dimensional covariance matrices to obtain an accurate and reliable covariance structure. Structure-preserving synthetic samples are then generated from a multivariate Gaussian distribution with the estimated covariance matrix. In addition, to further improve performance on imbalanced time-series data, we integrate OHIT into a boosting framework to obtain a new ensemble algorithm, OHITBoost. Extensive experiments on several publicly available time-series datasets (both unimodal and multimodal) demonstrate the effectiveness of OHIT and OHITBoost in terms of F1, G-mean, and AUC.
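The per-mode generation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the clustering step is omitted (the samples of one mode are assumed to be given), the function names `shrinkage_covariance` and `oversample_mode` are hypothetical, and the fixed shrinkage intensity `alpha` with a scaled-identity target stands in for whatever data-driven shrinkage estimator the paper actually uses.

```python
import numpy as np


def shrinkage_covariance(X, alpha=0.2):
    """Shrink the sample covariance toward a scaled identity matrix.

    alpha is a fixed, hand-picked intensity (an assumption of this sketch);
    a data-driven estimator would choose it from the samples.
    """
    d = X.shape[1]
    S = np.cov(X, rowvar=False)       # d x d sample covariance
    mu = np.trace(S) / d              # average variance across time steps
    return (1.0 - alpha) * S + alpha * mu * np.eye(d)


def oversample_mode(X_mode, n_new, alpha=0.2, seed=0):
    """Draw synthetic samples for one minority-class mode from a
    multivariate Gaussian with the shrunk covariance."""
    rng = np.random.default_rng(seed)
    mean = X_mode.mean(axis=0)
    cov = shrinkage_covariance(X_mode, alpha)
    return rng.multivariate_normal(mean, cov, size=n_new)


# Toy mode: 10 minority series of length 50 (fewer samples than dimensions,
# so the plain sample covariance would be singular).
rng = np.random.default_rng(42)
X_mode = rng.normal(size=(10, 50))
synthetic = oversample_mode(X_mode, n_new=25)
```

Shrinking toward a scaled identity keeps the estimate positive definite even when a mode has fewer samples than time steps, which is the situation where sampling from the raw sample covariance becomes unreliable.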

Keywords: Imbalanced classification, Oversampling, Ensemble learning, Clustering

Review history: Received 10 November 2021, Revised 4 April 2022, Accepted 5 April 2022, Available online 11 April 2022, Version of Record 26 April 2022.

Official paper URL: https://doi.org/10.1016/j.knosys.2022.108764