Fast data reduction by space partitioning via convex hull and MBR computation

作者:

Highlights:

• Two variations of the RSP3 prototype generation algorithm for data reduction are proposed. They are very fast and can be used on large datasets.

• Our proposed variations use mechanisms that exploit the notions of Convex Hull or Minimum Bounding Rectangle to approximate the diameter of a dataset.

• RSP3-QH3d utilizes the Quick Hull algorithm for 3d convex hull computation. The farthest instances of the 3d convex hull are used as an approximation of the diameter of a dataset.

• The second variation (RSP3-MBR) uses the farthest instances of a dataset’s MBR to approximate its diameter.

• The experimental results reveal that both proposed algorithms outperform RSP3 as both achieve similar accuracy and reduction rate as RSP3 at a fraction of the computational CPU cost of the latter.

摘要

•Two variations of the RSP3 prototype generation algorithm for data reduction are proposed. They are very fast and can be used on large datasets.•Our proposed variations use mechanisms that exploit the notions of Convex Hull or Minimum Bounding Rectangle to approximate the diameter of a dataset.•RSP3-QH3d utilizes the Quick Hull algorithm for 3d convex hull computation. The farthest instances of the 3d convex hull are used as an approximation of the diameter of a dataset.•The second variation (RSP3-MBR) uses the farthest instances of a dataset’s MBR to approximate its diameter.•The experimental results reveal that both proposed algorithms outperform RSP3 as both achieve similar accuracy and reduction rate as RSP3 at a fraction of the computational CPU cost of the latter.

论文关键词:Reduction by space partitioning,RSP3,Classification,Prototype generation,Big training data,Convex hull,Minimum bounding rectangle (MBR)

论文评审过程:Received 24 March 2021, Revised 16 January 2022, Accepted 22 January 2022, Available online 25 January 2022, Version of Record 31 January 2022.

论文官网地址:https://doi.org/10.1016/j.patcog.2022.108553