Rule extraction with guarantees from regression models

Highlights:

• Almost all studies about rule extraction investigate classification. In this paper, we study rule extraction from opaque predictive regression models.

• Today, all black-box rule extraction methods suffer from potentially low fidelity on test data. By utilizing conformal prediction in a novel way, the fidelity can be guaranteed, thus solving the main problem with black-box rule extraction.

• Another problem with rule extraction for regression is the choice of representation language: a standard regression tree with point predictions in the leaves is typically too weak and conveys very little information, while more complex alternatives such as model trees are not truly comprehensible.

• We suggest a new representation language for the extracted models: standard regression trees augmented with valid and sharp prediction intervals in the leaves.

• An extensive empirical investigation demonstrates the validity of the extracted models.

• In addition, it is shown how normalization can be used to provide individualized prediction intervals, thus providing highly informative extracted models.
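The idea in the highlights above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes standard split (inductive) conformal regression, uses the opaque model's outputs as the targets so that the interval bounds the disagreement between tree and opaque model, and replaces a full regression tree with a depth-1 stump. The names `opaque_model`, `fit_stump`, `predict`, and `conformal_half_width` are hypothetical, and the data are synthetic.

```python
import math
import random


def opaque_model(x):
    # Hypothetical stand-in for the opaque regression model being explained.
    return math.sin(3.0 * x) + 0.5 * x


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree by minimizing the sum of squared errors."""
    best = None
    # Skipping the smallest value keeps both sides of every split non-empty.
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def conformal_half_width(residuals, epsilon):
    """Return the ceil((n + 1)(1 - epsilon))-th smallest calibration residual."""
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - epsilon))
    if k > n:
        # Too few calibration points for this error rate: interval is unbounded.
        return float("inf")
    return sorted(residuals)[k - 1]


rng = random.Random(0)
train_x = [rng.uniform(0.0, 2.0) for _ in range(200)]
cal_x = [rng.uniform(0.0, 2.0) for _ in range(200)]
test_x = [rng.uniform(0.0, 2.0) for _ in range(1000)]

# The tree is trained on the opaque model's predictions, not on true labels.
stump = fit_stump(train_x, [opaque_model(x) for x in train_x])

# Nonconformity score: absolute disagreement between tree and opaque model.
residuals = [abs(opaque_model(x) - predict(stump, x)) for x in cal_x]
half_width = conformal_half_width(residuals, epsilon=0.1)

# Each leaf reports its point prediction plus the calibrated interval.
threshold, left_mean, right_mean = stump
leaf_intervals = {
    "x < threshold": (left_mean - half_width, left_mean + half_width),
    "x >= threshold": (right_mean - half_width, right_mean + half_width),
}

# Empirical fidelity: how often the opaque model's output lies in the interval.
covered = sum(
    abs(opaque_model(x) - predict(stump, x)) <= half_width for x in test_x
)
coverage = covered / len(test_x)
print(leaf_intervals, coverage)
```

In this sketch every leaf receives the same interval width. Individualized intervals, as mentioned in the last highlight, would require dividing each residual by a per-instance difficulty estimate before taking the quantile, which is left out here.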
