CaDET: interpretable parametric conditional density estimation with decision trees and forests

Authors: Cyrus Cousins, Matteo Riondato

Abstract

We introduce CaDET, an algorithm for parametric Conditional Density Estimation (CDE) based on decision trees and random forests. CaDET uses the empirical cross entropy impurity criterion for tree growth, which incentivizes splits that improve predictive accuracy more than the regression criteria or estimated mean-integrated-square-error used in previous works. CaDET also admits more efficient training and query procedures than existing tree-based CDE approaches, and stores only a bounded amount of information at each tree leaf, by using sufficient statistics for all computations. Previous tree-based CDE techniques produce complicated uninterpretable distribution objects, whereas CaDET may be instantiated with easily interpretable distribution families, making every part of the model easy to understand. Our experimental evaluation on real datasets shows that CaDET usually learns more accurate, smaller, and more interpretable models, and is less prone to overfitting than existing tree-based CDE approaches.
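
To make the two central ideas of the abstract concrete (scoring splits by empirical cross entropy, and keeping only sufficient statistics per leaf), the following is a minimal sketch and not the authors' implementation. It assumes a univariate Gaussian leaf family and a single numeric feature; the names `GaussianStats`, `best_split`, and the `min_leaf` parameter are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GaussianStats:
    """Sufficient statistics of a univariate Gaussian (assumed leaf family)."""
    n: int = 0        # number of points
    s1: float = 0.0   # sum of y
    s2: float = 0.0   # sum of y squared

    def add(self, y):
        return GaussianStats(self.n + 1, self.s1 + y, self.s2 + y * y)

    def sub(self, other):
        return GaussianStats(self.n - other.n, self.s1 - other.s1, self.s2 - other.s2)

    def mle(self, min_var=1e-9):
        """Maximum-likelihood mean and variance; the variance is floored at min_var."""
        mean = self.s1 / self.n
        var = max(self.s2 / self.n - mean * mean, min_var)
        return mean, var

    def total_cross_entropy(self):
        """Sum of negative log-likelihoods of the points under their own MLE fit.

        For a Gaussian at its MLE this is n/2 * (log(2*pi*var) + 1)
        (exact whenever the variance floor is not active).
        """
        if self.n == 0:
            return 0.0
        _, var = self.mle()
        return 0.5 * self.n * (math.log(2.0 * math.pi * var) + 1.0)


def best_split(xs, ys, min_leaf=2):
    """Scan thresholds on one feature; return (threshold, total cross entropy).

    Returns (None, parent_score) if no admissible split improves on the parent.
    Each candidate is scored in O(1) from running sufficient statistics.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    total = GaussianStats()
    for y in ys:
        total = total.add(y)

    left = GaussianStats()
    best = (None, total.total_cross_entropy())
    for k in range(len(order) - 1):
        i, j = order[k], order[k + 1]
        left = left.add(ys[i])
        if xs[i] == xs[j]:
            continue  # no threshold separates equal feature values
        right = total.sub(left)
        if left.n < min_leaf or right.n < min_leaf:
            continue  # avoid degenerate leaves with near-zero variance
        score = left.total_cross_entropy() + right.total_cross_entropy()
        if score < best[1]:
            best = ((xs[i] + xs[j]) / 2.0, score)
    return best


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 10, 11, 12, 13]
    ys = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
    print(best_split(xs, ys))
```

In this sketch each leaf stores three numbers regardless of how many training points reach it, which illustrates the bounded per-leaf storage mentioned above. A different parametric family would need its own sufficient statistics and cross-entropy formula.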

Keywords: Parametric models, Random forests, Sufficient statistics

Paper URL: https://doi.org/10.1007/s10994-019-05820-3