Description
In the CART model, the tree is first split all the way down from the root, so CP (the complexity parameter) has a more complex meaning than a simple per-split threshold.
Specifically, in CART it is a penalty in the objective function of the form "cp * |T|", where |T| is the number of terminal nodes (leaves) of the tree T. Technically, CP can still be applied as a threshold on the split score, but there is a trick: for both classification and regression, the tree is generally grown over its entire depth first, and only then is the penalty used to cut it back.
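To make the penalty concrete, here is a minimal sketch of how "cp * |T|" trades error against tree size. The candidate subtrees, their error values, and the helper names are made up for illustration; this is not the code of any particular library.

```python
# Hypothetical pruned versions of one fully grown tree:
# (label, training error R(T), number of leaves |T|). Values are invented.
candidates = [
    ("root only", 0.40, 1),
    ("3 leaves", 0.25, 3),
    ("8 leaves", 0.20, 8),
]


def penalized_cost(error, n_leaves, cp):
    """Cost-complexity objective: R(T) + cp * |T|."""
    return error + cp * n_leaves


def best_subtree(subtrees, cp):
    """Return the candidate with the lowest penalized cost for this cp."""
    return min(subtrees, key=lambda t: penalized_cost(t[1], t[2], cp))


for cp in (0.0, 0.02, 0.1):
    label, error, n_leaves = best_subtree(candidates, cp)
    print(cp, label, penalized_cost(error, n_leaves, cp))
```

With cp = 0 the largest tree wins, and as cp grows the selection moves toward smaller subtrees, which is why the full tree has to exist before the penalty can be evaluated.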
Further, in classification mode, CART uses the Gini index as its impurity measure, while C4.5 uses entropy.
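For reference, the two impurity measures can be sketched as follows. The function names are my own, and the input is assumed to be a list of class proportions that sum to 1.

```python
import math


def gini(proportions):
    """Gini index: 1 - sum(p_k^2) over the class proportions."""
    return 1.0 - sum(p * p for p in proportions)


def entropy(proportions):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)), skipping empty classes."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


# A perfectly mixed two-class node versus a pure node.
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))
print(gini([1.0, 0.0]), entropy([1.0, 0.0]))
```

Both measures are zero for a pure node and largest for an evenly mixed one; they differ in how strongly they penalize the intermediate mixtures.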
I was a student of J. Friedman and he told me about it.
If you want to learn more about CART, I have a couple of notes:
https://sites.google.com/site/burlachenkok/articles/decision-trees-parti-decision-trees-for-regression
https://sites.google.com/site/burlachenkok/articles/some-problems-with-decision-tree
and two talks at MIPT (but they are in Russian):
https://www.youtube.com/watch?v=r4ZTy90233w
https://www.youtube.com/watch?v=evkzN6AZTnc&t=53s