Skip to content

Page-1: CP is not the cost of split in fact #7

@burlachenkok

Description

@burlachenkok

In the CART model, splitting occurs from the beginning of the entire tree, so CP has a more complex meaning.

Specifically, in the CART model, this is a penalty in the objective function of the form "cp|T|", where T is the number of tree nodes. Technically, CP is still included as a split score, but there is a trick - splitting the tree (what for classification, what for regression is done in general over the entire depth).

Further, in the classification mode, CART uses Gini Index, C4.5 uses Entropy.

I was a student of J. Friedman and he told me about it.
If you want to learn more about CART, I have a couple of notes
https://sites.google.com/site/burlachenkok/articles/decision-trees-parti-decision-trees-for-regression,
https://sites.google .com/site/burlachenkok/articles/some-problems-with-decision-tree

and two talks at MIPT (but they are in Russian)
https://www.youtube.com/watch?v=r4ZTy90233w
https://www.youtube.com/ watch?v=evkzN6AZTnc&t=53s

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions