
Cubist  #28

@gtalckmin

Hi @topepo,

I am working with raster datasets and employing different rule-based algorithms (namely CART, Cubist, bagged trees, boosted trees, and random forests, hereafter RF) for a regression problem (biomass per area).

My initial expectation was that Cubist would give the best prediction performance while requiring little processing power and time for predictions, because the fit between the predictors and the response variable has low complexity.

In terms of accuracy, Cubist performed as well as RF (according to Dunn's test on the results of a repeated k-fold cross-validation). M5, on the other hand, is lightning-fast (3 seconds) but not as accurate as RF.

However, and quite surprisingly, Cubist took around one minute to predict a raster, whereas RF needed only 19 seconds for the same raster. The same behaviour was reported in this paper: https://doi.org/10.1016/j.neunet.2018.12.010

I would be happy to provide a reprex if given a mock-up raster (one on which I could perform regression rather than classification, although computing time should not depend on the task). I've seen one of your talks, where you mentioned that Cubist should be faster than random forests (given that it is coded in C and is far smaller and more optimized than random forest).

The Cubist model is around 100 kB, whereas the RF model is 5 MB. However, model size is not a limiting factor in the context where I am working.

Is there something I am doing wrong? I would argue that Cubist, rather than random forest, should be the workhorse for tasks such as mine; however, as it stands, Cubist is limited by its processing time.

Cheers, Gustavo
PS: I also posted this question on Stack Overflow, but I reckon it would be useful to have it here, as I am using your package as the basis for these statements.
