-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multithreaded support for apply_forest_proba (Issue #209) #210
base: dev
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #210 +/- ##
==========================================
+ Coverage 87.99% 88.02% +0.03%
==========================================
Files 10 10
Lines 1249 1253 +4
==========================================
+ Hits 1099 1103 +4
Misses 150 150
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR seems like a step in the right direction to solve #209
apply_tree_proba(tree::Root{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} = | ||
apply_tree_proba(tree.node, features, labels, use_multithreading = use_multithreading) | ||
apply_tree_proba(tree::LeafOrNode{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} = | ||
stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading) | |
stack_function_results(row->apply_tree_proba(tree, row, labels), features; use_multithreading) |
Lower bound is set to 1.6 so no need to repeat the keyword name
for i in 1:N | ||
out[i, :] = row_fun(X[i, :]) | ||
end | ||
else | ||
for i in 1:N | ||
out[i, :] = row_fun(X[i, :]) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cases are the same?
@@ -16,6 +16,8 @@ cm = confusion_matrix(labels, preds) | |||
@test depth(model) == 1 | |||
probs = apply_tree_proba(model, features, classes) | |||
@test reshape(sum(probs, dims=2), n) ≈ ones(n) | |||
probs_m = apply_tree_proba(model, features, classes, use_multithreading=true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although there isn't a format style guide for this repository, consistent use of spaces around keyword argument equal signs seems like a good start. Here at line 19 are no spaces and at 33 and 59 there are. MLJ style is no spaces around keyword arguments equals I think.
Same holds for using the semicolon to separate the arguments from the keyword arguments. At some places in this PR it is done and and some not. Here, it's generally advised to use semicolons because they improve clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing this @rikhuijzer
@salbert83 Be good to add a docstring, as here: #208
And even better, also add an example in the README.md section on native interface. This should make the new feature more discoverable.
@salbert83 Would you have some time soon to respond to the review? |
@rikhuijzer I'm not getting a response here. Are you willing and able to fishish this? |
It looks like this would result in a nested So, let's leave this open until the lower bound is set to Julia 1.8 or until someone who really needs this implements and shows benchmarks? |
Thanks @rikhuijzer for looking into this.
So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention? I also notice that the existing implementation is buy-in ( |
Maybe that explains #188 (comment). I'm afraid, I don't know and also I never need multithreading so I'm not the right person to ask unfortunately. Maybe figuring out the right multithreading for this package is something you would like, @ExpandingMan? |
@OkonSamuel Do you see obvious issues with the way multithreading is currently implemented in prediction? It's here: DecisionTree.jl/src/classification/main.jl Line 468 in f57a156
|
@rikhuijzer I don't believe nested multithreading is an issue. This has been tested before in MLJTuning where optimization multithreading has within it resampling multithreading. My interpretation of the 1.9 changes cited is only that nested threading will typically be more efficient with new default settings for the scheduler. I don't see anything obvious wrong about the proposed implementation (or the existing one), provided user must buy-in, but will wait for the pinged experts to hopefully weigh in. |
I think regression uses the functions in ../classification/main.jl for applying forests to a set of features, so no new development required for this.
Fixes #209