Multithreaded support for apply_forest_proba (Issue #209) #210

salbert83 · 2023-01-15T14:12:25Z

I think regression uses the functions in ../classification/main.jl for applying forests to a set of features, so no new development is required for this.

Fixes #209

codecov-commenter · 2023-01-15T14:16:29Z

Codecov Report

Merging #210 (686a44b) into dev (835f3cd) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #210      +/-   ##
==========================================
+ Coverage   87.99%   88.02%   +0.03%     
==========================================
  Files          10       10              
  Lines        1249     1253       +4     
==========================================
+ Hits         1099     1103       +4     
  Misses        150      150

Impacted Files	Coverage Δ
src/classification/main.jl	`96.16% <100.00%> (+0.05%)`	⬆️


rikhuijzer

This PR seems like a step in the right direction to solve #209

rikhuijzer · 2023-01-15T16:49:28Z

src/classification/main.jl

+apply_tree_proba(tree::Root{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} =
+    apply_tree_proba(tree.node, features, labels, use_multithreading = use_multithreading)
+apply_tree_proba(tree::LeafOrNode{S, T}, features::AbstractMatrix{S}, labels; use_multithreading = false) where {S, T} =
+    stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading)


Suggested change

stack_function_results(row->apply_tree_proba(tree, row, labels), features, use_multithreading = use_multithreading)

stack_function_results(row->apply_tree_proba(tree, row, labels), features; use_multithreading)

The Julia lower bound is set to 1.6, so there's no need to repeat the keyword name.
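For reference, here is a minimal sketch of the keyword shorthand the suggestion relies on. It uses a hypothetical toy function `toy_apply`, not the package's code. After the semicolon, a bare variable name is passed as a keyword argument with the same name. The semicolon is required: without it, the variable would be passed positionally.

```julia
# Hypothetical toy function with one keyword argument, standing in for
# stack_function_results (illustration only).
toy_apply(x; use_multithreading=false) = (x, use_multithreading)

use_multithreading = true

# Long form: the keyword name is repeated on both sides of `=`.
a = toy_apply(1; use_multithreading=use_multithreading)

# Shorthand: a bare name after `;` becomes a keyword argument of that name.
b = toy_apply(1; use_multithreading)

println(a == b)
```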

rikhuijzer · 2023-01-15T16:49:38Z

src/classification/main.jl

+        for i in 1:N
+            out[i, :] = row_fun(X[i, :])
+        end
+    else
+        for i in 1:N
+            out[i, :] = row_fun(X[i, :])
+        end


Aren't the two branches the same?
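If the intent was for the first branch to be the multithreaded one, a minimal sketch of `stack_function_results` might look like the following. Only the name, the `row_fun`/`X` arguments and the `use_multithreading` keyword come from the diff above. Inferring the output element type and width from the first row is an assumption of this sketch, and so is the example `row_fun`.

```julia
# Hypothetical sketch: apply `row_fun` to every row of `X` and stack the
# returned vectors as the rows of a matrix. Assumes `X` has at least one row
# and that `row_fun` returns a vector of the same length for every row.
function stack_function_results(row_fun, X::AbstractMatrix; use_multithreading::Bool=false)
    N = size(X, 1)
    first_row = row_fun(X[1, :])
    out = Matrix{eltype(first_row)}(undef, N, length(first_row))
    out[1, :] = first_row
    if use_multithreading
        # Each iteration writes only to its own row of `out`,
        # so the threads never write to the same memory.
        Threads.@threads for i in 2:N
            out[i, :] = row_fun(X[i, :])
        end
    else
        for i in 2:N
            out[i, :] = row_fun(X[i, :])
        end
    end
    return out
end

X = rand(100, 3)
row_fun = row -> [sum(row), maximum(row)]
serial = stack_function_results(row_fun, X)
threaded = stack_function_results(row_fun, X; use_multithreading=true)
println(serial == threaded)
```

Because each iteration writes to a distinct row of `out`, the threaded loop needs no locking. The serial and threaded results should therefore be identical.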

rikhuijzer · 2023-01-15T16:51:33Z

test/classification/iris.jl

@@ -16,6 +16,8 @@ cm = confusion_matrix(labels, preds)
 @test depth(model) == 1
 probs = apply_tree_proba(model, features, classes)
 @test reshape(sum(probs, dims=2), n) ≈ ones(n)
+probs_m = apply_tree_proba(model, features, classes, use_multithreading=true)


Although there isn't a formatting style guide for this repository, consistent use of spaces around the equals sign of keyword arguments seems like a good start. Here, at line 19, there are no spaces, while at lines 33 and 59 there are. I think MLJ style is no spaces around the equals sign of keyword arguments.

The same holds for using a semicolon to separate the positional arguments from the keyword arguments: in some places in this PR it is done and in others it is not. Here, it's generally advised to use semicolons because they improve clarity.

ablaom

Thanks for reviewing this @rikhuijzer

@salbert83 It would be good to add a docstring, as in #208.

Even better, also add an example to the README.md section on the native interface. This should make the new feature more discoverable.

ablaom · 2023-01-25T23:18:36Z

@salbert83 Would you have some time soon to respond to the review?

ablaom · 2023-02-06T21:45:33Z

@rikhuijzer I'm not getting a response here. Are you willing and able to finish this?

rikhuijzer · 2023-02-07T07:41:38Z

It looks like this would result in a nested @threads call: once in stack_function_results and once inside the row_fun that is passed into stack_function_results. Nested @threads should be possible with the :dynamic scheduler, which appears to have been added only in Julia 1.8 (https://github.com/JuliaLang/julia/blob/master/HISTORY.md). Also, I don't know how to benchmark whether adding multithreading actually saves time.

So, let's leave this open until the lower bound is set to Julia 1.8, or until someone who really needs this implements it and shows benchmarks?
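One simple way to check whether threading pays off is to time both variants with `@elapsed` after a warm-up call. Run it in a Julia session started with several threads (for example `julia --threads 4`). The sketch below uses a hypothetical `predict_rows` helper and an artificially expensive row function as stand-ins for tree or forest prediction. BenchmarkTools.jl's `@btime` would give more robust numbers.

```julia
# Hypothetical helper: apply `f` to each row of `X`, optionally threaded.
function predict_rows(f, X::AbstractMatrix; use_multithreading::Bool=false)
    N = size(X, 1)
    out = Vector{Float64}(undef, N)
    if use_multithreading
        Threads.@threads for i in 1:N
            out[i] = f(view(X, i, :))
        end
    else
        for i in 1:N
            out[i] = f(view(X, i, :))
        end
    end
    return out
end

# Artificially expensive per-row work, standing in for a forest prediction.
expensive(row) = sum(k -> sum(x -> sin(k * x), row), 1:1_000)

X = rand(500, 10)

# Warm-up calls so that compilation time is not included in the timings.
predict_rows(expensive, X)
predict_rows(expensive, X; use_multithreading=true)

t_serial = @elapsed predict_rows(expensive, X)
t_threaded = @elapsed predict_rows(expensive, X; use_multithreading=true)

println("threads:  ", Threads.nthreads())
println("serial:   ", t_serial, " s")
println("threaded: ", t_threaded, " s")
```

Any speed-up depends on the machine, the thread count and how much work each row takes. With `Threads.nthreads() == 1`, the two timings should be about the same.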

ablaom · 2023-02-07T22:35:08Z

Thanks @rikhuijzer for looking into this.

It looks like this would result in a nested @threads call.

So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?

I also notice that the existing implementation is buy-in (use_multithreading=false is the default) whereas the present addition is buy-out.

rikhuijzer · 2023-02-08T09:27:53Z

So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?

Maybe that explains #188 (comment). I'm afraid I don't know, and I never need multithreading myself, so unfortunately I'm not the right person to ask.

Maybe figuring out the right multithreading for this package is something you would like, @ExpandingMan?

ablaom · 2023-02-08T20:32:57Z

@OkonSamuel Do you see obvious issues with the way multithreading is currently implemented in prediction? It's here (DecisionTree.jl/src/classification/main.jl, line 468 in f57a156):

Threads.@threads for i in 1:N

ablaom · 2023-02-10T01:13:14Z

@rikhuijzer I don't believe nested multithreading is an issue. This has been tested before in MLJTuning, where optimization multithreading has resampling multithreading nested within it. My interpretation of the 1.9 changes cited is only that nested threading will typically be more efficient with new default settings for the scheduler.

I don't see anything obvious wrong about the proposed implementation (or the existing one), provided user must buy-in, but will wait for the pinged experts to hopefully weigh in.
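To illustrate the nested case, here is a minimal sketch with hypothetical names (not DecisionTree.jl code). An outer `Threads.@threads` loop over rows calls a function that itself runs `Threads.@threads` over a toy "forest". On Julia 1.8 and later, `:dynamic` is the default schedule, so both levels can use the thread pool. As far as I know, older versions run the inner loop serially. Either way, the results should match a serial computation.

```julia
# Hypothetical toy "forest": each tree is just a function of a row.
trees = [row -> sum(row) * k for k in 1:8]

# Inner level: average the tree outputs for one row, threading over trees.
function forest_predict(trees, row)
    results = Vector{Float64}(undef, length(trees))
    Threads.@threads for t in eachindex(trees)
        results[t] = trees[t](row)
    end
    return sum(results) / length(results)
end

# Outer level: thread over rows, calling the already-threaded inner function.
function predict_all(trees, X)
    out = Vector{Float64}(undef, size(X, 1))
    Threads.@threads for i in axes(X, 1)
        out[i] = forest_predict(trees, view(X, i, :))
    end
    return out
end

X = rand(50, 4)
nested = predict_all(trees, X)
serial = [sum(tree(view(X, i, :)) for tree in trees) / length(trees) for i in axes(X, 1)]
println(nested ≈ serial)
```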

salbert83 added 2 commits January 15, 2023 09:07

876dd49 Multithreaded support for apply_forest_proba
686a44b Unit tests for multithreaded support for apply_forest_proba

rikhuijzer reviewed Jan 15, 2023

ablaom reviewed Jan 16, 2023

