Commit 987c447

Dev (#319)
* Parallelize GPU split finding over (node, feature) pairs

  Replaces sequential feature iteration with parallel (node × feature) kernel, adds separate nodes_sum precomputation and GPU-side best split reduction.

  Signed-off-by: AdityaPandeyCN <[email protected]>

* remove unrelated changes

  Signed-off-by: AdityaPandeyCN <[email protected]>

* remove inconsistencies

  Signed-off-by: AdityaPandeyCN <[email protected]>

* bench

* comments

  Signed-off-by: AdityaPandeyCN <[email protected]>

* up

* up

* refactor(GPU): extract loss-specialized operators from find_best_split_parallel_kernel

  - Add parent_gain(), split_gain(), check_monotone() with per-loss dispatch
  - Introduce SplitStats struct for clean parameter passing
  - Add accumulate_hist_k1/kn! helpers for histogram accumulation
  - Simplify update_hist_gpu! signature (remove unused params)
  - Improve maintainability and extensibility for new loss types

  Signed-off-by: AdityaPandeyCN <[email protected]>

* bench

* ref bench

* up

* refactor split-finding, fix GPU IR for MAE/Quantile/Cred, and clean up docs/names

  Signed-off-by: AdityaPandeyCN <[email protected]>

* up

* up

* fix colsample split issue by summing node totals over sampled features

  Signed-off-by: AdityaPandeyCN <[email protected]>

* up

* benchmarks

* up

* up

* up

* up

* bench

* bump version

* up

---------

Signed-off-by: AdityaPandeyCN <[email protected]>
Co-authored-by: AdityaPandeyCN <[email protected]>
1 parent 97840b9 commit 987c447

4 files changed: +37 -37 lines changed


README.md

Lines changed: 12 additions & 12 deletions
@@ -55,18 +55,18 @@ Code to reproduce is available in [`benchmarks/regressor.jl`](https://github.com

 | **nobs** | **nfeats** | **max\_depth** | **train\_evo** | **infer\_evo** | **train\_xgb** | **infer\_xgb** |
 |:--------:|:----------:|:--------------:|:--------------:|:--------------:|:--------------:|:--------------:|
-| 100k | 10 | 6 | 0.35 | 0.06 | 0.20 | 0.03 |
-| 100k | 10 | 11 | 1.28 | 0.08 | 0.56 | 0.04 |
-| 100k | 100 | 6 | 0.79 | 0.08 | 0.77 | 0.03 |
-| 100k | 100 | 11 | 4.85 | 0.10 | 2.37 | 0.05 |
-| 1M | 10 | 6 | 2.56 | 0.44 | 1.39 | 0.17 |
-| 1M | 10 | 11 | 5.13 | 0.64 | 2.68 | 0.43 |
-| 1M | 100 | 6 | 5.84 | 0.63 | 4.84 | 0.18 |
-| 1M | 100 | 11 | 18.62 | 1.21 | 10.08 | 0.45 |
-| 10M | 10 | 6 | 25.97 | 3.19 | 27.88 | 1.72 |
-| 10M | 10 | 11 | 51.73 | 6.83 | 54.51 | 4.31 |
-| 10M | 100 | 6 | 83.96 | 6.04 | 63.92 | 1.84 |
-| 10M | 100 | 11 | 195.02 | 11.86 | 105.87 | 4.60 |
+| 100k | 10 | 6 | 0.36 | 0.06 | 0.21 | 0.03 |
+| 100k | 10 | 11 | 1.28 | 0.08 | 0.63 | 0.06 |
+| 100k | 100 | 6 | 0.79 | 0.08 | 0.79 | 0.03 |
+| 100k | 100 | 11 | 4.91 | 0.12 | 3.67 | 0.07 |
+| 1M | 10 | 6 | 2.49 | 0.31 | 1.60 | 0.24 |
+| 1M | 10 | 11 | 5.07 | 0.63 | 3.16 | 0.58 |
+| 1M | 100 | 6 | 5.82 | 0.69 | 5.53 | 0.26 |
+| 1M | 100 | 11 | 18.78 | 1.19 | 13.40 | 0.57 |
+| 10M | 10 | 6 | 26.45 | 3.34 | 30.99 | 1.76 |
+| 10M | 10 | 11 | 51.88 | 6.27 | 55.20 | 5.57 |
+| 10M | 100 | 6 | 85.05 | 6.44 | 65.90 | 2.56 |
+| 10M | 100 | 11 | 192.58 | 12.18 | 111.69 | 6.02 |

 ### GPU:

Lines changed: 12 additions & 12 deletions
@@ -1,13 +1,13 @@
 device,nobs,nfeats,max_depth,train_xgb,infer_xgb
-cpu,100000,10,5,0.198134027,0.027641675
-cpu,100000,10,10,0.563073942,0.044171123
-cpu,100000,100,5,0.770315637,0.029242415
-cpu,100000,100,10,2.367362958,0.045144129
-cpu,1000000,10,5,1.391768222,0.16572768
-cpu,1000000,10,10,2.684452643,0.425445288
-cpu,1000000,100,5,4.844459249,0.182353516
-cpu,1000000,100,10,10.076365726,0.448196457
-cpu,10000000,10,5,27.87760152,1.716592639
-cpu,10000000,10,10,54.506897053,4.306913473
-cpu,10000000,100,5,63.920821461,1.84436349
-cpu,10000000,100,10,105.873803709,4.596105019
+cpu,100000,10,5,0.209685012,0.0278843
+cpu,100000,10,10,0.633892017,0.063379928
+cpu,100000,100,5,0.78803545,0.029714044
+cpu,100000,100,10,3.67106006,0.068177148
+cpu,1000000,10,5,1.60418603,0.239437106
+cpu,1000000,10,10,3.15885802,0.581395282
+cpu,1000000,100,5,5.532403806,0.259958804
+cpu,1000000,100,10,13.400030569,0.565819034
+cpu,10000000,10,5,30.989342834,1.761798322
+cpu,10000000,10,10,55.19955822,5.572772669
+cpu,10000000,100,5,65.896419608,2.555485163
+cpu,10000000,100,10,111.690241223,6.021324837

benchmarks/results/md-tables.jl

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ using DataFrames
 using PrettyTables
 using Format: format

-device = "gpu"
+device = "cpu"

 df = CSV.read(joinpath(@__DIR__, "regressor-$device.csv"), DataFrame)
 df = df[:, Cols(Not(:device))]
Lines changed: 12 additions & 12 deletions
@@ -1,13 +1,13 @@
 device,nobs,nfeats,max_depth,train_evo,infer_evo
-cpu,100000,10,6,0.348727684,0.060244452
-cpu,100000,10,11,1.283299354,0.083215669
-cpu,100000,100,6,0.786920518,0.075702292
-cpu,100000,100,11,4.850429477,0.104744732
-cpu,1000000,10,6,2.557501335,0.435545273
-cpu,1000000,10,11,5.134330812,0.64089967
-cpu,1000000,100,6,5.841063335,0.630282476
-cpu,1000000,100,11,18.618983816,1.209284245
-cpu,10000000,10,6,25.967656236,3.187047612
-cpu,10000000,10,11,51.733783915,6.827839913
-cpu,10000000,100,6,83.964218374,6.040076107
-cpu,10000000,100,11,195.02288577,11.855500829
+cpu,100000,10,6,0.36101701,0.058529294
+cpu,100000,10,11,1.275312101,0.083774523
+cpu,100000,100,6,0.788167256,0.075987929
+cpu,100000,100,11,4.909140577,0.122065872
+cpu,1000000,10,6,2.490242757,0.312055729
+cpu,1000000,10,11,5.068538738,0.633276075
+cpu,1000000,100,6,5.824611015,0.694221266
+cpu,1000000,100,11,18.77638246,1.188228807
+cpu,10000000,10,6,26.454397741,3.335762904
+cpu,10000000,10,11,51.877259072,6.274066148
+cpu,10000000,100,6,85.05257602,6.441238079
+cpu,10000000,100,11,192.576912071,12.181943399
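
Reviewer note: the updated README rows look like the CSV timings rounded to two decimals, with `nobs` abbreviated (100k, 1M, 10M) and the `device` column dropped. The sketch below reproduces the first README row from the first new row of each CSV, using only Base Julia and `Printf`. It is an illustration and not the contents of `md-tables.jl`: `short_nobs` and `fmt` are hypothetical helpers, and pairing the two CSVs by row position is an assumption, made because the `max_depth` values in the two files differ by one (5/10 versus 6/11).

```julia
using Printf

# Hypothetical helper: abbreviate an observation count the way the README table shows it.
function short_nobs(n::Integer)
    n >= 1_000_000 && return string(n ÷ 1_000_000, "M")
    n >= 1_000 && return string(n ÷ 1_000, "k")
    return string(n)
end

# Hypothetical helper: round a timing in seconds to two decimals.
fmt(x) = @sprintf("%.2f", x)

# First new row of each CSV in this commit (device column already dropped).
evo = (nobs=100_000, nfeats=10, max_depth=6, train=0.36101701, infer=0.058529294)
xgb = (nobs=100_000, nfeats=10, max_depth=5, train=0.209685012, infer=0.0278843)

# Assumption: rows are paired by position and the table reports the evo max_depth.
row = join([short_nobs(evo.nobs), evo.nfeats, evo.max_depth,
            fmt(evo.train), fmt(evo.infer), fmt(xgb.train), fmt(xgb.infer)], " | ")
println("| ", row, " |")
```

If the real script joins the two files on a key rather than by position, the depth offset between them would need to be reconciled first.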
