Reproducibility with fixed seed from R is broken #58

@jemus42

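At first I thought I had stumbled into an issue with purification, but it seems to be unrelated. Either way, setting a seed before model fitting should never lead to different predictions afterwards.
The problem seems to be threading: with nthreads = 1, repeated fits with the same seed give identical predictions, while with nthreads = 3 the predictions change from run to run.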

Purification + 1 thread

library(randomPlantedForest)
seed <- 2025
yhat <- numeric(10)

for (i in 1:10) {
  set.seed(seed)
  rpfit <- rpf(
    mpg ~ wt + cyl + hp,
    data = mtcars,
    max_interaction = 3,
    ntrees = 10,
    splits = 50,
    t_try = 0.9,
    split_try = 5,
    nthreads = 1,
    purify = TRUE
  )

  yhat[i] <- predict(rpfit, mtcars[1, ])[[1]]

  cli::cli_alert_info(
    "Iteration {.val {i}} for seed {.val {seed}}: Prediction for first instance {.val {yhat[i]}}"
  )
  if (i > 1)
    cli::cli_alert_warning("Diff from previous: {.val {yhat[i] - yhat[i-1]}}")
}
#> ℹ Iteration 1 for seed 2025: Prediction for first instance 21
#> ℹ Iteration 2 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0
#> ℹ Iteration 3 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0
#> ℹ Iteration 4 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0
#> ℹ Iteration 5 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0
#> ℹ Iteration 6 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0
#> ℹ Iteration 7 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0
#> ℹ Iteration 8 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0
#> ℹ Iteration 9 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0
#> ℹ Iteration 10 for seed 2025: Prediction for first instance 21
#> ! Diff from previous: 0

Created on 2025-03-21 with reprex v2.1.1

Purification + 3 threads

# Install current GH version
# pak::pak("PlantedML/randomPlantedForest")
library(randomPlantedForest)
seed <- 2025
yhat <- numeric(10)

for (i in 1:10) {
  set.seed(seed)
  rpfit <- rpf(
    mpg ~ wt + cyl + hp,
    data = mtcars,
    max_interaction = 3,
    ntrees = 10,
    splits = 50,
    t_try = 0.9,
    split_try = 5,
    nthreads = 3,
    purify = TRUE
  )

  yhat[i] <- predict(rpfit, mtcars[1, ])[[1]]

  cli::cli_alert_info(
    "Iteration {.val {i}} for seed {.val {seed}}: Prediction for first instance {.val {yhat[i]}}"
  )
  if (i > 1)
    cli::cli_alert_warning("Diff from previous: {.val {yhat[i] - yhat[i-1]}}")
}
#> ℹ Iteration 1 for seed 2025: Prediction for first instance 20.5
#> ℹ Iteration 2 for seed 2025: Prediction for first instance 21.12
#> ! Diff from previous: 0.619999999999994
#> ℹ Iteration 3 for seed 2025: Prediction for first instance 20.3302045177045
#> ! Diff from previous: -0.789795482295482
#> ℹ Iteration 4 for seed 2025: Prediction for first instance 20.7800000003329
#> ! Diff from previous: 0.449795482628371
#> ℹ Iteration 5 for seed 2025: Prediction for first instance 21.053988799747
#> ! Diff from previous: 0.273988799414067
#> ℹ Iteration 6 for seed 2025: Prediction for first instance 20.79
#> ! Diff from previous: -0.263988799746951
#> ℹ Iteration 7 for seed 2025: Prediction for first instance 20.7899913796953
#> ! Diff from previous: -8.62030465498265e-06
#> ℹ Iteration 8 for seed 2025: Prediction for first instance 20.6028571428571
#> ! Diff from previous: -0.187134236838208
#> ℹ Iteration 9 for seed 2025: Prediction for first instance 20.87
#> ! Diff from previous: 0.267142857142861
#> ℹ Iteration 10 for seed 2025: Prediction for first instance 20.92
#> ! Diff from previous: 0.0499999999999972

Created on 2025-03-21 with reprex v2.1.1
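
The two loops above could be condensed into a single yes/no check. The following is only a sketch under assumptions: is_reproducible() and fit_one() are hypothetical helper names introduced here, not part of randomPlantedForest, and the rpf() call simply repeats the arguments used above.

```r
# Hypothetical helper (not part of randomPlantedForest): calls `fit_predict()`
# `n` times, resetting the same seed before each call, and reports whether
# all returned values are exactly identical.
is_reproducible <- function(fit_predict, seed = 2025, n = 5) {
  results <- vapply(
    seq_len(n),
    function(i) {
      set.seed(seed)
      fit_predict()
    },
    numeric(1)
  )
  # Exact comparison on purpose: with a fixed seed, not even tiny
  # floating point differences should appear between runs.
  all(results == results[1])
}

# Sanity check with base R only: same seed, same draw.
is_reproducible(function() rnorm(1))

# Usage with the model from this issue (requires randomPlantedForest):
# fit_one <- function(nthreads) {
#   rpfit <- rpf(
#     mpg ~ wt + cyl + hp,
#     data = mtcars,
#     max_interaction = 3,
#     ntrees = 10,
#     splits = 50,
#     t_try = 0.9,
#     split_try = 5,
#     nthreads = nthreads,
#     purify = TRUE
#   )
#   predict(rpfit, mtcars[1, ])[[1]]
# }
# is_reproducible(function() fit_one(nthreads = 1))
# is_reproducible(function() fit_one(nthreads = 3))
```

Going by the output above, I would expect the single-threaded call to pass and the three-thread call to fail, but I have not run this exact helper against the package. Something like it might also be usable as a regression test once the threading issue is fixed.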

Labels: C++, bug
