Fix TaskLocalRNG producing identical noise across processes #257

Closed
isaacsas wants to merge 3 commits into SciML:master from isaacsas:fix_correlated_initialization_with_TaskLocalRNG

@isaacsas
Member

When multiple noise processes were created sequentially with the default TaskLocalRNG and no intervening rand calls, copy(rng) gave each process an identical RNG state, so every process produced exactly the same samples.

Replace copy(rng) with Random.Xoshiro(rand(rng, UInt64)) so that each construction draws a seed from the task-local RNG (advancing it), giving every process an independent random stream.
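The difference between the two approaches can be reproduced with the Random standard library alone. The following is a minimal sketch, assuming Julia 1.7 or later (where the default RNG is task-local); the variable names are illustrative and not taken from the package:

```julia
using Random

rng = Random.default_rng()   # the task-local RNG

# Old approach: copy(rng) snapshots the current state without advancing it,
# so two copies taken back-to-back start from the same state.
a = copy(rng)
b = copy(rng)
same_stream = rand(a, 5) == rand(b, 5)

# Approach described above: drawing a UInt64 seed advances rng,
# so each new generator is seeded differently.
c = Random.Xoshiro(rand(rng, UInt64))
d = Random.Xoshiro(rand(rng, UInt64))
distinct_stream = rand(c, 5) != rand(d, 5)

println("copies produce the same draws:    ", same_stream)
println("seeded generators produce others: ", distinct_stream)
```

The first comparison is always true, because nothing touches rng between the two copies. The second is true up to the negligible chance of two 64-bit seeds colliding.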

Old vs. new behavior:

using DiffEqBase, DiffEqNoiseProcess

# Create two WienerProcesses back-to-back with default RNG
W1 = WienerProcess(0.0, 0.0, 0.0; reseed = false)
W2 = WienerProcess(0.0, 0.0, 0.0; reseed = false)
sol1 = solve(NoiseProblem(W1, (0.0, 1.0)); dt = 0.1)
sol2 = solve(NoiseProblem(W2, (0.0, 1.0)); dt = 0.1)

println("W1 samples: ", sol1.W)
println("W2 samples: ", sol2.W)
println("Identical?  ", sol1.W == sol2.W)

Before (v5.27.0): Identical? true — both processes produce the exact same noise samples.

After this PR: Identical? false — each process gets an independent random stream.

When multiple noise processes were created sequentially with the default
TaskLocalRNG and no intervening rand calls, `copy(rng)` gave each process
an identical RNG state, resulting in statistically identical samples.

Replace `copy(rng)` with `Random.Xoshiro(rand(rng, UInt64))` so that each
construction draws a seed from the task-local RNG (advancing it), giving
every process an independent random stream.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@isaacsas
Member Author

CI Failure Analysis: SciMLSensitivity SDE1

The only new CI failure is in the SciMLSensitivity.jl/SDE1 downstream integration test — specifically in sde_scalar_stratonovich.jl. All other checks (Tests, QA, Runic, Docs, Spell Check, StochasticDiffEq integration, SciMLSensitivity SDE3) pass.

What's failing

11 tests fail in sde_scalar_stratonovich.jl, all isapprox checks with tight tolerances (atol=0.0001 or rtol=0.0001). The values are very close but just outside the threshold. For example (line 96):

isapprox(adjoint([0.4616603904455946, 0.2631393502443631]),
         adjoint([0.4616685811848739, 0.26326526195201155]); rtol = 0.0001)

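The second components differ by about 0.05%, and the norm-based relative difference that isapprox computes for arrays is about 0.024%; both exceed the 0.01% threshold (rtol = 0.0001).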
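The size of the mismatch can be checked directly. This is a short sketch using only the LinearAlgebra standard library, with the two vectors copied from the failing check quoted above; the variable names are illustrative:

```julia
using LinearAlgebra

a = [0.4616603904455946, 0.2631393502443631]
b = [0.4616685811848739, 0.26326526195201155]

# For arrays, isapprox compares norm(a - b) against rtol * max(norm(a), norm(b))
# (atol defaults to zero).
rel_norm = norm(a - b) / max(norm(a), norm(b))

# Largest element-wise relative difference, for comparison.
rel_elem = maximum(abs.(a .- b) ./ abs.(b))

println("norm-based relative difference:   ", rel_norm)
println("element-wise relative difference: ", rel_elem)
println("passes at rtol = 1e-4? ", isapprox(a, b; rtol = 1e-4))
println("passes at rtol = 1e-3? ", isapprox(a, b; rtol = 1e-3))
```

For this pair the check fails at rtol = 1e-4 and passes at rtol = 1e-3.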

Why this happens

This PR changes how RNGs are created for noise processes — instead of copy(TaskLocalRNG) (which copies the exact state), we now create a new Xoshiro seeded from the task-local RNG. This means the noise wrapper used during the SDE adjoint pass gets a different random stream than before. The adjoint gradient computations are still correct, but the slightly different noise realization shifts the numerical results enough to exceed these tight tolerances.

Suggested fix in SciMLSensitivity

The tolerances in sde_scalar_stratonovich.jl could either be loosened slightly (e.g., rtol=0.001 instead of rtol=0.0001), or the tests could use more samples / finer time steps to reduce variance in the gradient estimates. These tests are comparing different adjoint methods against each other, and the level of agreement depends on the specific noise realization — a tolerance of 0.0001 is fragile to any upstream change in RNG seeding.

@isaacsas
Member Author

@ChrisRackauckas I think this is good on my end. The SciMLSensitivity test seems to need updating to use more samples or a looser tolerance, as it is apparently borderline on having enough statistical samples.

isaacsas and others added 2 commits February 24, 2026 11:17
Remove the TaskLocalRNG -> Xoshiro conversion at construction time,
reverting to the original behavior where all noise processes share
the task-local RNG when no explicit rng is provided.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The shared TaskLocalRNG singleton naturally produces independent noise
across processes since each solve advances the shared state.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@isaacsas
Member Author

OK, I reverted that copy but kept the test for distinct streams, which seems useful anyways. Let's see if tests pass now.
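The property a distinct-streams test checks can be sketched without the package. This is not the test kept in the PR, only an illustration using the Random standard library; `draw_increments` is a hypothetical helper standing in for a noise process that samples from a shared RNG:

```julia
using Random

rng = Random.default_rng()   # the shared task-local RNG

# Hypothetical stand-in for a process drawing Brownian increments with dt = 0.1.
draw_increments(rng, n) = sqrt(0.1) .* randn(rng, n)

# Both "processes" use the same rng object; each call advances its state,
# so the second call continues where the first one stopped.
dW1 = draw_increments(rng, 10)
dW2 = draw_increments(rng, 10)

streams_differ = dW1 != dW2
println("Distinct streams? ", streams_differ)
```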

@isaacsas
Member Author

Looks like it fails because one or more places explicitly copy the rng, and with TaskLocalRNG that changes the type (copy returns a concrete Xoshiro). I don't know the motivation for such copies well enough to be comfortable messing with that code.
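The type change can be seen in isolation with the Random standard library. This is a small sketch, assuming Julia 1.7 or later; it does not reproduce the package code that performs the copies:

```julia
using Random

rng = Random.default_rng()
rng_copy = copy(rng)

# Copying the task-local RNG yields a different concrete type.
type_before = typeof(rng)
type_after = typeof(rng_copy)
type_changed = type_before != type_after

# An explicitly constructed Xoshiro keeps its type when copied.
x = Random.Xoshiro(42)
xoshiro_type_kept = typeof(copy(x)) == typeof(x)

println(type_before, " -> ", type_after)
println("type changed on copy: ", type_changed)
```

Any code that stores the RNG in a field typed by the original RNG type, or that expects the copy to have the same type as the original, would be affected by this change.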

I'll just close this and open a bug report. Hopefully someone who knows the code base better can fix this bug.

isaacsas closed this Feb 24, 2026