@SebastianM-C (Member) commented Jan 25, 2026

I've been experimenting a bit with some possible TTFX improvements.
In my tests these changes lead to an ~18% improvement in the latency for ODEProblem, which is the part with the largest latency.

| Phase | Released MTK (avg) | Local MTK (avg) | Change |
|-------|--------------------|-----------------|--------|
| System instantiation | 0.280s ± 0.027s | 0.264s ± 0.002s | -5.7% |
| mtkcompile | 0.227s ± 0.035s | 0.207s ± 0.002s | -8.8% |
| ODEProblem creation | 13.813s ± 0.060s | 11.246s ± 0.074s | -18.6% |
| solve | 0.840s ± 0.018s | 0.879s ± 0.006s | +4.6% |
| Total (excl. pkg load) | 15.160s | 12.596s | -16.9% |

Allocations:

| Phase | Released MTK | Local MTK | Change |
|-------|--------------|-----------|--------|
| System | 528.59k / 32.2 MiB | 528.57k / 32.2 MiB | ~0% |
| mtkcompile | 323.91k / 32.6 MiB | 343.40k / 33.4 MiB | +6% |
| ODEProblem | 46.37M / 2.26 GiB | 36.04M / 1.77 GiB | -22% |
| solve | 6.52M / 329 MiB | 6.52M / 329 MiB | 0% |

Going into a bit more detail, I looked at the following script:

```julia
println("[0] Package loading...")
@time begin
    using ModelingToolkit
    using OrdinaryDiffEqTsit5
end

using ModelingToolkit: SymbolicT
using SciMLBase: FullSpecialize

ModelingToolkit.@component function LotkaVolterra(; name, α = 1.3, β = 0.9, γ = 0.8, δ = 1.8)
    ModelingToolkit.@parameters begin
        (α::Float64 = α)
        (β::Float64 = β)
        (γ::Float64 = γ)
        (δ::Float64 = δ)
    end
    params = SymbolicT[]
    push!(params, α)
    push!(params, β)
    push!(params, γ)
    push!(params, δ)

    ModelingToolkit.@variables begin
        x(ModelingToolkit.t_nounits)
        y(ModelingToolkit.t_nounits)
    end
    vars = SymbolicT[]
    push!(vars, x)
    push!(vars, y)

    initial_conditions = Dict{SymbolicT, SymbolicT}()
    push!(initial_conditions, x => (3.1))
    push!(initial_conditions, y => (1.5))

    guesses = Dict{SymbolicT, SymbolicT}()

    eqs = ModelingToolkit.Equation[]
    push!(eqs, ModelingToolkit.D_nounits(x) ~ α * x - β * x * y)
    push!(eqs, ModelingToolkit.D_nounits(y) ~ -δ * y + γ * x * y)

    return ModelingToolkit.System(eqs, ModelingToolkit.t_nounits, vars, params;
        systems = ModelingToolkit.System[], initial_conditions, guesses, name)
end

function mwe()
    # Phase 1: System instantiation
    println("\n[1] System instantiation:")
    sys = @time LotkaVolterra(; name=:lv)

    # Phase 2: Structural simplification (mtkcompile)
    println("\n[2] mtkcompile (structural simplification):")
    simplified = @time mtkcompile(sys)

    # Phase 3: ODEProblem creation
    println("\n[3] ODEProblem creation:")
    tspan = (0.0, 10.0)
    prob = @time ODEProblem{true, FullSpecialize}(simplified, [], tspan; fully_determined=true)

    # Phase 4: solve
    println("\n[4] solve:")
    sol = @time solve(prob, Tsit5());

    println("\n[5] Runtime performance (min of 1000 solves):")
    times = Vector{Float64}(undef, 1000)
    allocs = Vector{Int64}(undef, 1000)
    bytes = Vector{Int64}(undef, 1000)
    for i in 1:1000
        t = @timed solve(prob, Tsit5())
        times[i] = t.time
        allocs[i] = Base.gc_alloc_count(t.gcstats)
        bytes[i] = t.bytes
    end
    min_idx = argmin(times)
    min_time = times[min_idx]
    median_time = sort(times)[500]
    min_allocs = allocs[min_idx]
    min_bytes = bytes[min_idx]
    println("  min: $(round(min_time * 1e6, digits=1)) μs, median: $(round(median_time * 1e6, digits=1)) μs")
    println("  allocations: $(min_allocs), memory: $(round(min_bytes / 1024, digits=1)) KiB")
end

mwe()
```

and profiled this with SnoopCompile and with Tracy. In the SnoopCompile case, though, I noticed that the timings are much larger than what I get from `@time`, so the inference times are a bit unreliable. I used SnoopCompile for the first commit, but I think we can drop it since the impact is minimal.

| Change Set | System | ODEProblem | Total Improvement |
|------------|--------|------------|-------------------|
| Lazy strings only | -9.6% (26ms) | -0.4% | **-0.2%** (34ms) |
| @nospecialize | -39.9% (116ms) | -9.0% (1.37s) | **-8.2%** (1.71s) |

The `@nospecialize` changes in the table above correspond to the second commit.
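To illustrate the lazy-string change from the first commit, here is a minimal standalone sketch (the function names are made up for illustration; this is not MTK's actual error path):

```julia
# Eager interpolation: the interpolated string (and the show methods it
# pulls in) gets inferred and compiled even when the error never fires.
function check_eager(x)
    x > 0 || throw(ArgumentError("expected a positive value, got $x"))
    return x
end

# Lazy version: lazy"..." builds a LazyString, so the string body is only
# materialized (and its show machinery compiled) if the error is thrown.
function check_lazy(x)
    x > 0 || throw(ArgumentError(lazy"expected a positive value, got $x"))
    return x
end
```

`lazy"..."` (i.e. `LazyString`) is available since Julia 1.8, so this defers work off the happy path without changing the error message the user sees.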

Based on the Tracy profiling, the main issue is that we codegen a lot of methods:

*(Tracy profiler screenshots: Screenshot_20260122_004025, Screenshot_20260122_015724)*

One thing to note is that there is a huge inference zone at the start, corresponding to the ODEProblem constructor. If we can somehow reduce the number of inference triggers or the number of inferred methods, we can lower the JIT time, since we'll have to compile less code. As I understand it, the main complexity comes from the recursive nature of problem construction and the initialization system.
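The annotation pattern used in the second commit looks roughly like this (a generic sketch with made-up helper names, not the actual MTK signatures; `Base.@nospecializeinfer` requires Julia 1.10+):

```julia
# Without annotations, this helper is re-inferred and re-specialized for
# every distinct combination of argument types it sees.
build_expr(expr, args) = :(f($(args...)) = $expr)

# With @nospecializeinfer + @nospecialize, both inference and codegen treat
# the annotated arguments abstractly, so one compiled method body is reused
# across call sites instead of specializing per concrete Expr/args type.
Base.@nospecializeinfer function build_expr_nospec(@nospecialize(expr), @nospecialize(args))
    return :(f($(args...)) = $expr)
end
```

The trade-off is a possible dynamic dispatch inside the body, which is usually a good deal for codegen-style functions that run once per problem construction rather than in hot loops.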

I benchmarked the solve after these changes and runtime performance doesn't seem affected, but this is only one case. I'm not sure whether things like `remake` calls inside an optimization loop would be affected.

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes was updated
  • The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
  • Any new documentation only uses public API


SebastianM-C and others added 3 commits January 22, 2026 20:41
String interpolation in the error paths
leads to inference triggers for show methods, which impacts TTFX.
By using lazy strings instead, we defer the compilation to when we actually error.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add `@nospecializeinfer` and `@nospecialize` to callback and codegen
functions to reduce type inference overhead:

- `build_function_wrapper`: nospecialize expr and args
- `compile_condition`: nospecialize cbs, dvs, ps
- `compile_equational_affect`: nospecialize aff, op

Also replace generator comprehensions with explicit loops in
`generate_continuous_callbacks` and `generate_discrete_callbacks`
to avoid inference specialization on complex Generator types.

These changes reduce ODEProblem creation time by ~9% and solve time
by ~17% in TTFX benchmarks.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add `@nospecializeinfer` and `@nospecialize` annotations to reduce
type inference overhead in problem construction:

**ODEFunction constructor:**
- `@nospecialize` on: `u0`, `p`, `analytic`, `initialization_data`

**ODEProblem constructor:**
- `@nospecialize` on: `op`, `callback`

**@fallback_iip_specialize macro:**
- Add `_unwrap_nospecialize()` helper to extract arguments from
  `@nospecialize` wrappers
- Create `sig_args` with annotations stripped for fallback signatures
- Handle both `:kw` and `:(=)` syntax inside macrocalls

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@SebastianM-C changed the title from "Smc/ttfx" to "TTFX experiments" Jan 25, 2026
@AayushSabharwal (Member) commented:
That's pretty interesting. Yeah, we can probably get a whole bunch of TTFX improvements here if we stop specializing on generated functions. Each RGF has its own type.
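That point can be illustrated with plain closures, a stdlib-only analogy for RGFs (each anonymous function, like each RuntimeGeneratedFunction, gets its own concrete type):

```julia
# Two textually identical anonymous functions still have distinct types,
# so any callee that specializes on its function argument compiles a
# fresh method instance per function value.
f1 = x -> x + 1
f2 = x -> x + 1
println(typeof(f1) === typeof(f2))  # false

# Annotating the argument with @nospecialize lets one compiled body serve
# all function values (at the cost of a dynamic call inside):
function apply_nospec(@nospecialize(f), x)
    return f(x)
end
```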

@SebastianM-C marked this pull request as ready for review January 29, 2026 22:57