@SebastianM-C (Member) commented Jan 25, 2026

I've been experimenting a bit with some possible TTFX improvements.
In my tests these changes lead to an ~18% improvement in the latency for ODEProblem, which is the part with the largest latency.

| Phase | Released MTK (avg) | Local MTK (avg) | Change |
|-------|--------------------|-----------------|--------|
| System instantiation | 0.280s ± 0.027s | 0.264s ± 0.002s | -5.7% |
| mtkcompile | 0.227s ± 0.035s | 0.207s ± 0.002s | -8.8% |
| ODEProblem creation | 13.813s ± 0.060s | 11.246s ± 0.074s | -18.6% |
| solve | 0.840s ± 0.018s | 0.879s ± 0.006s | +4.6% |
| Total (excl. pkg load) | 15.160s | 12.596s | -16.9% |

Allocations:

| Phase | Released MTK | Local MTK | Change |
|-------|--------------|-----------|--------|
| System | 528.59k / 32.2 MiB | 528.57k / 32.2 MiB | ~0% |
| mtkcompile | 323.91k / 32.6 MiB | 343.40k / 33.4 MiB | +6% |
| ODEProblem | 46.37M / 2.26 GiB | 36.04M / 1.77 GiB | -22% |
| solve | 6.52M / 329 MiB | 6.52M / 329 MiB | 0% |

Going into a bit more detail, I looked at the following script:

```julia
println("[0] Package loading...")
@time begin
    using ModelingToolkit
    using OrdinaryDiffEqTsit5
end

using ModelingToolkit: SymbolicT
using SciMLBase: FullSpecialize

ModelingToolkit.@component function LotkaVolterra(; name, α = 1.3, β = 0.9, γ = 0.8, δ = 1.8)
    ModelingToolkit.@parameters begin
        (α::Float64 = α)
        (β::Float64 = β)
        (γ::Float64 = γ)
        (δ::Float64 = δ)
    end
    params = SymbolicT[]
    push!(params, α)
    push!(params, β)
    push!(params, γ)
    push!(params, δ)

    ModelingToolkit.@variables begin
        x(ModelingToolkit.t_nounits)
        y(ModelingToolkit.t_nounits)
    end
    vars = SymbolicT[]
    push!(vars, x)
    push!(vars, y)

    initial_conditions = Dict{SymbolicT, SymbolicT}()
    push!(initial_conditions, x => (3.1))
    push!(initial_conditions, y => (1.5))

    guesses = Dict{SymbolicT, SymbolicT}()

    eqs = ModelingToolkit.Equation[]
    push!(eqs, ModelingToolkit.D_nounits(x) ~ α * x - β * x * y)
    push!(eqs, ModelingToolkit.D_nounits(y) ~ -δ * y + γ * x * y)

    return ModelingToolkit.System(eqs, ModelingToolkit.t_nounits, vars, params;
        systems = ModelingToolkit.System[], initial_conditions, guesses, name)
end

function mwe()
    # Phase 1: System instantiation
    println("\n[1] System instantiation:")
    sys = @time LotkaVolterra(; name=:lv)

    # Phase 2: Structural simplification (mtkcompile)
    println("\n[2] mtkcompile (structural simplification):")
    simplified = @time mtkcompile(sys)

    # Phase 3: ODEProblem creation
    println("\n[3] ODEProblem creation:")
    tspan = (0.0, 10.0)
    prob = @time ODEProblem{true, FullSpecialize}(simplified, [], tspan; fully_determined=true)

    # Phase 4: solve
    println("\n[4] solve:")
    sol = @time solve(prob, Tsit5());

    println("\n[5] Runtime performance (min of 1000 solves):")
    times = Vector{Float64}(undef, 1000)
    allocs = Vector{Int64}(undef, 1000)
    bytes = Vector{Int64}(undef, 1000)
    for i in 1:1000
        t = @timed solve(prob, Tsit5())
        times[i] = t.time
        allocs[i] = Base.gc_alloc_count(t.gcstats)
        bytes[i] = t.bytes
    end
    min_idx = argmin(times)
    min_time = times[min_idx]
    median_time = sort(times)[500]
    min_allocs = allocs[min_idx]
    min_bytes = bytes[min_idx]
    println("  min: $(round(min_time * 1e6, digits=1)) μs, median: $(round(median_time * 1e6, digits=1)) μs")
    println("  allocations: $(min_allocs), memory: $(round(min_bytes / 1024, digits=1)) KiB")
end

mwe()
```

and profiled this with SnoopCompile and with Tracy. In the SnoopCompile case, though, I noticed that the timings are much larger than what I get from `@time`, so the inference times are a bit unreliable. I used SnoopCompile for the first commit, but I think we can drop it since the impact is minimal.

| Change Set | System | ODEProblem | Total Improvement |
|------------|--------|------------|-------------------|
| Lazy strings only | -9.6% (26ms) | -0.4% | **-0.2%** (34ms) |
| @nospecialize | -39.9% (116ms) | -9.0% (1.37s) | **-8.2%** (1.71s) |

The `@nospecialize` changes in the table above correspond to the second commit.
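To illustrate the lazy-string change from the first commit, here is a minimal standalone sketch (the function names are made up for illustration; this is not MTK's actual error path):

```julia
# Eager interpolation: the interpolated string (and the show methods it
# pulls in) gets inferred and compiled even when the error never fires.
function check_eager(x)
    x > 0 || throw(ArgumentError("expected a positive value, got $x"))
    return x
end

# Lazy version: lazy"..." builds a LazyString, so the string body is only
# materialized (and its show machinery compiled) if the error is thrown.
function check_lazy(x)
    x > 0 || throw(ArgumentError(lazy"expected a positive value, got $x"))
    return x
end
```

`lazy"..."` (i.e. `LazyString`) is available since Julia 1.8, so this defers work off the happy path without changing the error message the user sees.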

Based on the Tracy profiling, the main issue is that we codegen a lot of methods:

*(Tracy profiler screenshots: Screenshot_20260122_004025, Screenshot_20260122_015724)*

One thing to note is that there is a huge inference zone at the start, corresponding to the ODEProblem constructor. If we can somehow reduce the number of inference triggers or the number of inferred methods, we can lower the JIT time, since we'll have to compile less code. As I understand it, the main complexity comes from the recursive nature of problem construction and the initialization system.
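The annotation pattern used in the second commit looks roughly like this (a generic sketch with made-up helper names, not the actual MTK signatures; `Base.@nospecializeinfer` requires Julia 1.10+):

```julia
# Without annotations, this helper is re-inferred and re-specialized for
# every distinct combination of argument types it sees.
build_expr(expr, args) = :(f($(args...)) = $expr)

# With @nospecializeinfer + @nospecialize, both inference and codegen treat
# the annotated arguments abstractly, so one compiled method body is reused
# across call sites instead of specializing per concrete Expr/args type.
Base.@nospecializeinfer function build_expr_nospec(@nospecialize(expr), @nospecialize(args))
    return :(f($(args...)) = $expr)
end
```

The trade-off is a possible dynamic dispatch inside the body, which is usually a good deal for codegen-style functions that run once per problem construction rather than in hot loops.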

I benchmarked the solve after these changes and runtime performance doesn't seem affected, but this is only one case. I'm not sure whether things like `remake` calls inside an optimization loop would be affected.

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes was updated
  • The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC.
  • Any new documentation only uses public API


SebastianM-C and others added 3 commits January 22, 2026 20:41
String interpolation in the error paths
leads to inference triggers for show methods, which impacts TTFX.
By using lazy strings instead, we defer the compilation to when we actually error.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add `@nospecializeinfer` and `@nospecialize` to callback and codegen
functions to reduce type inference overhead:

- `build_function_wrapper`: nospecialize expr and args
- `compile_condition`: nospecialize cbs, dvs, ps
- `compile_equational_affect`: nospecialize aff, op

Also replace generator comprehensions with explicit loops in
`generate_continuous_callbacks` and `generate_discrete_callbacks`
to avoid inference specialization on complex Generator types.

These changes reduce ODEProblem creation time by ~9% and solve time
by ~17% in TTFX benchmarks.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Add `@nospecializeinfer` and `@nospecialize` annotations to reduce
type inference overhead in problem construction:

**ODEFunction constructor:**
- `@nospecialize` on: `u0`, `p`, `analytic`, `initialization_data`

**ODEProblem constructor:**
- `@nospecialize` on: `op`, `callback`

**@fallback_iip_specialize macro:**
- Add `_unwrap_nospecialize()` helper to extract arguments from
  `@nospecialize` wrappers
- Create `sig_args` with annotations stripped for fallback signatures
- Handle both `:kw` and `:(=)` syntax inside macrocalls

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@SebastianM-C changed the title from "Smc/ttfx" to "TTFX experiments" Jan 25, 2026
@AayushSabharwal (Member) commented:
That's pretty interesting. Yeah, we can probably get a whole bunch of TTFX improvements here if we stop specializing on generated functions. Each RGF has its own type.
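That point can be illustrated with plain closures, a stdlib-only analogy for RGFs (each anonymous function, like each RuntimeGeneratedFunction, gets its own concrete type):

```julia
# Two textually identical anonymous functions still have distinct types,
# so any callee that specializes on its function argument compiles a
# fresh method instance per function value.
f1 = x -> x + 1
f2 = x -> x + 1
println(typeof(f1) === typeof(f2))  # false

# Annotating the argument with @nospecialize lets one compiled body serve
# all function values (at the cost of a dynamic call inside):
function apply_nospec(@nospecialize(f), x)
    return f(x)
end
```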

@SebastianM-C marked this pull request as ready for review January 29, 2026 22:57