
Conversation

@MasonProtter (Contributor) commented Dec 21, 2025

(I wrote this with assistance from Gemini since I'm not very used to writing LLVM IR)

This is an attempt to fix #60441. After bumbling around a bit, it seems that the problem is that invoke(f, ::CodeInstance, args...) calls are not turned into Expr(:invoke) statements in the IR, but remain as :calls to the invoke builtin, which end up going through the runtime.

There's probably a better way to do this, but the way I found was to detect these builtin calls when emitting LLVM IR and route them to emit_invoke.

It appears to resolve the issue:

using BenchmarkTools
using Compiler
using Core.IR

struct SplitCacheOwner; end
struct SplitCacheInterp <: Compiler.AbstractInterpreter
    world::UInt
    inf_params::Compiler.InferenceParams
    opt_params::Compiler.OptimizationParams
    inf_cache::Vector{Compiler.InferenceResult}
    codegen_cache::IdDict{CodeInstance,CodeInfo}
    function SplitCacheInterp(;
        world::UInt = Base.get_world_counter(),
        inf_params::Compiler.InferenceParams = Compiler.InferenceParams(),
        opt_params::Compiler.OptimizationParams = Compiler.OptimizationParams(),
        inf_cache::Vector{Compiler.InferenceResult} = Compiler.InferenceResult[])
        new(world, inf_params, opt_params, inf_cache, IdDict{CodeInstance,CodeInfo}())
    end
end

Compiler.InferenceParams(interp::SplitCacheInterp) = interp.inf_params
Compiler.OptimizationParams(interp::SplitCacheInterp) = interp.opt_params
Compiler.get_inference_world(interp::SplitCacheInterp) = interp.world
Compiler.get_inference_cache(interp::SplitCacheInterp) = interp.inf_cache
Compiler.cache_owner(::SplitCacheInterp) = SplitCacheOwner()
Compiler.codegen_cache(interp::SplitCacheInterp) = interp.codegen_cache

import Core.OptimizedGenerics.CompilerPlugins: typeinf, typeinf_edge
@eval @noinline typeinf(::SplitCacheOwner, mi::MethodInstance, source_mode::UInt8) =
    Base.invoke_in_world(which(typeinf, Tuple{SplitCacheOwner, MethodInstance, UInt8}).primary_world, Compiler.typeinf_ext_toplevel, SplitCacheInterp(; world=Base.tls_world_age()), mi, source_mode)

const cinst = let world = Base.get_world_counter()
    sig = Tuple{typeof(sin), Float64}
    method_table = nothing
    mi = @ccall jl_method_lookup_by_tt(sig::Any, world::Csize_t, method_table::Any)::Any
    Compiler.typeinf_ext_toplevel(SplitCacheInterp(; world), mi, Compiler.SOURCE_MODE_ABI)
end

Before this PR:

julia> @btime invoke(sin, cinst, x) setup=(x=rand())
  118.912 ns (2 allocations: 32 bytes)
0.803755305547964

After this PR:

julia> @btime invoke(sin, cinst, x) setup=(x=rand())
  3.356 ns (0 allocations: 0 bytes)
0.8305566373064891

@gbaraldi (Member)

Does this/should this do a world age check?
What is the difference between this and calling an opaque closure?
Could this potentially be used to make a fake codeinst, put an invoke pointer there and call that random invoke pointer?

@gbaraldi gbaraldi requested review from Keno and vtjnash January 16, 2026 14:26
@MasonProtter (Contributor, Author)

> What is the difference between this and calling an opaque closure?
> Could this potentially be used to make a fake codeinst, put an invoke pointer there and call that random invoke pointer?

So just to be clear, I am not proposing introducing a new mechanism here. The new mechanism was already added for v1.12 in #56660. The only thing this PR does is make it so that the mechanism isn't slow to call.

I don't really know what one can or cannot do by messing around with the invoke pointer; presumably arbitrary things, but I'm not sure. Regarding a comparison to opaque closures, I think the idea is that these CodeInstances are a bit more 'static', rather than combining runtime data with alternative interpretation. Another difference is that a CodeInstance is just the natural object that running the compiler pipeline on an alternative AbstractInterpreter produces, whereas an opaque closure is not. So I think encouraging people to use these instead of opaque closures allows one to operate a bit more closely to how the compiler works.

I didn't make the feature though, that was @Keno, so maybe he can motivate it a bit more if the linked PR and my crappy explanation isn't sufficient.

> Does this/should this do a world age check?

I had initially assumed that emit_invoke would do the world-age check, but I'm not actually seeing it in there, so I should probably add it here, since the version in the runtime does do a world-age check.

src/codegen.cpp Outdated
}

// 3. Delegate to emit_invoke for the actual call generation
*ret = emit_invoke(ctx, argtypes, invoke_args, invoke_args.size(), rt, false);
Review comment (Member):

You probably need emit_invoke to match this error:

        if (invoke) {
            return invoke(args[0], &args[2], nargs - 2, codeinst);
        } else {
            if (codeinst->owner != jl_nothing || !jl_is_method(codeinst->def->def.value)) {
                jl_error("Failed to invoke or compile external codeinst");
            }
            return jl_gf_invoke_by_method(codeinst->def->def.method, args[0], &args[2], nargs - 1);
        }

Similarly looking at https://github.com/JuliaLang/julia/pull/56660/changes#diff-b6cb5d410b973b5987bc13b1eba0f156fcaece59cfbd6cb12ac503ff44fd13ca

What happens when you provide an "uncompiled" codeinst? You would need a mechanism like in #52964 to go from codeinst->owner to the correct abstract interpreter / compiler instance.

So one avenue forward would be to partially do #52964 but for now error on encountering a dynamic dispatch.

@MasonProtter (Contributor, Author) commented Jan 16, 2026

> What happens when you provide an "uncompiled" codeinst? You would need a mechanism like in #52964 to go from codeinst->owner to the correct abstract interpreter / compiler instance.

Do we need to worry about that for this PR? I'd really just like to get this mechanism working rather than re-design it (not that I'm against a redesign or enhancement, but I'd like to keep this PR focused on the narrow issue I want to solve: the calling overhead, if possible).

Review comment (Member):

I think this might be fine to error on (but it should be checked).
Your code should replicate whatever the current code does, potentially deciding some things at compile time.

Review comment (Member):

Well, your code should match the behavior and emit the same errors as the interpreter/builtin version.

Which has a:

        if (!invoke) {
            jl_compile_codeinst(codeinst);

@MasonProtter (Contributor, Author) commented Jan 16, 2026

For the world-age check, how do we do that? It seems that the ctx has a min world age and a max world age, not a single specific world age. Do I just check that this range fits within the min and max world ages of the CodeInstance? Or does the world-age check need to be a runtime check rather than a compile-time one?

@MasonProtter (Contributor, Author)

Okay, I have:

  • added some basic checks making sure that .invoke is non-null. If it is null, we try to compile it like the interpreter does, and if it's still null after that, we let the interpreter handle things like errors
  • checked that the ctx world age fits within the valid world ages for the codeinst; if not, I simply pass it off to the interpreter to handle errors
  • added some basic tests that one can invoke a constant CodeInstance without allocations
  • made Compiler's tests run CompilerDevTools's tests


Development

Successfully merging this pull request may close these issues.

Allocations when invoke-ing a constant CodeInstance
