Reduce generated functions: getindex #28

Open: wants to merge 16 commits into base: main

NHDaly (Member) commented May 8, 2024

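Convert all generated functions to regular functions.

  • The produced code is mostly unchanged, and performance remains mostly the same (on Julia 1.10, `unsafe_load` and `unsafe_store!` run roughly 2x slower in the benchmarks below; on Julia 1.11 the timings match).
  • Type stability is tested by a new testitem.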
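
As a rough illustration of the kind of check such a testitem could perform, here is a minimal sketch using `@inferred` from the `Test` standard library. The struct `Foo` and the helper `field_offset` are hypothetical stand-ins and not code from this PR, and the sketch uses a plain `@testset` rather than a testitem so that it stays self-contained.

```julia
using Test

# Hypothetical example type; not part of the PR.
struct Foo
    x::Int64
    y::Float32
end

# Hypothetical non-generated helper: byte offset of field `i`, assuming
# fields are packed back to back with no padding.
field_offset(::Type{T}, i::Int) where {T} =
    i <= 1 ? 0 : field_offset(T, i - 1) + sizeof(fieldtype(T, i - 1))

@testset "type stability" begin
    # `@inferred` throws if the inferred return type differs from the
    # type of the value actually returned.
    @test @inferred(field_offset(Foo, 2)) == 8
end
```

`@inferred` only checks the return type of the outermost call, so a fuller test would probably also cover `getproperty`, `unsafe_load`, and `unsafe_store!` on real `Blob` values.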

Compilation time comparisons:

julia> test_getproperty1(b) = b.e
test_getproperty1 (generic function with 1 method)

julia> @time test_getproperty1(bar)      # BEFORE
  0.039638 seconds (36.95 k allocations: 2.471 MiB)
Blob{Blob{Quux}}(Ptr{Nothing} @0x000000013f2d0ea0, 41, 361)

julia> @time test_getproperty1(bar)      # AFTER
  0.007395 seconds (15.06 k allocations: 1019.320 KiB)
Blob{Blob{Quux}}(Ptr{Nothing} @0x000000013f2d0ea0, 41, 361)

julia> @time unsafe_load(bar)      # BEFORE
  0.076596 seconds (86.41 k allocations: 5.710 MiB)
Bar(10, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f2d0ea0, 217, 361))

julia> @time unsafe_load(bar)      # AFTER
  0.025392 seconds (78.95 k allocations: 5.357 MiB)
Bar(10, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f2d0ea0, 217, 361))

julia> @time unsafe_store!(bar, bar_val)      # BEFORE
  0.076252 seconds (96.31 k allocations: 6.358 MiB)
Bar(10, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f2d0ea0, 217, 361))

julia> @time unsafe_store!(bar, bar_val)      # AFTER
  0.039480 seconds (49.01 k allocations: 3.199 MiB)
Bar(10, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f2d0ea0, 217, 361))

Runtime comparisons on Julia 1.10:

julia> test_getproperty1(b) = b.e
test_getproperty1 (generic function with 1 method)

julia> @btime test_getproperty1($bar)      # BEFORE
  1.416 ns (0 allocations: 0 bytes)
Blob{Blob{Quux}}(Ptr{Nothing} @0x000000013f2d0ea0, 41, 361)

julia> @btime test_getproperty1($bar)      # AFTER
  1.416 ns (0 allocations: 0 bytes)
Blob{Blob{Quux}}(Ptr{Nothing} @0x000000013f2d0ea0, 41, 361)

julia> @btime unsafe_load($bar)      # BEFORE
  2.166 ns (0 allocations: 0 bytes)
Bar(10, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f2d0ea0, 217, 361))

julia> @btime unsafe_load($bar)      # AFTER
  5.250 ns (0 allocations: 0 bytes)
Bar(10, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f2d0ea0, 217, 361))

julia> @btime unsafe_store!($bar, $bar_val)      # BEFORE
  5.250 ns (0 allocations: 0 bytes)
Bar(10, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f2d0ea0, 217, 361))

julia> @btime unsafe_store!($bar, $bar_val)      # AFTER
  10.177 ns (0 allocations: 0 bytes)
Bar(10, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f2d0ea0, 217, 361))

Runtime comparisons on Julia 1.11:

julia> test_getproperty1(b) = b.e
test_getproperty1 (generic function with 1 method)

julia> @btime test_getproperty1($bar)      # BEFORE
  2.042 ns (0 allocations: 0 bytes)
Blob{Blob{Quux}}(Ptr{Nothing} @0x000000013f959380, 41, 361)

julia> @btime test_getproperty1($bar)      # AFTER
  2.000 ns (0 allocations: 0 bytes)
Blob{Blob{Quux}}(Ptr{Nothing} @0x000000013f959380, 41, 361)

julia> @btime unsafe_load($bar)      # BEFORE
  2.333 ns (0 allocations: 0 bytes)
Bar(0, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f959380, 217, 361))

julia> @btime unsafe_load($bar)      # AFTER
  2.292 ns (0 allocations: 0 bytes)
Bar(0, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f959380, 217, 361))

julia> @btime unsafe_store!($bar, $bar_val)      # BEFORE
  5.250 ns (0 allocations: 0 bytes)
Bar(0, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f959380, 217, 361))

julia> @btime unsafe_store!($bar, $bar_val)      # AFTER
  5.333 ns (0 allocations: 0 bytes)
Bar(0, Bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0], false, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], Blob{Quux}(Ptr{Nothing} @0x000000013f959380, 217, 361))

NHDaly added 2 commits May 8, 2024 10:10
- The produced code is unchanged, and the perf remains the same.
- This is tested by a new testitem
functions, but these had code-size and perf impacts. :(
Managed via compiler annotations

This new function is ~10x faster than the older `@generated` function:
- ~10ms down to ~1ms
@NHDaly NHDaly force-pushed the nhd-reduce-generated branch from 47793da to 66bab7b Compare May 9, 2024 17:09
@NHDaly NHDaly requested review from Drvi and Sacha0 May 9, 2024 21:04
@NHDaly NHDaly marked this pull request as ready for review May 9, 2024 21:04
NHDaly (Member, Author) commented May 9, 2024

Okay, I think this is good to review! 🎉 Thanks again for the offline support! :)

Base automatically changed from nhd-retestitems to master May 10, 2024 02:56
# ~0.5ms for 5 fields, vs ~5ms for unrolling via splatting the fields.
# ~3ms for 20 fields, vs ~6ms for splatting.
# Note that splatting gives up after ~30 fields, whereas recursion remains robust.
_sum_field_sizes(T)
Member Author

It looks like it might be a good idea to add some kind of cutoff for the recursion to fall back to runtime computations for very large types?

That's what @aviatesk did here:
https://github.com/JuliaLang/julia/pull/54026/files#diff-12e7a6522633012a408b1bdee7639e8cb722617fe1a8ed6a3881bf4ad1ebdbbdR1369-R1370

Did you test large types? I would fix it once we hit a problem, so the code does not get too complicated.
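
For reference, the cutoff idea discussed above could look roughly like the sketch below. None of this is taken from the PR: the function names, the `Small` type, and the value of `MAX_UNROLLED_FIELDS` are all hypothetical, and the sketch assumes the quantity of interest is the packed sum of the field sizes.

```julia
# Hypothetical cutoff; neither the PR nor the linked change fixes this value here.
const MAX_UNROLLED_FIELDS = 32

# Recursive sum: add the size of field `i`, then recurse on field `i + 1`.
@inline function sum_field_sizes_rec(::Type{T}, ::Val{i}) where {T, i}
    i > fieldcount(T) && return 0
    return sizeof(fieldtype(T, i)) + sum_field_sizes_rec(T, Val(i + 1))
end

# Plain runtime loop, used as the fallback for very large types.
function sum_field_sizes_loop(::Type{T}) where {T}
    total = 0
    for i in 1:fieldcount(T)
        total += sizeof(fieldtype(T, i))
    end
    return total
end

# Entry point: recurse for small types, loop for large ones.
function sum_field_sizes(::Type{T}) where {T}
    if fieldcount(T) <= MAX_UNROLLED_FIELDS
        return sum_field_sizes_rec(T, Val(1))
    else
        return sum_field_sizes_loop(T)
    end
end

# Hypothetical example type: 8 + 4 + 1 bytes when packed.
struct Small
    a::Int64
    b::Float32
    c::Bool
end

@assert sum_field_sizes(Small) == 13
```

Whether the extra branch is worth it depends on how large the types get in practice, which is the open question in this thread.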

Blob{$(fieldtype(T, i))}(blob + $(blob_offset(T, i)))
end
@assert i !== nothing "$T has no field $field"
Blob{fieldtype(T, i)}(blob + (blob_offset(T, i)))

The + creates a Blob{T} that we then cast to Blob{fieldtype(T, i)}. Wouldn't it be better to create the right type from the beginning? (I think the +/- operators don't make much sense)

Member Author

That sounds reasonable to me. Again, I just did a blind transformation on what was here... 🤔

I think the + operators are adding bytes, in which case you could do it either way? But yes, I agree this is confusing.
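
To make the two options concrete, here is a small sketch of both. `MiniBlob` is a hypothetical, simplified stand-in for `Blob`; its three fields (base pointer, offset, limit) are assumed from the values printed in the benchmarks above, and `field_blob` is an invented helper name.

```julia
# Hypothetical, simplified stand-in for `Blob`; not the package's definition.
struct MiniBlob{T}
    base::Ptr{Nothing}
    offset::Int64
    limit::Int64
end

# Byte-offset arithmetic that keeps the element type `T`.
Base.:+(b::MiniBlob{T}, n::Integer) where {T} =
    MiniBlob{T}(b.base, b.offset + n, b.limit)

# Re-tagging constructor: same bytes, different element type.
MiniBlob{S}(b::MiniBlob) where {S} = MiniBlob{S}(b.base, b.offset, b.limit)

# Alternative: build the field's blob with the right type from the start.
function field_blob(b::MiniBlob{T}, i::Int, byte_offset::Int) where {T}
    return MiniBlob{fieldtype(T, i)}(b.base, b.offset + byte_offset, b.limit)
end

# Hypothetical example type with a Float32 field at byte offset 8.
struct Foo
    x::Int64
    y::Float32
end

b = MiniBlob{Foo}(C_NULL, 0, 12)

via_plus = MiniBlob{Float32}(b + 8)   # add bytes as MiniBlob{Foo}, then re-tag
direct   = field_blob(b, 2, 8)        # construct MiniBlob{Float32} directly

@assert via_plus == direct
```

Both routes end at the same value, so the choice is mostly about readability: the direct form never produces an intermediate blob whose element type does not match the bytes it points at.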

@NHDaly NHDaly requested a review from robertbuessow October 9, 2024 21:23
4 participants