-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce generated functions: getindex #28
base: main
Are you sure you want to change the base?
Conversation
- The produced code is unchanged, and the perf remains the same. - This is tested by a new testitem
functions, but these had code-size and perf impacts. :(
Managed via compiler annotations This new function is ~10x faster than the older `@generated` function: - ~10ms down to ~1ms
47793da
to
66bab7b
Compare
Okay, I think this is good to review! 🎉 Thanks again for the offline support! :) |
# ~0.5ms for 5 fields, vs ~5ms for unrolling via splatting the fields. | ||
# ~3ms for 20 fields, vs ~6ms for splatting. | ||
# Note that splatting gives up after ~30 fields, whereas recursion remains robust. | ||
_sum_field_sizes(T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it might be a good idea to add some kind of cutoff for the recursion to fall back to runtime computations for very large types?
That's what @aviatesk did here:
https://github.com/JuliaLang/julia/pull/54026/files#diff-12e7a6522633012a408b1bdee7639e8cb722617fe1a8ed6a3881bf4ad1ebdbbdR1369-R1370
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test, large types? I would fix it once we hit a problem, so code is not too complicated
2x slower, but less compile time so worth it.
# ~0.5ms for 5 fields, vs ~5ms for unrolling via splatting the fields. | ||
# ~3ms for 20 fields, vs ~6ms for splatting. | ||
# Note that splatting gives up after ~30 fields, whereas recursion remains robust. | ||
_sum_field_sizes(T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test, large types? I would fix it once we hit a problem, so code is not too complicated
Blob{$(fieldtype(T, i))}(blob + $(blob_offset(T, i))) | ||
end | ||
@assert i !== nothing "$T has no field $field" | ||
Blob{fieldtype(T, i)}(blob + (blob_offset(T, i))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The +
creates a Blob{T}
that we then cast to Blob{fieldtype(T, i)}
. Wouldn't it be better to create the right type from the beginning? (I think the +/- operators don't make much sense)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable to me. Again, i just did a blind transformation on what was here... 🤔
I think the + operators are adding bytes, in which case you could do it either way? But yes i agree this is confusing
Convert all generated functions to regular function.
Compilation time comparisons:
Runtime comparisons on julia 1.10:
Runtime comparisons on julia 1.11: