-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Use NarrowNative for narrowing operations
#116539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Wonder if that extra AND is a big deal.. Perhaps, it should be called |
|
Tagging subscribers to this area: @dotnet/area-system-text-encoding |
👍 I don't expect this to have any significant impact and we'd overall prefer less private/internal helpers where possible. I think we should just keep it as is and live with the single extra and instruction; likely moving other APIs to do the same over time.
I don't think this is one that would be good for an unsafe/estimate API as we typically only use such APIs for handling things that live as undefined behavior. Narrowing basically comes down to only having well-defined behavior (truncation or saturation). I don't think we'd necessarily want one that does one or the other based on whatever is "fastest". For many of the cases using |
ExtractAsciiVector for narrowing operationsNarrowNative for narrowing operations
jeffhandley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding This looks good to me now. Can you re-review please?
|
@xtqqczze do you have any perf numbers? The above diff report is showing no real change to the codegen (before and after are the same, just with after having an additional API now for This adds some complexity to the codebase that doesn't seem desirable long term. |
This is expected as the changes just create a unified implementation rather then having logic in different places doing the same thing. I think this could be marked no-merge until #118583 is completed as the changes could then be considerably simplified. |
| Vector128<ushort> vec2 = Vector128.LoadUnsafe(ref srcRef, offset + (nuint)Vector128<ushort>.Count).AsUInt16(); | ||
|
|
||
| vec = Ascii.ExtractAsciiVector(vec1, vec2); | ||
| vec = Vector128.NarrowNative(vec1, vec2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you get numbers if you just use Narrow instead of creating a new NarrowNative?
Diff between ; Emitting BLENDED_CODE for generic X64 + VEX on Unix
+ vmovaps xmm0, xmmword ptr [rsp+0x08]
+ vpackuswb xmm0, xmm0, xmmword ptr [rsp+0x18]
- vbroadcastss xmm0, dword ptr [reloc @RWD00]
- vpand xmm1, xmm0, xmmword ptr [rsp+0x08]
- vpand xmm0, xmm0, xmmword ptr [rsp+0x18]
- vpackuswb xmm0, xmm1, xmm0
vmovups xmmword ptr [rdi], xmm0
mov rax, rdi
+ ;; size=19 bbWeight=1 PerfScore 7.25
- ;; size=32 bbWeight=1 PerfScore 10.25 |
|
The question is more whether or not the potential perf difference there matters outside of micro-benchmarks We're speculating that it's unlikely, particularly given newer hardware that has non saturating narrowing operations. |
|
; Emitting BLENDED_CODE for generic X64 + VEX + EVEX on Unix
+ vmovaps xmm0, xmmword ptr [rsp+0x08]
+ vpackuswb xmm0, xmm0, xmmword ptr [rsp+0x18]
- vmovaps xmm0, xmmword ptr [rsp+0x08]
- vpmovwb xmm0, xmm0
- vmovaps xmm1, xmmword ptr [rsp+0x18]
- vpmovwb xmm1, xmm1
- vmovlhps xmm0, xmm0, xmm1
vmovups xmmword ptr [rdi], xmm0
mov rax, rdi
+ ;; size=19 bbWeight=1 PerfScore 7.25
- ;; size=35 bbWeight=1 PerfScore 13.25 |
|
That looks like a codegen issue. The two vectors are neighbors so it should be generating But that's also a bit beside the point. "Looks worse" doesn't actually mean "performs worse". What matters here is the real world perf for typical scenarios (not microbenchmarks). Without data showing otherwise, the presumption is that the slightly worse codegen doesn't matter and can be improved over time where it isn't doing the "right thing". |
|
I'm going to say this is one we don't want to take. We'd prefer to just use Thanks for the PR and locating the other codegen issue that we do want to resolve |
This PR introduces a new helper method,
NarrowNative, to handle narrowing operations more consistently. It replaces prior logic (previously usingExtractAsciiVector) with this unified implementation.Once #118583 is completed, the conditional compilation blocks can be removed, using the unified implementation for all targets.
https://csharp.godbolt.org/z/dc1zn9jMb