Skip to content

Conversation

@xtqqczze
Copy link
Contributor

@xtqqczze xtqqczze commented Jun 11, 2025

This PR introduces a new helper method, NarrowNative, to handle narrowing operations more consistently. It replaces prior logic (previously using ExtractAsciiVector) with this unified implementation.

Once #118583 is completed, the conditional compilation blocks can be removed, using the unified implementation for all targets.

https://csharp.godbolt.org/z/dc1zn9jMb

@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jun 11, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 11, 2025
@EgorBo
Copy link
Member

EgorBo commented Jun 11, 2025

Wonder if that extra AND is a big deal..

Perhaps, it should be called NarrowUnsafe ? and moved to Vector128 as an internal helper. cc @tannergooding

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

@tarekgh tarekgh requested a review from tannergooding June 28, 2025 17:11
@tarekgh tarekgh added this to the 10.0.0 milestone Jun 28, 2025
@tannergooding
Copy link
Member

Wonder if that extra AND is a big deal..

👍

I don't expect this to have any significant impact and we'd overall prefer less private/internal helpers where possible.

I think we should just keep it as is and live with the single extra and instruction; likely moving other APIs to do the same over time.

Perhaps, it should be called NarrowUnsafe

I don't think this is one that would be good for an unsafe/estimate API as we typically only use such APIs for handling things that live as undefined behavior. Narrowing basically comes down to only having well-defined behavior (truncation or saturation). I don't think we'd necessarily want one that does one or the other based on whatever is "fastest".

For many of the cases using ExtractToAsciiVector, we could even extend the existing range support to allow implicit elision of the and (because it's doing something like a if (x >= 0x80) { return; } path already)

Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding This looks good to me now. Can you re-review please?

@tannergooding
Copy link
Member

@xtqqczze do you have any perf numbers?

The above diff report is showing no real change to the codegen (before and after are the same, just with after having an additional API now for Vector<T>)

This adds some complexity to the codebase that doesn't seem desirable long term.

@xtqqczze
Copy link
Contributor Author

The above diff report is showing no real change to the codegen (before and after are the same, just with after having an additional API now for Vector<T>)

This is expected as the changes just create a unified implementation rather then having logic in different places doing the same thing.

I think this could be marked no-merge until #118583 is completed as the changes could then be considerably simplified.

Vector128<ushort> vec2 = Vector128.LoadUnsafe(ref srcRef, offset + (nuint)Vector128<ushort>.Count).AsUInt16();

vec = Ascii.ExtractAsciiVector(vec1, vec2);
vec = Vector128.NarrowNative(vec1, vec2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you get numbers if you just use Narrow instead of creating a new NarrowNative?

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Oct 24, 2025

Wonder if that extra AND is a big deal..

Diff between Vector128.Narrow and Vector128.NarrowNative:

; Emitting BLENDED_CODE for generic X64 + VEX on Unix
+        vmovaps  xmm0, xmmword ptr [rsp+0x08]
+        vpackuswb xmm0, xmm0, xmmword ptr [rsp+0x18]
-        vbroadcastss xmm0, dword ptr [reloc @RWD00]
-        vpand    xmm1, xmm0, xmmword ptr [rsp+0x08]
-        vpand    xmm0, xmm0, xmmword ptr [rsp+0x18]
-        vpackuswb xmm0, xmm1, xmm0
         vmovups  xmmword ptr [rdi], xmm0
         mov      rax, rdi
+ 						;; size=19 bbWeight=1 PerfScore 7.25
- 						;; size=32 bbWeight=1 PerfScore 10.25

@tannergooding
Copy link
Member

The question is more whether or not the potential perf difference there matters outside of micro-benchmarks

We're speculating that it's unlikely, particularly given newer hardware that has non saturating narrowing operations.

@xtqqczze
Copy link
Contributor Author

Vector128.Narrow codegen looks worse on x86-64-v4:

; Emitting BLENDED_CODE for generic X64 + VEX + EVEX on Unix
+       vmovaps  xmm0, xmmword ptr [rsp+0x08]
+       vpackuswb xmm0, xmm0, xmmword ptr [rsp+0x18]
-       vmovaps  xmm0, xmmword ptr [rsp+0x08]
-       vpmovwb  xmm0, xmm0
-       vmovaps  xmm1, xmmword ptr [rsp+0x18]
-       vpmovwb  xmm1, xmm1
-       vmovlhps xmm0, xmm0, xmm1
        vmovups  xmmword ptr [rdi], xmm0
        mov      rax, rdi
+						;; size=19 bbWeight=1 PerfScore 7.25
-						;; size=35 bbWeight=1 PerfScore 13.25

@tannergooding
Copy link
Member

That looks like a codegen issue. The two vectors are neighbors so it should be generating vmovups ymm0, ymmwork ptr [rsp+0x08]; vpmovmwb xmm0, ymm0 or similar

But that's also a bit beside the point. "Looks worse" doesn't actually mean "performs worse". What matters here is the real world perf for typical scenarios (not microbenchmarks). Without data showing otherwise, the presumption is that the slightly worse codegen doesn't matter and can be improved over time where it isn't doing the "right thing".

@tannergooding
Copy link
Member

tannergooding commented Nov 6, 2025

I'm going to say this is one we don't want to take.

We'd prefer to just use Narrow and then fix the codegen issue that was listed.

Thanks for the PR and locating the other codegen issue that we do want to resolve

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Text.Encoding community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants