Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(vec & cns) == zero is generating an extra vpand #111824

Open
PranavSenthilnathan opened this issue Jan 25, 2025 · 7 comments
Open

(vec & cns) == zero is generating an extra vpand #111824

PranavSenthilnathan opened this issue Jan 25, 2025 · 7 comments
Assignees
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner

Comments

@PranavSenthilnathan
Copy link
Member

Regression: #110413
Repro: EgorBot/runtime-utils#260

The change around here has regressed code gen for AVX2 CPUs without AVX-512: https://github.com/dotnet/runtime/pull/104488/files#diff-6b4906abc01dc4699f348f7c1df72e2f640f240aa31ea67cd47642221b2021f5L1560-L1571

Previous code gen (in GetIndexOfFirstNonAsciiChar_Vector):

G_M000_IG03:                ;; offset=0x002D
       vptest   ymm0, ymmword ptr [rdi]
       jne      SHORT G_M000_IG04
       add      rdi, 32
       cmp      rdi, rcx
       jbe      SHORT G_M000_IG03

Current code gen:

G_M000_IG03:                ;; offset=0x0030
       vpand    ymm1, ymm0, ymmword ptr [rdi]
       vptest   ymm1, ymm1
       jne      SHORT G_M000_IG04
       add      rdi, 32
       cmp      rdi, rcx
; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (cmp: 2 ; jcc erratum) 32B boundary ...............................
       jbe      SHORT G_M000_IG03
@PranavSenthilnathan PranavSenthilnathan self-assigned this Jan 25, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 25, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@PranavSenthilnathan
Copy link
Member Author

@tannergooding I assigned myself to fix this in my free time in the next few days but feel free to reassign if this needs to be fixed more urgently.

@EgorBo
Copy link
Member

EgorBo commented Jan 25, 2025

@tannergooding I assigned myself to fix this in my free time in the next few days but feel free to reassign if this needs to be fixed more urgently.

Just in case - the issue has to be fixed in JIT rather than C# impl

@EgorBo
Copy link
Member

EgorBo commented Jan 25, 2025

internal static bool VectorContainsNonAsciiChar(Vector256<ushort> utf16Vector)
{
    const ushort asciiMask = ushort.MaxValue - 127; // 0xFF80
    Vector256<ushort> zeroIsAscii = utf16Vector & Vector256.Create(asciiMask);
    return zeroIsAscii != Vector256<ushort>.Zero;
}

AVX512=1:

; Method Benchmarks:VectorContainsNonAsciiChar(System.Runtime.Intrinsics.Vector256`1[ushort]):ubyte (FullOpts)
       vmovups  ymm0, ymmword ptr [rcx]
       vpandd   ymm0, ymm0, dword ptr [reloc @RWD00] {1to8}
       vptest   ymm0, ymm0
       setne    al
       movzx    rax, al
       vzeroupper 
       ret      
RWD00  	dd	FF80FF80h

AVX512=0:

; Method Benchmarks:VectorContainsNonAsciiChar(System.Runtime.Intrinsics.Vector256`1[ushort]):ubyte (FullOpts)
       vmovups  ymm0, ymmword ptr [rcx]
       vptest   ymm0, ymmword ptr [reloc @RWD00]
       setne    al
       movzx    rax, al
       vzeroupper 
       ret      
RWD00  	dq	FF80FF80FF80FF80h, FF80FF80FF80FF80h, FF80FF80FF80FF80h, FF80FF80FF80FF80h
; Total bytes of code: 23

@EgorBo
Copy link
Member

EgorBo commented Jan 25, 2025

I guess the optimization that transforms V1 & VEC_CNS into PTEST gets confused by AVX512's BroadcastScalarToVector256 optimization

@EgorBo EgorBo added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Runtime.Intrinsics labels Jan 25, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@PranavSenthilnathan
Copy link
Member Author

@EgorBot -intel -profiler -commit main --envvars DOTNET_JitDisasm:Vector DOTNET_EnableAVX512F:0

using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Actual runner is optional, but if it exists, it has to be like this:
BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    public int size = 512;

    public string encName = "utf-8";
    private Encoding _enc;
    private string _toEncode;
    private char[] _chars;
    [GlobalSetup]
    public void SetupGetBytes()
    {
        _enc = Encoding.GetEncoding(encName);
        _toEncode = CreateString(size);
        _chars = _toEncode.ToCharArray();
    }

    [Benchmark]
    [MemoryRandomization]
    public int GetByteCount() => _enc.GetByteCount(_chars);

    public static string CreateString(int length)
    {
        char[] str = new char[length];

        for (int i = 0; i < str.Length; i++)
        {
            // Add path separator so folders aren't too long.
            if (i % 20 == 0)
            {
                str[i] = Path.DirectorySeparatorChar;
            }
            else
            {
                str[i] = 'a';
            }
        }

        return new string(str);
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-x64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

2 participants