Add NEON support and move fvec_madd for NEON #4558
This pull request was exported from Phabricator. Differential Revision: D74275821
Summary:
Pull Request resolved: facebookresearch#4558

Main changes:
1. simdlib_neon.h has been updated to use the templates properly. This is the largest file change in the diff.
2. simd_levels.h now has two separate macros, DISPATCH_SIMDLevel and DISPATCH_SIMDLevel_AND_RETURN. The first one just executes the given function; the second one returns a value.
3. xplat.bzl now reads the architecture and decides the COMPILE_SIMD_<> flags based on that. After we move from static to dynamic dispatch, this will be enabled.
4. Adds distances_neon.cpp and distances_sve.cpp, just for fvec_madd. However, the SVE code is not invoked yet because the import for detection does not quite work. That can be separate work.
5. Updates RQ and PQ to accommodate the simdlib_neon.h changes. Following https://docs.google.com/document/d/1hVwoW-SrPxbBxJ3Nvl58tOaKKgkzDgVxPJbx92TSTvs/edit?tab=t.0#bookmark=id.ujcwvusaoqp8, we move some functions to headers and `extern template` some things.

Note 1: Please yell at me if we want to break the above up a bit; I am very happy to. Items 1 and 2 could be in a separate diff IMO. They are here because we needed them for dynamic dispatch, and then we decided on static dispatch.

Note 2: These two tests still fail to build. I didn't fix them yet. We can take it up later in the stack, as per the Google doc?
1. test_code_distance.cpp (distance_single_code_generic no longer exists)
2. test_simdlib.cpp (needs templatization)

Note 3: autodeps is failing due to `<asm/hwcap.h>` in the SVE check. SVE does not quite work yet and it isn't enabled, so we can come back to this (or I can comment it out). Feel free to comment if you have a preference.

Reviewed By: mdouze
Differential Revision: D74275821
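To make the difference between the two dispatch macros in item 2 concrete, here is a minimal sketch. It is not the code from simd_levels.h: the `SKETCH_` macro names, the `SIMDLevel` enumerators, `sketch_level()`, and the two example functions are all hypothetical, and the level is picked from compiler-defined macros, which is only one possible reading of "static dispatch".

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real enum; the actual enumerators may differ.
enum class SIMDLevel { NONE, AVX2, ARM_NEON };

// Assumption: the level is fixed at compile time from the target flags.
inline SIMDLevel sketch_level() {
#if defined(__ARM_NEON)
    return SIMDLevel::ARM_NEON;
#elif defined(__AVX2__)
    return SIMDLevel::AVX2;
#else
    return SIMDLevel::NONE;
#endif
}

// Variant 1: call fn<Level>(args...) and discard any result.
#define SKETCH_DISPATCH(fn, ...)                      \
    switch (sketch_level()) {                         \
        case SIMDLevel::ARM_NEON:                     \
            fn<SIMDLevel::ARM_NEON>(__VA_ARGS__);     \
            break;                                    \
        case SIMDLevel::AVX2:                         \
            fn<SIMDLevel::AVX2>(__VA_ARGS__);         \
            break;                                    \
        default:                                      \
            fn<SIMDLevel::NONE>(__VA_ARGS__);         \
            break;                                    \
    }

// Variant 2: call fn<Level>(args...) and return its result to the caller.
#define SKETCH_DISPATCH_AND_RETURN(fn, ...)                  \
    switch (sketch_level()) {                                \
        case SIMDLevel::ARM_NEON:                            \
            return fn<SIMDLevel::ARM_NEON>(__VA_ARGS__);     \
        case SIMDLevel::AVX2:                                \
            return fn<SIMDLevel::AVX2>(__VA_ARGS__);         \
        default:                                             \
            return fn<SIMDLevel::NONE>(__VA_ARGS__);         \
    }

// Hypothetical kernels; scalar bodies stand in for per-level SIMD code.
template <SIMDLevel L>
void scale_sketch(float* x, size_t n, float factor) {
    assert(x != nullptr || n == 0);
    for (size_t i = 0; i < n; i++) {
        x[i] *= factor;
    }
}

template <SIMDLevel L>
float sum_sketch(const float* x, size_t n) {
    assert(x != nullptr || n == 0);
    float total = 0;
    for (size_t i = 0; i < n; i++) {
        total += x[i];
    }
    return total;
}

// Entry points that callers see; they carry no template parameter.
inline void scale_dispatch(float* x, size_t n, float factor) {
    SKETCH_DISPATCH(scale_sketch, x, n, factor)
}

inline float sum_dispatch(const float* x, size_t n) {
    SKETCH_DISPATCH_AND_RETURN(sum_sketch, x, n)
}
```

Two macros are needed because a `return fn<...>(...)` expansion cannot be followed by further statements in the caller, while the plain variant lets the caller continue after the switch.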
…DQ, Vl, DL) detection
Summary:
* Added support for detecting the SIMD instruction set for both the `AVX2` and the `AVX512F`, `AVX512VL` related levels
* Added hardware-specific unit tests (e.g. when the unit tests run on x86, they check that the relevant SIMD levels are returned and that the respective instructions are executed)
* For the reason why detection explicitly runs a computation instead of relying on `__builtin_cpu_supports("avx512f")`, see this [link](https://stackoverflow.com/questions/48677575/does-gccs-builtin-cpu-supports-check-for-os-support)
* Also fixes a bug in the existing `AVX2` detection
  * Incorrect CPUID bit check: the function uses `ebx & (1 << 16)` to check for `AVX2` support. This is incorrect because bit 16 in `ebx` is actually used for `AVX-512F`, not `AVX2`.
  * Correct bit for AVX2: the correct bit for detecting AVX2 is bit 5 in `ebx` when `eax = 7` and `ecx = 0`. This is based on Intel's documentation for the CPUID instruction.
* Another bug was observed in the constructor of SIMDConfig (if the env variable is set, the code path still goes through detection via code)
* Improved SIMDConfig to take parameters in its constructor, which enables an injection mechanism for better testing
* Added more unit tests for other hardware
* Added a variable in SIMDConfig to track all possible supported SIMD levels
Differential Revision: D72937710
Reviewed By: mdouze
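The CPUID fix described in the summary can be illustrated with a small sketch. The helper names below are hypothetical and this is not the detection code from the diff; it only covers the CPUID bit test. It does not check OS support for the wider registers, which is the concern behind the summary's choice to run an actual computation. `__get_cpuid_count` comes from the GCC/Clang `<cpuid.h>` header, so the query is compiled only on x86 targets with those compilers.

```cpp
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> // GCC/Clang helper header, assumed available on x86
#endif

// CPUID leaf 7, sub-leaf 0 (eax = 7, ecx = 0): feature bits reported in EBX.
constexpr uint32_t kEbxAvx2Bit = 1u << 5;     // AVX2
constexpr uint32_t kEbxAvx512fBit = 1u << 16; // AVX-512F

// Correct check: AVX2 is bit 5 of EBX.
constexpr bool ebx_has_avx2(uint32_t ebx) {
    return (ebx & kEbxAvx2Bit) != 0;
}

constexpr bool ebx_has_avx512f(uint32_t ebx) {
    return (ebx & kEbxAvx512fBit) != 0;
}

// The buggy check from the summary: it tests bit 16, which is AVX-512F.
constexpr bool buggy_ebx_has_avx2(uint32_t ebx) {
    return (ebx & (1u << 16)) != 0;
}

// Hypothetical runtime query; returns false on non-x86 targets.
inline bool cpu_reports_avx2() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        return false; // leaf 7 not supported by this CPU
    }
    return ebx_has_avx2(ebx);
#else
    return false;
#endif
}
```

With the buggy mask, a CPU that has AVX2 but not AVX-512F would be reported as lacking AVX2, since only bit 16 is inspected.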
Summary: `fvec_madd` is the first function used to test dispatching to AVX and AVX512. distances_simd.cpp is split into specialized files, distances_avx2.cpp and distances_avx512.cpp, that are compiled with the appropriate flags.
Differential Revision: D72937708
Reviewed By: mnorris11
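As a rough illustration of what a per-level `fvec_madd` could look like, here is a sketch templated on a SIMD level. It assumes the Faiss semantics `c := a + bf * b`; the `_sketch` and `_dispatch` names and the `SIMDLevel` enum are hypothetical. Everything sits in one file for brevity, whereas the summary describes one file per level, each compiled with its own flags.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical stand-in for the real enum.
enum class SIMDLevel { NONE, AVX2 };

// Generic (scalar) version: c[i] = a[i] + bf * b[i].
template <SIMDLevel L>
void fvec_madd_sketch(
        size_t n,
        const float* a,
        float bf,
        const float* b,
        float* c) {
    for (size_t i = 0; i < n; i++) {
        c[i] = a[i] + bf * b[i];
    }
}

#if defined(__AVX2__)
// Specialization that would live in its own file, built with -mavx2.
template <>
void fvec_madd_sketch<SIMDLevel::AVX2>(
        size_t n,
        const float* a,
        float bf,
        const float* b,
        float* c) {
    const __m256 vbf = _mm256_set1_ps(bf);
    size_t i = 0;
    // Process 8 floats per iteration with unaligned loads and stores.
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(c + i, _mm256_add_ps(va, _mm256_mul_ps(vbf, vb)));
    }
    // Scalar tail for the remaining n % 8 elements.
    for (; i < n; i++) {
        c[i] = a[i] + bf * b[i];
    }
}
#endif

// Static dispatch: the level is chosen from the compile flags.
inline void fvec_madd_dispatch(
        size_t n,
        const float* a,
        float bf,
        const float* b,
        float* c) {
    assert(a != nullptr && b != nullptr && c != nullptr);
#if defined(__AVX2__)
    fvec_madd_sketch<SIMDLevel::AVX2>(n, a, bf, b, c);
#else
    fvec_madd_sketch<SIMDLevel::NONE>(n, a, bf, b, c);
#endif
}
```

Calling `fvec_madd_dispatch(10, a, 2.0f, b, c)` exercises both the vector loop and the scalar tail when the file is built with AVX2 enabled, and only the scalar template otherwise.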
Summary: Pull Request resolved: facebookresearch#4291. Moves IndexIVFPQ and IndexPQ to dynamic dispatch. Since the code was already quite modular (thanks Alex!), this boils down to making independent cpp files for the different SIMD versions.
Differential Revision: D72937709
…training code, split quantizer code into headers, Make headers more independent
Summary: Move the interface of SIMD functions to use the simdXfloat32 API to share code. Begin splitting ScalarQuantizer.cpp, then continue splitting, purely in header files for now.
Differential Revision: D72945865
)
Summary: Pull Request resolved: facebookresearch#4296. Splits the ScalarQuantizer code into parts so that the AVX2 and AVX512 code can be compiled independently.
Differential Revision: D73037185
Summary: Migration of the 4-bit codecs to dynamic dispatch. The migration consists of:
- templatizing the SIMD ResultHandlers on the SIMDLevel
- instantiating the AVX2 and AVX512 code in their own files (compile units)
- removing any SIMD dependency from IndexFastScan and IndexIVFFastScan
- adding dispatching code for the SIMD code
Differential Revision: D73581633
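The migration steps listed above (templatize on the level, instantiate per compile unit, dispatch from SIMD-free code) can be sketched in a single file as follows. `SumHandler`, `sum_with`, the `SIMDLevel` enum, and the file name in the comments are hypothetical and only mark where each piece would live; this is not the actual ResultHandler code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the real enum.
enum class SIMDLevel { NONE, AVX2 };

// ---- Would sit in a shared header ----
// Hypothetical handler templatized on the SIMD level.
template <SIMDLevel L>
struct SumHandler {
    float total = 0;
    void add(const float* x, size_t n);
};

template <SIMDLevel L>
void SumHandler<L>::add(const float* x, size_t n) {
    // Scalar body for the sketch; a real handler would use level-specific code.
    for (size_t i = 0; i < n; i++) {
        total += x[i];
    }
}

// Explicit instantiation declaration: tells every includer not to
// instantiate the AVX2 version itself.
extern template struct SumHandler<SIMDLevel::AVX2>;

// ---- Would sit in its own file (say handler_avx2.cpp), built with -mavx2 ----
// Explicit instantiation definition: the single place where the AVX2 code
// is generated.
template struct SumHandler<SIMDLevel::AVX2>;

// ---- Dispatching code, with no SIMD dependency of its own ----
inline float sum_with(SIMDLevel level, const float* x, size_t n) {
    assert(x != nullptr || n == 0);
    if (level == SIMDLevel::AVX2) {
        SumHandler<SIMDLevel::AVX2> h;
        h.add(x, n);
        return h.total;
    }
    SumHandler<SIMDLevel::NONE> h;
    h.add(x, n);
    return h.total;
}
```

The point of the `extern template` line is that callers can name `SumHandler<SIMDLevel::AVX2>` without their own compile unit needing AVX2 flags; the object code comes from the one file that holds the explicit instantiation.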