
Conversation

@mulugetam
Contributor

This pull request supersedes #4625. The previous PR was closed to reorganize commits and make the history cleaner.

--
This update improves the performance of partitioning by leveraging AVX-512 VBMI2 instructions. The optimization requires building FAISS with -DFAISS_OPT_LEVEL=avx512_spr. Running benchs/bench_partition.py shows roughly a 3–6× speedup over the existing avx512 implementation at n=20000, and about 1.5× at n=2000.
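The gain presumably comes from VBMI2's 16-bit masked compress-store (vpcompressw), which packs the elements selected by a comparison mask into contiguous memory in a single instruction. The following is a purely illustrative sketch of that pattern, not the code in this PR; the function name and signature are hypothetical:

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Copy all values of `src` strictly below `thresh` into `dst` and return the
// count. Requires AVX-512BW + AVX-512VBMI2 (e.g. compile with
// -march=sapphirerapids, matching -DFAISS_OPT_LEVEL=avx512_spr).
size_t compress_below(const uint16_t* src, size_t n, uint16_t thresh, uint16_t* dst) {
    const __m512i vthresh = _mm512_set1_epi16((short)thresh);
    size_t nout = 0, i = 0;
    for (; i + 32 <= n; i += 32) {
        __m512i v = _mm512_loadu_si512(src + i);
        // mask of lanes with src[i] < thresh (unsigned 16-bit compare)
        __mmask32 lt = _mm512_cmplt_epu16_mask(v, vthresh);
        // VBMI2 vpcompressw: store only the selected lanes, packed contiguously
        _mm512_mask_compressstoreu_epi16(dst + nout, lt, v);
        nout += (size_t)_mm_popcnt_u32((unsigned)lt);
    }
    for (; i < n; ++i) { // scalar tail
        if (src[i] < thresh) dst[nout++] = src[i];
    }
    return nout;
}
```

Without VBMI2, a 16-bit compress typically has to be emulated (for example by widening to 32-bit lanes, where compress-store exists in base AVX-512F), which is presumably where the avx512_spr build gets its advantage.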

Results from benchs/bench_partition.py on an AWS c7i.4xlarge instance:

-DFAISS_OPT_LEVEL=avx512
--
n=200 qin=(100, 100) maxval=65536 id_type=int64  	times 3.602 µs (± 1.5110 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int64  	times 2.971 µs (± 0.6289 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int64  	times 5.658 µs (± 0.7498 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int64  	times 4.878 µs (± 0.7968 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int64  	times 38.313 µs (± 2.5256 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int64  	times 38.671 µs (± 3.9962 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 100) maxval=65536 id_type=int32  	times 3.112 µs (± 0.5962 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int32  	times 3.004 µs (± 0.5767 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int32  	times 5.701 µs (± 0.7734 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int32  	times 4.783 µs (± 0.7614 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int32  	times 39.210 µs (± 2.9367 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int32  	times 42.442 µs (± 2.8150 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy

-DFAISS_OPT_LEVEL=avx512_spr
--
n=200 qin=(100, 100) maxval=65536 id_type=int64  	times 3.041 µs (± 1.5132 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int64  	times 2.812 µs (± 0.5979 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int64  	times 3.640 µs (± 0.6283 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int64  	times 3.242 µs (± 0.6140 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int64  	times 13.298 µs (± 1.3659 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int64  	times 9.290 µs (± 1.1589 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 100) maxval=65536 id_type=int32  	times 2.923 µs (± 0.6588 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=200 qin=(100, 150) maxval=65536 id_type=int32  	times 2.807 µs (± 0.6188 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1000) maxval=65536 id_type=int32  	times 3.500 µs (± 0.5938 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=2000 qin=(1000, 1500) maxval=65536 id_type=int32  	times 3.092 µs (± 0.5829 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 10000) maxval=65536 id_type=int32  	times 12.013 µs (± 1.1529 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy
n=20000 qin=(10000, 15000) maxval=65536 id_type=int32  	times 7.122 µs (± 0.9574 µs) nerr=0 bissect 0.000 Mcy compress 0.000 Mcy

@alexanderguzhva
Contributor

@mulugetam please ensure that the build succeeds.
