Description
Hi,
I am trying to build two GPU indexes of the same size in two parallel processes, using GpuIndexCagra and the Python API for faiss. I've noticed that this takes roughly 2x the time of building a single GPU index in one process. But if I set GpuIndexCagraConfig.device to a different value for each script, then I get roughly the same performance as building a single GPU index. However, I believe this device setting can only be changed on multi-GPU hardware. For hardware with a single GPU, is it expected behavior to see a ~2x slowdown when building two GPU indexes in parallel? I would like to utilize the memory of my single GPU instance as much as possible, so I'm curious whether there's a way to get a performance boost when building multiple GPU indexes at the same time.
I am creating a 1,000,000 x 768 numpy array for this test. The script I used is here: https://gist.github.com/rchitale7/8e0995e8231eec5657f42627d2cc1228
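For reference, the core of what each process does looks roughly like this (a minimal sketch assuming the faiss-gpu-cuvs 1.12.0 Python bindings, where GpuIndexCagra builds its graph during train(); variable names are illustrative, see the gist for the exact script):

import sys
import time

import faiss
import numpy as np

d, n = 768, 1_000_000

# Synthetic float32 dataset matching the 1,000,000 x 768 array above
rng = np.random.default_rng(0)
xb = rng.random((n, d), dtype=np.float32)

# Pin the build to a specific GPU via the config's device field
config = faiss.GpuIndexCagraConfig()
config.device = int(sys.argv[1])

res = faiss.StandardGpuResources()
index = faiss.GpuIndexCagra(res, d, faiss.METRIC_L2, config)

t0 = time.perf_counter()
index.train(xb)  # CAGRA builds the graph here
print(f"build time on device {config.device}: {time.perf_counter() - t0:.1f}s")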
My Setup
I used an EC2 g6.12xlarge machine (for the multi-GPU test) and a g5.2xlarge (for the single-GPU test), with the Deep Learning Base OSS Nvidia Driver GPU AMI (Amazon Linux 2023) 20250912: https://docs.aws.amazon.com/dlami/latest/devguide/aws-deep-learning-base-gpu-ami-amazon-linux-2023.html
Output of nvidia-smi:
Tue Sep 23 19:00:56 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08 Driver Version: 570.172.08 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 23C P8 15W / 300W | 0MiB / 23028MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Reproduction instructions
- On a server with GPUs and conda installed, set up the following conda environment:
conda create -n faiss_test_new -c conda-forge -c pytorch -c nvidia -c rapidsai python=3.12 faiss-gpu-cuvs=1.12.0
- Activate the conda env:
conda activate faiss_test_new
- Download the test script: https://gist.github.com/rchitale7/8e0995e8231eec5657f42627d2cc1228
- Run the test script, supplying a device id (only 0 can be used if there is one GPU on the instance):
python faiss_cagra_test.py 0
- In a separate terminal session, activate the conda env and launch the same test script:
conda activate faiss_test_new && python faiss_cagra_test.py 0
- Once both test scripts reach the pdb breakpoint, press c in both sessions at the same time to continue execution and run the scripts in parallel (see the sketch after this list).
- After both scripts have completed, compare the results to running only one script at a time.
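The breakpoint just acts as a manual start barrier so the two builds overlap. A minimal sketch of that pattern (assuming the gist pauses with pdb.set_trace(); the sleep is a placeholder standing in for the actual CAGRA build):

import pdb
import time

# ... data loading and index setup would happen here ...

pdb.set_trace()  # both processes pause here; press 'c' in each terminal together

t0 = time.perf_counter()
time.sleep(5)  # placeholder for index.train(xb), the GPU build
print(f"elapsed after the barrier: {time.perf_counter() - t0:.1f}s")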