Description
Hi,
I am noticing that when converting a GPU index (built using CAGRA) to a CPU index, CPU memory spikes much higher for datasets with a high number of vectors but a low number of dimensions than for datasets with fewer vectors but more dimensions. For example, for a 40 million x 128 fp32 vector dataset, CPU memory spikes to almost ~60 GB, while for a 4 million x 1536 fp32 vector dataset it stays under ~50 GB. This is despite the 40 million x 128 dataset being the smaller of the two: at 4 bytes per fp32 value, the raw data is 19531.25 MiB, versus 23438 MiB for the 4 million x 1536 dataset. Why is the CPU memory used during the GPU-to-CPU conversion higher for the 40 million x 128 dataset even though the dataset itself is smaller? Is there a bug in the faiss code that is causing this spike, or is this expected?
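For context, the conversion step where I see the spike looks roughly like this. This is a minimal sketch, assuming the faiss-gpu-cuvs 1.12 Python API (`GpuIndexCagra` / `index_gpu_to_cpu`); the full script is in the gist linked under the reproduction instructions below.

```python
import os
import numpy as np
import psutil
import faiss

docs, dims = 40_000_000, 128  # the raw fp32 data alone is ~19.5 GiB of host RAM

xb = np.random.rand(docs, dims).astype('float32')

res = faiss.StandardGpuResources()
config = faiss.GpuIndexCagraConfig()  # defaults; graph_degree affects graph size
gpu_index = faiss.GpuIndexCagra(res, dims, faiss.METRIC_L2, config)
gpu_index.train(xb)  # builds the CAGRA graph on the GPU

proc = psutil.Process(os.getpid())
rss_before = proc.memory_info().rss
cpu_index = faiss.index_gpu_to_cpu(gpu_index)  # CPU memory spikes here
rss_after = proc.memory_info().rss
print(f"RSS grew by {(rss_after - rss_before) / 2**20:.0f} MiB during conversion")
```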
My Setup
I used an EC2 g6.12xlarge machine with the Deep Learning Base OSS Nvidia Driver GPU AMI (Amazon Linux 2023) 20250912 AMI: https://docs.aws.amazon.com/dlami/latest/devguide/aws-deep-learning-base-gpu-ami-amazon-linux-2023.html
Output of `nvidia-smi`:
Mon Oct 13 03:30:57 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08 Driver Version: 570.172.08 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L4 On | 00000000:38:00.0 Off | 0 |
| N/A 36C P8 16W / 72W | 0MiB / 23034MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA L4 On | 00000000:3A:00.0 Off | 0 |
| N/A 31C P8 16W / 72W | 0MiB / 23034MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA L4 On | 00000000:3C:00.0 Off | 0 |
| N/A 31C P8 16W / 72W | 0MiB / 23034MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA L4 On | 00000000:3E:00.0 Off | 0 |
| N/A 27C P8 11W / 72W | 0MiB / 23034MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
Reproduction instructions
- On a server with GPUs and `conda` installed, set up the following `conda` environment:
conda create -n faiss_test_new -c conda-forge -c pytorch -c nvidia -c rapidsai python=3.12 faiss-gpu-cuvs=1.12.0 py3nvml pandas matplotlib psutil
- Activate the conda environment:
conda activate faiss_test_new
- Download the test script: https://gist.github.com/rchitale7/1e2a9c417139bfeeb902ba44f3e21a76
- Run the test script with the vector document and dimension counts. For example, for the 40 million x 128 dataset:
python faiss_test.py --docs 40000000 --dims 128
- This will generate a csv file with the CPU memory measurements over time, called `cpu_metrics_cagra_docs_40000000_dims_128.csv`. It will also generate a graph called `memory_correlation_docs_40000000_dims_128.png` that correlates the events during the index build process with the CPU memory at the relevant timestamps. In the script, I used the `cpu_used_process_memory` column of the csv file for the graph; the process CPU memory is measured using the `psutil` library (sketched below).
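For reference, the memory sampling in the script works along these lines. This is an illustrative sketch, not a verbatim excerpt from the gist; only the csv file and column names are taken from the output described above.

```python
import csv
import os
import threading
import time

import psutil


def sample_memory(path, stop_event, interval=0.5):
    # Periodically record this process's resident set size (RSS), in MiB,
    # to a csv file until stop_event is set.
    proc = psutil.Process(os.getpid())
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_used_process_memory"])
        while not stop_event.is_set():
            writer.writerow([time.time(), proc.memory_info().rss / 2**20])
            time.sleep(interval)


stop = threading.Event()
sampler = threading.Thread(
    target=sample_memory,
    args=("cpu_metrics_cagra_docs_40000000_dims_128.csv", stop),
)
sampler.start()
# ... build the GPU index and convert it to a CPU index here ...
stop.set()
sampler.join()
```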
I've attached the graphs I generated for the 40 million x 128 and 4 million x 1536 datasets to this issue, from running the script on an EC2 g6.12xlarge machine.
