
CPU memory spike during GPU -> CPU index conversion #4607

@rchitale7

Description


Hi,

I am noticing that when converting a GPU index (built with CAGRA) to a CPU index, CPU memory spikes higher for datasets with a large number of vectors but a low number of dimensions than for datasets with fewer vectors but more dimensions. For example, for a 40 million x 128 fp32 vector dataset, CPU memory spikes to almost ~60 GB, whereas for a 4 million x 1536 fp32 vector dataset it stays under ~50 GB. This is despite the raw data being smaller in the first case: the 40 million x 128 dataset takes up 19531.25 MiB of CPU memory, while the 4 million x 1536 dataset takes up 23438 MiB. I am curious why the CPU memory used during the GPU-to-CPU conversion is higher for the 40 million x 128 dataset even though its vector data is smaller. Is there a bug in the faiss code that is causing this spike, or is this expected?
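One possible factor (an assumption on my part, not confirmed from the faiss source): the CAGRA kNN graph stores a fixed number of neighbor ids per vector, so its size scales with the number of vectors and is independent of dimensionality. A back-of-envelope sketch, assuming int32 neighbor ids and the CAGRA default graph_degree of 64:

```python
# Rough host-memory accounting for the CAGRA dataset + graph.
# Assumptions (not verified against faiss internals): fp32 vectors,
# int32 neighbor ids, graph_degree = 64 (CAGRA's default).
def mib(nbytes: int) -> float:
    return nbytes / 2**20

def estimate_mib(n: int, d: int, graph_degree: int = 64) -> dict:
    vectors = n * d * 4            # fp32 dataset copied into the CPU index
    graph = n * graph_degree * 4   # kNN graph: graph_degree int32 ids per vector
    return {
        "vectors": mib(vectors),
        "graph": mib(graph),
        "total": mib(vectors + graph),
    }

print(estimate_mib(40_000_000, 128))   # graph alone is ~9766 MiB
print(estimate_mib(4_000_000, 1536))   # graph is only ~977 MiB
```

Under these assumptions the 40M x 128 case carries roughly 10x more graph memory than the 4M x 1536 case, and any transient copy of the graph during conversion would widen that gap further.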

My Setup

I used an EC2 g6.12xlarge machine with the Deep Learning Base OSS Nvidia Driver GPU AMI (Amazon Linux 2023) 20250912 AMI: https://docs.aws.amazon.com/dlami/latest/devguide/aws-deep-learning-base-gpu-ami-amazon-linux-2023.html

output of nvidia-smi:

Mon Oct 13 03:30:57 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08             Driver Version: 570.172.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      On  |   00000000:38:00.0 Off |                    0 |
| N/A   36C    P8             16W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L4                      On  |   00000000:3A:00.0 Off |                    0 |
| N/A   31C    P8             16W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L4                      On  |   00000000:3C:00.0 Off |                    0 |
| N/A   31C    P8             16W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L4                      On  |   00000000:3E:00.0 Off |                    0 |
| N/A   27C    P8             11W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Reproduction instructions

  1. On a server with GPUs and conda installed, set up the following conda environment:
conda create -n faiss_test_new -c conda-forge -c pytorch -c nvidia -c rapidsai python=3.12 faiss-gpu-cuvs=1.12.0 py3nvml pandas matplotlib psutil
  2. Activate the conda env:
conda activate faiss_test_new
  3. Download the test script: https://gist.github.com/rchitale7/1e2a9c417139bfeeb902ba44f3e21a76
  4. Run the test script with the vector document and dimension counts. For example, for the 40 mil x 128 dataset:
python faiss_test.py -docs 40000000 --dims 128
  5. This will generate a CSV file with the CPU memory timestamps, called cpu_metrics_cagra_docs_40000000_dims_128.csv. It will also generate a graph called memory_correlation_docs_40000000_dims_128.png that correlates the events during the index build process with the CPU memory at the relevant timestamps. In the script, I used the cpu_used_process_memory column in the CSV file for the graph. The process CPU memory is measured using the psutil library.
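As a quick cross-check of the sampled psutil numbers, peak process memory can also be read with the stdlib resource module (Unix only). This is a sketch, not what the gist itself does; the 200 MiB allocation below is just a stand-in for the conversion step:

```python
import resource
import sys

def peak_rss_mib() -> float:
    """Peak resident set size of the current process in MiB.

    ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    """
    ru_maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return ru_maxrss / divisor

# Example: bracket an allocation-heavy step (here simulated with a
# 200 MiB buffer; in the real script this would be the GPU -> CPU
# index conversion call) and report peak growth.
before = peak_rss_mib()
buf = bytearray(200 * 2**20)
after = peak_rss_mib()
print(f"peak RSS grew by ~{after - before:.0f} MiB")
```

Unlike sampled RSS, the peak value cannot miss a short-lived spike between samples, which is useful when the transient allocation during conversion is brief.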

I've attached the graphs I generated for the 40 mil x 128 and 4 mil x 1536 datasets to this issue, from running the script on an EC2 g6.12xlarge machine.

