GpuIndexCagra fails search if the source IndexHNSWCagra is destroyed after copyFrom #4742

@mageirakos

My Goal:

Happy New Year!

I want to be able to persist CAGRA GPU indexes.

As I understand it (please correct me if I am wrong), the standard Faiss workflow for this is:

1. GpuIndexCagra (GPU) $\rightarrow$ copyTo $\rightarrow$ IndexHNSWCagra (CPU).
2. write_index (CPU Index) to disk.
3. read_index (CPU Index) from disk.
4. IndexHNSWCagra(CPU) $\rightarrow$ copyFrom or index_cpu_to_gpu $\rightarrow$ GpuIndexCagra (GPU).

Problem Description:

On DGX Spark:

When cloning a CPU IndexHNSWCagra to a GPU CAGRA index using copyFrom/index_cpu_to_gpu, the resulting GPU index appears to keep a shallow dependency on the host memory of the source index. If the source CPU index is destroyed, subsequent searches on the GPU index fail with a CUDA illegal memory access. I also get an invalid device ordinal error, even though the output shows that the only device is ordinal 0, which is the one I configured. I followed the pattern that index_cpu_to_gpu uses internally: GpuCloner.cpp#L236-L244
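To make the shallow versus deep distinction concrete, here is a toy model in plain C++. It is only an illustration of the suspected failure mode, not how Faiss or cuVS actually store the index: HostGraph, ShallowTarget, DeepTarget and destroy_source_then_check are hypothetical names, and the shallow target uses a std::weak_ptr so that the dangling state can be observed safely instead of triggering a real invalid access.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for host-side data owned by a CPU index.
struct HostGraph {
    std::vector<int> neighbors;
};

// Shallow target: remembers where the source lives but owns nothing.
struct ShallowTarget {
    std::weak_ptr<const HostGraph> source;
    bool usable() const { return !source.expired(); }
};

// Deep target: copies the data at construction and owns its own storage.
struct DeepTarget {
    std::vector<int> neighbors;
    explicit DeepTarget(const HostGraph& g) : neighbors(g.neighbors) {}
    bool usable() const { return !neighbors.empty(); }
};

struct Outcome {
    bool shallow_ok;
    bool deep_ok;
};

// Builds both targets, destroys the source, and reports which ones survive.
Outcome destroy_source_then_check() {
    auto source = std::make_shared<const HostGraph>(HostGraph{{1, 2, 3}});
    ShallowTarget shallow{source};
    DeepTarget deep{*source};

    // While the source is alive, both targets work.
    assert(shallow.usable() && deep.usable());

    source.reset(); // analogous to cpu_source.reset() in the script
    return {shallow.usable(), deep.usable()};
}
```

In this model only the deep target is still usable after the source is released, which is the behavior expected from a GPU index after copyFrom.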

Interesting:

  • On a different machine with an H100, the same script works and I do not hit this problem, which might hint at either a unified memory issue or some misconfiguration.
  • Also, if I reduce nb so that less memory is used on the DGX Spark, the crash does not happen.
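As a rough sense of scale for the nb observation, the sketch below computes the size of the raw float32 vectors alone for the values used in the script (nb = 100000, d = 128). It deliberately ignores the CAGRA graph and any temporary buffers, whose sizes depend on build parameters not stated here, so it is a lower bound rather than the index footprint. The helper names raw_vector_bytes and bytes_to_mib are hypothetical.

```cpp
#include <cstddef>

// Bytes needed to store nb float32 vectors of dimension d (raw vectors only).
constexpr std::size_t raw_vector_bytes(std::size_t nb, std::size_t d) {
    return nb * d * sizeof(float);
}

// Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double bytes_to_mib(std::size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

With 4-byte floats, raw_vector_bytes(100000, 128) is 51,200,000 bytes, or about 48.8 MiB.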

Setup

Tested on both machines in a container with library versions:

-- Faiss version:  1.13.0
-- cuVS version:   25.08.0
-- RAFT version:   25.08.0
-- CUDA version:   12.8.93

Minimal script for reproducibility:

#include <cuda_runtime.h>
#include <faiss/gpu/GpuIndexCagra.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <faiss/IndexHNSW.h>
#include <cstdlib>   // rand, RAND_MAX
#include <iostream>
#include <memory>
#include <string>    // std::string in run_search
#include <vector>

void print_available_devices() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::cerr << "No CUDA devices found." << std::endl;
        return;
    }
    std::cout << "--- Available CUDA Devices ---" << std::endl;
    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::cout << "Ordinal [" << i << "]: " << prop.name << " (" << prop.totalGlobalMem / 1024 / 1024 << " MB)" << std::endl;
    }
}

void run_search(faiss::gpu::GpuIndexCagra* index, int d, const std::string& label) {
    std::vector<float> query(d, 0.5f);
    std::vector<float> dists(1);
    faiss::idx_t labels[1];
    std::cout << "[ SEARCH ] " << label << "..." << std::endl;
    index->search(1, query.data(), 1, dists.data(), labels);
    std::cout << "[ SUCCESS ] Result ID: " << labels[0] << std::endl;
}

int main() {
    print_available_devices();

    const int d = 128;
    const int nb = 100000;
    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuIndexCagraConfig config;
    config.device = 0;

    // 1. Setup Data
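    // Widen before multiplying so larger nb values cannot overflow int.
    std::vector<float> data(static_cast<std::size_t>(nb) * d);
    for (auto& val : data) val = static_cast<float>(rand()) / RAND_MAX;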

    // 2. Build Initial GPU Index
    std::cout << "\n--- STEP 1: Building Initial Index ---" << std::endl;
    auto gpu_index = std::make_unique<faiss::gpu::GpuIndexCagra>(&res, d, faiss::METRIC_L2, config);
    gpu_index->train(nb, data.data());

    // 3. Move it to CPU (The "Source" of the copy)
    std::cout << "--- STEP 2: Moving to CPU Source ---" << std::endl;
    auto cpu_source = std::make_unique<faiss::IndexHNSWCagra>(d, faiss::METRIC_L2);
    gpu_index->copyTo(cpu_source.get());

    // 4. Populate a NEW GPU Index from that CPU Source
    std::cout << "--- STEP 3: Populating New GPU Index from CPU ---" << std::endl;
    auto target_gpu = std::make_unique<faiss::gpu::GpuIndexCagra>(&res, d, faiss::METRIC_L2, config);
    target_gpu->copyFrom(cpu_source.get());

    // 5. Verification: Search before and after destroying the CPU source
    run_search(target_gpu.get(), d, "Search (CPU Source ALIVE)");

    std::cout << "--- STEP 4: Destroying CPU Source ---" << std::endl;
    cpu_source.reset(); // This tests if 'target_gpu' has a shallow or deep copy

    try {
        run_search(target_gpu.get(), d, "Search (CPU Source DELETED)");
        std::cout << "\n[ RESULT ] SURVIVED: Target GPU index is autonomous." << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "\n[ RESULT ] CRASHED: " << e.what() << std::endl;
    }

    return 0;
}

Script output on DGX Spark:

--- Available CUDA Devices ---
Ordinal [0]: NVIDIA GB10 (122572 MB)

--- STEP 1: Building Initial Index ---
[668652][10:11:16:199063][info  ] optimizing graph
[668652][10:11:16:445967][info  ] Graph optimized, creating index
--- STEP 2: Moving to CPU Source ---
--- STEP 3: Populating New GPU Index from CPU ---
[ SEARCH ] Search (CPU Source ALIVE)...
[ SUCCESS ] Result ID: 33311
--- STEP 4: Destroying CPU Source ---
[ SEARCH ] Search (CPU Source DELETED)...

[ RESULT ] CRASHED: transform: failed inside CUB: cudaErrorInvalidDevice: invalid device ordinal
CUDA call='cudaFreeAsync(ptr, stream)' at file=/tmp/conda-bld-output/bld/rattler-build_libcuvs/work/cpp/src/neighbors/detail/cagra/compute_distance.hpp line=259 failed with an illegal memory access was encountered
Faiss assertion 'err__ == cudaSuccess' failed in virtual faiss::gpu::StandardGpuResourcesImpl::~StandardGpuResourcesImpl() at /home/faiss/faiss/gpu/StandardGpuResources.cpp:141; details: CUDA error 700 an illegal memory access was encountered
Aborted (core dumped)

Script output on H100:

--- Available CUDA Devices ---
Ordinal [0]: NVIDIA H100 NVL (95319 MB)

--- STEP 1: Building Initial Index ---
[ 11236][10:25:38:098496][info  ] optimizing graph
[ 11236][10:25:38:310985][info  ] Graph optimized, creating index
--- STEP 2: Moving to CPU Source ---
--- STEP 3: Populating New GPU Index from CPU ---
[ SEARCH ] Search (CPU Source ALIVE)...
[ SUCCESS ] Result ID: 57286
--- STEP 4: Destroying CPU Source ---
[ SEARCH ] Search (CPU Source DELETED)...
[ SUCCESS ] Result ID: 57286

[ RESULT ] SURVIVED: Target GPU index is autonomous.
