Description
My Goal:
Happy New Year!
I want to be able to persist CAGRA GPU indexes.
The standard Faiss workflow for this, as I understand it (please correct me if I am wrong), is the following; a sketch is shown after the list:
1. Build a GpuIndexCagra (GPU) and copy it to a CPU index.
2. write_index the CPU index to disk.
3. read_index the CPU index from disk.
4. Use the resulting IndexHNSWCagra (CPU), e.g. copy it back into a GpuIndexCagra.
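To make the intended round trip concrete, here is a minimal sketch of that workflow, assuming the copyTo/copyFrom API used in the reproduction script below; the file name is hypothetical and error handling is omitted:

#include <faiss/IndexHNSW.h>
#include <faiss/index_io.h>
#include <faiss/gpu/GpuIndexCagra.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <memory>

void persist_and_reload(faiss::gpu::GpuIndexCagra& trained_gpu_index, int d) {
    // 1. GPU -> CPU: copy the trained CAGRA graph into an IndexHNSWCagra
    //    (constructed the same way as in the reproduction script below).
    faiss::IndexHNSWCagra cpu_index(d, faiss::METRIC_L2);
    trained_gpu_index.copyTo(&cpu_index);

    // 2. Persist the CPU index ("cagra_hnsw.index" is a hypothetical path).
    faiss::write_index(&cpu_index, "cagra_hnsw.index");

    // 3. Reload it later as a CPU index.
    std::unique_ptr<faiss::Index> loaded(faiss::read_index("cagra_hnsw.index"));
    auto* reloaded = dynamic_cast<faiss::IndexHNSWCagra*>(loaded.get());

    // 4. CPU -> GPU: populate a fresh GpuIndexCagra from the reloaded index.
    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuIndexCagraConfig config;
    config.device = 0;
    faiss::gpu::GpuIndexCagra new_gpu_index(&res, d, faiss::METRIC_L2, config);
    new_gpu_index.copyFrom(reloaded);
}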
Problem Description:
On DGX-Spark:
When cloning a CPU IndexHNSWCagra to a GPU CAGRA index using copyFrom/index_cpu_to_gpu, the resulting GPU index appears to keep a shallow dependency on the host memory of the source index. If the source CPU index is destroyed, subsequent searches on the GPU index fail with a CUDA illegal memory access. I also get "invalid device ordinal", even though the output shows the ordinal should be 0. I tried to follow the pattern index_cpu_to_gpu uses internally: GpuCloner.cpp#L236-L244 (sketched below).
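For reference, the equivalent clone through the generic cloner path (which internally constructs a GpuIndexCagra and calls copyFrom, per the GpuCloner.cpp lines linked above) would look roughly like this; a minimal sketch, assuming device 0 and default cloner options:

#include <faiss/IndexHNSW.h>
#include <faiss/gpu/GpuCloner.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <memory>

std::unique_ptr<faiss::Index> clone_to_gpu(
        faiss::gpu::StandardGpuResources& res,
        const faiss::IndexHNSWCagra* cpu_index) {
    // index_cpu_to_gpu dispatches to GpuIndexCagra::copyFrom for
    // IndexHNSWCagra inputs; the returned index is owned by the caller.
    return std::unique_ptr<faiss::Index>(
            faiss::gpu::index_cpu_to_gpu(&res, /*device=*/0, cpu_index));
}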
Interesting:
- On a different H100 machine the same script works and I do not get the problem above, which might hint at either a unified-memory issue or some misconfiguration; see the diagnostic sketch after this list.
- Also, if you reduce nb so that the DGX-Spark uses less memory, the crash does not happen.
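To probe the unified-memory hypothesis, one thing that can be compared between the two machines is what the driver reports about managed/pageable memory and stream-ordered allocation support (the failing call in the log below is cudaFreeAsync). A minimal diagnostic sketch using standard cudaDeviceProp fields:

#include <cuda_runtime.h>
#include <cstdio>

// Print the memory-model related properties of device 0 so the
// DGX-Spark (GB10) and H100 outputs can be compared side by side.
int main() {
    cudaDeviceProp prop{};
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed\n");
        return 1;
    }
    std::printf("device                                 : %s\n", prop.name);
    std::printf("managedMemory                          : %d\n", prop.managedMemory);
    std::printf("concurrentManagedAccess                : %d\n", prop.concurrentManagedAccess);
    std::printf("pageableMemoryAccess                   : %d\n", prop.pageableMemoryAccess);
    std::printf("pageableMemoryAccessUsesHostPageTables : %d\n",
                prop.pageableMemoryAccessUsesHostPageTables);
    std::printf("memoryPoolsSupported (cudaFreeAsync)   : %d\n",
                prop.memoryPoolsSupported);
    return 0;
}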
Setup
Tested on both machines in a container with the following library versions:
- Faiss version: 1.13.0
- cuVS version: 25.08.0
- RAFT version: 25.08.0
- CUDA version: 12.8.93
Minimal script for reproducibility:
#include <cuda_runtime.h>
#include <faiss/gpu/GpuIndexCagra.h>
#include <faiss/gpu/StandardGpuResources.h>
#include <faiss/IndexHNSW.h>
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <vector>
void print_available_devices() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::cerr << "No CUDA devices found." << std::endl;
        return;
    }
    std::cout << "--- Available CUDA Devices ---" << std::endl;
    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        std::cout << "Ordinal [" << i << "]: " << prop.name << " ("
                  << prop.totalGlobalMem / 1024 / 1024 << " MB)" << std::endl;
    }
}
void run_search(faiss::gpu::GpuIndexCagra* index, int d, const std::string& label) {
    std::vector<float> query(d, 0.5f);
    std::vector<float> dists(1);
    faiss::idx_t labels[1];
    std::cout << "[ SEARCH ] " << label << "..." << std::endl;
    index->search(1, query.data(), 1, dists.data(), labels);
    std::cout << "[ SUCCESS ] Result ID: " << labels[0] << std::endl;
}
int main() {
    print_available_devices();
    const int d = 128;
    const int nb = 100000;
    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuIndexCagraConfig config;
    config.device = 0;

    // 1. Setup Data
    std::vector<float> data(nb * d);
    for (auto& val : data) val = (float)rand() / RAND_MAX;

    // 2. Build Initial GPU Index
    std::cout << "\n--- STEP 1: Building Initial Index ---" << std::endl;
    auto gpu_index = std::make_unique<faiss::gpu::GpuIndexCagra>(&res, d, faiss::METRIC_L2, config);
    gpu_index->train(nb, data.data());

    // 3. Move it to CPU (The "Source" of the copy)
    std::cout << "--- STEP 2: Moving to CPU Source ---" << std::endl;
    auto cpu_source = std::make_unique<faiss::IndexHNSWCagra>(d, faiss::METRIC_L2);
    gpu_index->copyTo(cpu_source.get());

    // 4. Populate a NEW GPU Index from that CPU Source
    std::cout << "--- STEP 3: Populating New GPU Index from CPU ---" << std::endl;
    auto target_gpu = std::make_unique<faiss::gpu::GpuIndexCagra>(&res, d, faiss::METRIC_L2, config);
    target_gpu->copyFrom(cpu_source.get());

    // 5. Verification: Search before and after destroying the CPU source
    run_search(target_gpu.get(), d, "Search (CPU Source ALIVE)");
    std::cout << "--- STEP 4: Destroying CPU Source ---" << std::endl;
    cpu_source.reset(); // This tests if 'target_gpu' has a shallow or deep copy
    try {
        run_search(target_gpu.get(), d, "Search (CPU Source DELETED)");
        std::cout << "\n[ RESULT ] SURVIVED: Target GPU index is autonomous." << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "\n[ RESULT ] CRASHED: " << e.what() << std::endl;
    }
    return 0;
}

Script output on DGX-Spark:
--- Available CUDA Devices ---
Ordinal [0]: NVIDIA GB10 (122572 MB)
--- STEP 1: Building Initial Index ---
[668652][10:11:16:199063][info ] optimizing graph
[668652][10:11:16:445967][info ] Graph optimized, creating index
--- STEP 2: Moving to CPU Source ---
--- STEP 3: Populating New GPU Index from CPU ---
[ SEARCH ] Search (CPU Source ALIVE)...
[ SUCCESS ] Result ID: 33311
--- STEP 4: Destroying CPU Source ---
[ SEARCH ] Search (CPU Source DELETED)...
[ RESULT ] CRASHED: transform: failed inside CUB: cudaErrorInvalidDevice: invalid device ordinal
CUDA call='cudaFreeAsync(ptr, stream)' at file=/tmp/conda-bld-output/bld/rattler-build_libcuvs/work/cpp/src/neighbors/detail/cagra/compute_distance.hpp line=259 failed with an illegal memory access was encountered
Faiss assertion 'err__ == cudaSuccess' failed in virtual faiss::gpu::StandardGpuResourcesImpl::~StandardGpuResourcesImpl() at /home/faiss/faiss/gpu/StandardGpuResources.cpp:141; details: CUDA error 700 an illegal memory access was encountered
Aborted (core dumped)
Output on H100:
--- Available CUDA Devices ---
Ordinal [0]: NVIDIA H100 NVL (95319 MB)
--- STEP 1: Building Initial Index ---
[ 11236][10:25:38:098496][info ] optimizing graph
[ 11236][10:25:38:310985][info ] Graph optimized, creating index
--- STEP 2: Moving to CPU Source ---
--- STEP 3: Populating New GPU Index from CPU ---
[ SEARCH ] Search (CPU Source ALIVE)...
[ SUCCESS ] Result ID: 57286
--- STEP 4: Destroying CPU Source ---
[ SEARCH ] Search (CPU Source DELETED)...
[ SUCCESS ] Result ID: 57286
[ RESULT ] SURVIVED: Target GPU index is autonomous.