Description
Question 1: options.useFloat16 does not reduce GPU memory usage
```python
import numpy as np
import faiss

embeddings = np.random.rand(5_000_000, 2048).astype(np.float32)

quantizer = faiss.IndexFlatIP(2048)
index = faiss.IndexIVFFlat(quantizer, 2048, 32768)

options = faiss.GpuMultipleClonerOptions()
options.shard = True
options.useFloat16 = True

gpu_index = faiss.index_cpu_to_all_gpus(index, co=options, ngpu=1)
gpu_index.train(embeddings[:1_000_000])
gpu_index.add(embeddings)
```

Whether or not I set `options.useFloat16 = True`, the GPU memory usage stays the same. Does IndexIVF (e.g. IndexIVFFlat, IndexIVFPQ) require all vectors to be stored in fp32?
Question 2: How to further reduce GPU memory usage
I have a huge (56,000,000, 2048) embedding matrix, similar to #4502, and I use 8 A100 80GB GPUs to perform index.add. The raw vectors alone consume 56,000,000 × 2048 × 4 bytes, approximately 427 GiB, and because useFloat16 has no effect and there is additional GPU memory overhead, my total of 640GB across 8 A100 GPUs is still not enough to complete the index.add operation.
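The back-of-the-envelope numbers look like this (raw vector storage only, before faiss's own overhead such as the coarse quantizer, inverted-list bookkeeping, and temporary GPU memory; the PQ figure assumes a hypothetical 64-bytes-per-vector code):

```python
# Dataset size and GPU count taken from the question above.
n, d = 56_000_000, 2048
gib = 1024 ** 3

fp32_bytes = n * d * 4   # IndexIVFFlat stores full float32 codes
fp16_bytes = n * d * 2   # fp16 storage would halve the vector payload
pq64_bytes = n * 64      # e.g. IndexIVFPQ at 64 bytes per vector

print(f"fp32 total : {fp32_bytes / gib:7.1f} GiB "
      f"({fp32_bytes / gib / 8:5.1f} GiB per GPU across 8 GPUs)")
print(f"fp16 total : {fp16_bytes / gib:7.1f} GiB")
print(f"PQ-64 total: {pq64_bytes / gib:7.1f} GiB")
```

So fp32 codes alone already eat roughly two thirds of the 640GB, leaving little headroom for everything else, while compressed codes would be far below the budget.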
Are there any methods to reduce GPU memory usage, or is it simply impossible to use IndexIVFFlat in this situation?