Feature Request: Support for FP4 and FP8 Quantization in FAISS #4538
immortalshadow007
started this conversation in
Ideas
Replies: 1 comment
@immortalshadow007 The Faiss team welcomes code donations from the community.
Background
FAISS currently supports a variety of quantization formats such as FP16 (SQfp16) and signed INT8 (QT_8bit_direct_signed) for memory-efficient similarity search. These have been extremely valuable for large-scale vector databases and retrieval tasks.
With the rise of ultra-low-precision formats such as FP8 (already adopted in NVIDIA Hopper/Blackwell architectures) and FP4 (supported by the second-generation Transformer Engine in NVIDIA GB200), the community is increasingly looking for vector search systems that natively support these datatypes.
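To make the precision trade-off concrete, here is a rough NumPy simulation of FP8 E4M3 rounding (the 4-exponent-bit, 3-mantissa-bit format used by Hopper). `quantize_e4m3` is a hypothetical helper for illustration only: it clamps to E4M3's finite range and rounds to 4 significant binary digits, but does not model subnormal rounding or NaN encoding exactly:

```python
import numpy as np

def quantize_e4m3(x):
    """Approximate FP8 E4M3 round-trip in float32 (illustrative sketch).

    E4M3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits;
    largest finite value is 448, smallest subnormal is 2**-9.
    """
    x = np.asarray(x, dtype=np.float32)
    x = np.clip(x, -448.0, 448.0)            # saturate at the finite range
    mant, exp = np.frexp(x)                  # x = mant * 2**exp, |mant| in [0.5, 1)
    mant = np.round(mant * 16) / 16          # keep 1 implicit + 3 mantissa bits
    y = np.ldexp(mant, exp)
    # Flush anything below the smallest subnormal to zero; proper subnormal
    # rounding is not modeled in this sketch.
    y = np.where(np.abs(y) < 2.0 ** -9, 0.0, y)
    return y.astype(np.float32)
```

For example, 0.3 rounds to 0.3125 and 1000 saturates to 448, which is the kind of quantization error a native FP8 index would have to account for.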
Motivation:
Proposed Features:
Potential Benefits:
References: