1-bit quantization? #754

PABannier · 2024-03-01T01:04:17Z

A team from Microsoft recently came up with "1-bit" quantization, which drastically reduces memory footprint and increases token throughput. Each weight is effectively encoded as a ternary value in {-1, 0, 1}, i.e. about log2(3) ≈ 1.58 bits per weight rather than a literal single bit. Reported results are very encouraging.
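To make the idea concrete, here is a rough sketch of what ternary quantization could look like. Everything in it is an assumption for illustration, not something taken from the paper's code or from GGML. It scales by the mean absolute weight ("absmean"), and the helper names `quantize_ternary`, `pack_trits` and `unpack_trits` are made up. The packing stores 5 ternary values per byte (3^5 = 243 fits in 256), which works out to 1.6 bits per weight.

```python
def quantize_ternary(weights, eps=1e-8):
    """Map floats to {-1, 0, 1} plus one float scale.

    Assumption: absmean scaling (divide by the mean absolute value),
    then round and clamp. Other scaling rules are possible.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def dequantize_ternary(q, scale):
    """Approximate reconstruction: each weight becomes -scale, 0 or +scale."""
    return [t * scale for t in q]


def pack_trits(q):
    """Pack 5 ternary values per byte as a base-3 number (hypothetical layout)."""
    out = bytearray()
    for i in range(0, len(q), 5):
        byte = 0
        # The first value of the chunk ends up in the lowest base-3 digit.
        for t in reversed(q[i:i + 5]):
            byte = byte * 3 + (t + 1)  # shift {-1, 0, 1} to {0, 1, 2}
        out.append(byte)
    return bytes(out)


def unpack_trits(data, n):
    """Inverse of pack_trits; n is the original number of weights."""
    q = []
    for byte in data:
        for _ in range(5):
            q.append(byte % 3 - 1)
            byte //= 3
    return q[:n]  # drop padding from the last, partially filled byte


if __name__ == "__main__":
    w = [0.9, -0.05, -1.2, 0.4, 0.0, 2.0, -0.3]
    q, scale = quantize_ternary(w)
    packed = pack_trits(q)
    print(q, round(scale, 4), len(packed), "bytes")
    assert unpack_trits(packed, len(w)) == q
```

A real kernel would quantize per block or per tensor and fuse the unpacking into the matmul; this only shows the encoding itself.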

@ggerganov is this of interest for GGML? I can implement it and run a bunch of benchmarks.

Green-Sky · 2024-03-01T10:20:06Z

ggml-org/llama.cpp#5761 more on this topic :)
