Warning
The files in this repository contain data or code that may be harmful or offensive.
See the corresponding research paper here: Key-Value Cache Quantization in Large Language Models: A Safety Benchmark
Note
Stable.
This benchmark evaluates the effect of KV cache quantization on LLM response safety using sample questions drawn from 13 forbidden scenarios.
Note
Currently, only the Meta Llama-2 7B Chat model is implemented in the benchmark, using the HQQ backend. This benchmark serves as a proof of concept; other models, model families, and backends are left for future work.
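As a rough sketch of how quantized-KV-cache generation with the HQQ backend can be requested through Hugging Face Transformers (the model ID, token budget, and 4-bit default below are illustrative assumptions, not the benchmark's exact configuration):

```python
# Sketch: requesting an HQQ-quantized KV cache at generation time with
# Hugging Face Transformers. Settings here are illustrative assumptions.

def quantized_cache_kwargs(nbits: int = 4) -> dict:
    """Generation kwargs that swap the default KV cache for a quantized one."""
    return {
        "cache_implementation": "quantized",
        "cache_config": {"backend": "HQQ", "nbits": nbits},
    }

def generate(prompt: str, model_id: str = "meta-llama/Llama-2-7b-chat-hf") -> str:
    # Imported lazily; requires transformers and the hqq package, and
    # Llama-2 weights are gated behind a Hugging Face access request.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=128, **quantized_cache_kwargs())
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

Lowering `nbits` shrinks the cache's memory footprint at the cost of precision, which is the trade-off whose safety impact the benchmark measures.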
- Code is adapted from QuantizedKVCache_Generation_Transformers.ipynb
- The benchmark design is largely inspired by "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models