## Description
Add a new decoding strategy, Min-P Sampling, to the Gemma text generation API. Min-P Sampling improves the diversity/coherence trade-off by keeping only tokens whose probability is at least a fraction `p` of the most likely token's probability, striking a balance between deterministic and fully stochastic methods.
## Motivation
- Enhanced Diversity & Creativity: Provides a tunable trade-off between randomness and quality, outperforming Top-k and Top-p in certain creative generation tasks.
- Reduced Repetition: Empirically shown to mitigate looping and repetitive token generation common with other samplers.
- User Control: Offers a single parameter `p` that is intuitive and consistent with existing sampling APIs.
## Proposed Implementation
- **New Class**
  - Create `MinPSampling` under `gemma.gm.text`, mirroring the API of `Greedy`, `TopkSampling`, and `ToppSampling`.
- **Decoder Logic**
  - Compute the threshold `p * p_max`, where `p_max` is the probability of the most likely token.
  - Mask out all tokens whose probability falls below this threshold, renormalize the remaining probabilities, and sample the next token from the resulting distribution.
- **Testing**
  - Unit tests covering edge cases (`p=0.0`, `p=1.0`, very small vocabularies).
  - Integration tests comparing output distributions against a reference implementation.
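The two edge cases have well-defined expected behavior, which a unit test could pin down; this is a hedged sketch against the illustrative `min_p_filter` helper above (not the real Gemma test suite):

```python
import numpy as np


def min_p_filter(probs: np.ndarray, p: float) -> np.ndarray:
    """Zero out tokens below p * max-probability and renormalize."""
    threshold = p * probs.max()
    filtered = np.where(probs >= threshold, probs, 0.0)
    return filtered / filtered.sum()


probs = np.array([0.6, 0.3, 0.1])
# p=0.0 keeps every token: the distribution is unchanged.
assert np.allclose(min_p_filter(probs, 0.0), probs)
# p=1.0 keeps only the argmax token: equivalent to greedy decoding.
assert np.allclose(min_p_filter(probs, 1.0), [1.0, 0.0, 0.0])
print("edge cases pass")
```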
## References
- Implementation PR: #320
- Paper: Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs ([arXiv:2407.01082](https://arxiv.org/abs/2407.01082))