[BUG]: DefaultSamplingPipeline - strange behavior at high temperature #928

@PioneerMNDR

Description

I decided to try the popular configuration min_p = 0.1 and temp = 1.5 or higher.
I get the following result:

[screenshot: model output degenerating into garbled tokens]

I used the example LLama.Examples/Examples/LLama3ChatSession.cs to show the incorrect behavior. The only things I changed were

var chatHistory = new ChatHistory();

and

   var inferenceParams = new InferenceParams
   {
       SamplingPipeline = new DefaultSamplingPipeline
       {
           Temperature = 1.5f,
           MinP = 0.1f,
       },

       MaxTokens = 100, // keep generating tokens until the anti-prompt is encountered
       AntiPrompts = [model.Tokens.EndOfTurnToken!] // model-specific end of turn string
   };
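For intuition about how these two settings interact, here is a plain Python sketch of the sampler math (not LLamaSharp code; the helper names are made up for illustration). Min-p keeps every token whose probability is at least MinP times the top token's probability, and raising the temperature flattens the distribution, so more tokens survive the filter:

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by 1/temperature, then normalize into probabilities
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(probs, min_p):
    # Keep token indices whose probability is >= min_p * max probability
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]

logits = [5.0, 3.0, 2.0, 0.0]
print(min_p_filter(softmax(logits, 1.0), 0.1))  # [0, 1] — fewer tokens pass at T=1.0
print(min_p_filter(softmax(logits, 1.5), 0.1))  # [0, 1, 2] — more pass at T=1.5
```

This widening of the candidate set at high temperature is expected; the bug report is that the output degrades far more than other runtimes with the same settings.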

In my project I use BatchedExecutor with the correct prompt template and anti-prompts, and I get exactly the same result. I also changed the sampling order in ProcessTokenDataArray, which did not change anything. I tested on both CUDA and Vulkan. I noticed a pattern: the first 20-30 tokens are correct, and then chaos begins.
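The sampling-order experiment is worth noting because min-p's threshold is relative to the current distribution, so applying it before or after temperature scaling can change which tokens survive. A plain Python sketch under assumed logit values (not LLamaSharp's actual pipeline):

```python
import math

def probs(logits):
    # Softmax over raw logit values
    m = max(logits)
    e = [math.exp(x - m) for x in logits]
    s = sum(e)
    return [v / s for v in e]

def min_p_keep(p, min_p):
    # Indices whose probability is at least min_p * the top probability
    thr = min_p * max(p)
    return [i for i, v in enumerate(p) if v >= thr]

logits = [5.0, 3.0, 2.0, 0.0]
temperature = 1.5

# min_p applied before temperature: filters the sharp distribution
kept_before = min_p_keep(probs(logits), 0.1)  # [0, 1]
# min_p applied after temperature: the flattened distribution lets more through
kept_after = min_p_keep(probs([x / temperature for x in logits]), 0.1)  # [0, 1, 2]
```

Since the reporter saw identical output with either order, the order alone does not appear to explain the degradation.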

In LM Studio and KoboldCpp I set the temperature even higher and min_p even lower, and everything worked fine there.

Reproduction Steps

  1. Use DefaultSamplingPipeline
  2. Set temperature higher than 1.2
  3. Set min_p = 0.1 or higher

Environment & Configuration

  • Operating system: Win10
  • .NET runtime version: 8.0.4
  • LLamaSharp version: 0.16.0
  • CUDA version (if you are using cuda backend): 12
  • CPU & GPU device: RTX 3050 8 GB and i5-12400
  • Model: L3-8B-Stheno-v3.2-Q6_K.gguf

Known Workarounds

No response
