Closed
Description
I decided to try the popular configuration of min_p = 0.1 with temp = 1.5 or higher, and I get the following result:
To demonstrate the incorrect behavior I used the example LLama.Examples/Examples/LLama3ChatSession.cs.
The only things I changed were `var chatHistory = new ChatHistory();` and:

```csharp
var inferenceParams = new InferenceParams
{
    SamplingPipeline = new DefaultSamplingPipeline
    {
        Temperature = 1.5f,
        MinP = 0.1f,
    },
    MaxTokens = 100, // keep generating tokens until the anti prompt is encountered
    AntiPrompts = [model.Tokens.EndOfTurnToken!] // model specific end of turn string
};
```
In my own project I use BatchedExecutor with the correct prompt template and anti-prompts, and I get exactly the same result. I also changed the sampling order in ProcessTokenDataArray, and that changed nothing. I tested on both the CUDA and Vulkan backends. I noticed a pattern: the first 20-30 tokens are correct, and then chaos begins.
In LM Studio and Kobold CPP I set the temperature even higher and min_p even lower, and everything worked fine there.
Reproduction Steps
- Use DefaultSamplingPipeline
- Set temperature higher than 1.2
- Set min_p = 0.1 or higher
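For context, min-p filtering keeps only the tokens whose probability is at least min_p times the probability of the most likely token, then renormalises the survivors. A minimal sketch in plain C# (illustrative only, not the LLamaSharp or llama.cpp implementation; the distribution values are made up):

```csharp
using System;
using System.Linq;

static class MinPSketch
{
    // Keep only tokens with probability >= minP * max(probability),
    // zero out the rest, and renormalise the survivors.
    public static double[] ApplyMinP(double[] probs, double minP)
    {
        double cutoff = probs.Max() * minP;
        double[] kept = probs.Select(p => p >= cutoff ? p : 0.0).ToArray();
        double sum = kept.Sum();
        return kept.Select(p => p / sum).ToArray();
    }

    static void Main()
    {
        // Hypothetical distribution over four tokens.
        double[] probs = { 0.6, 0.3, 0.08, 0.02 };

        // With min_p = 0.1 the cutoff is 0.6 * 0.1 = 0.06,
        // so the 0.02 token is removed before sampling.
        double[] filtered = ApplyMinP(probs, 0.1);
        Console.WriteLine(string.Join(", ", filtered.Select(p => p.ToString("0.###"))));
    }
}
```

Because temperature rescales the logits before the softmax, applying it before or after the min-p cutoff changes which tokens survive, which is why the sampling order is relevant when debugging this.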
Environment & Configuration
- Operating system: Win10
- .NET runtime version: 8.0.4
- LLamaSharp version: 0.16.0
- CUDA version (if you are using cuda backend): 12
- CPU & GPU device: RTX 3050 8 GB and i5-12400
- Model: L3-8B-Stheno-v3.2-Q6_K.gguf
Known Workarounds
No response
