Description
I decided to try the popular configuration of min_p = 0.1 and temp = 1.5 (or higher), and I get the following result:
To show the incorrect behavior I used the example LLama.Examples/Examples/LLama3ChatSession.cs. The only things I changed were
var chatHistory = new ChatHistory();
and
var inferenceParams = new InferenceParams
{
    SamplingPipeline = new DefaultSamplingPipeline
    {
        Temperature = 1.5f,
        MinP = 0.1f,
    },
    MaxTokens = 100, // stop after 100 tokens (or earlier, when the anti-prompt is encountered)
    AntiPrompts = [model.Tokens.EndOfTurnToken!] // model specific end of turn string
};
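For context, this is roughly how those parameters are wired into the session in my test. It is a simplified sketch based on the stock LLama3ChatSession.cs example; the model path, GPU layer count and the user prompt are placeholders, and chatHistory / inferenceParams are the ones shown above:

using LLama;
using LLama.Common;
using LLama.Sampling;

var parameters = new ModelParams(@"path\to\L3-8B-Stheno-v3.2-Q6_K.gguf")
{
    GpuLayerCount = 33 // placeholder, depends on available VRAM
};

using var model = await LLamaWeights.LoadFromFileAsync(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

var session = new ChatSession(executor, chatHistory);

// Stream the reply; the first 20-30 tokens look fine, then the output degrades.
await foreach (var text in session.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "Tell me a joke."),
    inferenceParams))
{
    Console.Write(text);
}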
In my project I use BatchedExecutor with correct prompt-template formatting and anti-prompts, and I get exactly the same result. I also changed the sampling order in ProcessTokenDataArray, and it did not change anything. I tested on both CUDA and Vulkan. I noticed a pattern: the first 20-30 tokens are correct, and then the chaos begins.
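My BatchedExecutor loop looks roughly like this. It is a heavily simplified sketch: the prompt-template and anti-prompt handling is omitted, and the method signatures are from memory of the 0.16-era batched examples, so treat the details as approximate:

using LLama;
using LLama.Batched;
using LLama.Common;
using LLama.Native;
using LLama.Sampling;

var parameters = new ModelParams(@"path\to\L3-8B-Stheno-v3.2-Q6_K.gguf");
using var model = await LLamaWeights.LoadFromFileAsync(parameters);
using var executor = new BatchedExecutor(model, parameters);

var pipeline = new DefaultSamplingPipeline { Temperature = 1.5f, MinP = 0.1f };

using var conversation = executor.Create();
conversation.Prompt(executor.Context.Tokenize("<formatted Llama-3 prompt>", addBos: true, special: true));

var decoder = new StreamingTokenDecoder(executor.Context);
for (var i = 0; i < 100; i++)
{
    await executor.Infer();

    // Sample the next token from this conversation's logits and feed it back.
    var token = pipeline.Sample(
        executor.Context.NativeHandle,
        conversation.Sample(),
        Array.Empty<LLamaToken>());

    decoder.Add(token);
    conversation.Prompt(token);
}
Console.WriteLine(decoder.Read());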
In LM Studio and KoboldCpp I set the temperature even higher and min_p even lower, and everything worked fine there.
Reproduction Steps
Use DefaultSamplingPipeline
Set temperature higher than 1.2
Set min_p = 0.1 or higher
Environment & Configuration
Operating system: Win10
.NET runtime version: 8.0.4
LLamaSharp version: 0.16.0
CUDA version (if you are using cuda backend): 12
CPU & GPU device: i5-12400 and RTX 3050 8 GB
Model: L3-8B-Stheno-v3.2-Q6_K.gguf
Known Workarounds
No response
PioneerMNDR changed the title from "[BUG]: DefaultSamplingPipeline - incorrect operation at high temperature" to "[BUG]: DefaultSamplingPipeline - strange behavior at high temperature" on Sep 26, 2024.
If possible, could you try adding some breakpoints/logging into the calls here? These are basically the lowest-level calls, directly into llama.cpp.
In particular, I'm looking for two things:
Are the values you set actually getting passed through correctly? Just to make sure there's not something overwriting the values you've set.
Are the other calls all being made with default values? Maybe try commenting them out just to be extra sure!
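Even something as simple as this, right before the first inference call, would confirm what the pipeline actually holds (Temperature and MinP are the DefaultSamplingPipeline properties from your snippet; log the rest the same way):

// Quick sanity check: dump the values the sampling pipeline actually holds.
var pipeline = (DefaultSamplingPipeline)inferenceParams.SamplingPipeline;
Console.WriteLine($"Temperature = {pipeline.Temperature}");
Console.WriteLine($"MinP        = {pipeline.MinP}");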
On a side note, the next version of LLamaSharp will completely replace the sampling system, because there has recently been a major redesign of the sampling API on the llama.cpp side.
These are the values that are passed when they are not explicitly set:
To keep the experiment clean, I also commented out the other samplers:
Nothing changed (1):
Nothing changed (2):
It feels like, once the temperature is high, the model forgets the EOS token and starts hallucinating.
I decided to run an experiment: if I turn min_p up to 1, the model always gives the same response, regardless of temperature:
(Table of three example runs, Ex1-Ex3, each with a screenshot of the output.)
The experiment shows that the min_p sampler works, and I really don't understand what the problem is. That said, when I reduced min_p to 0.01, it still told the joke about the bicycle.
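For what it's worth, this is my understanding of what the min-p filter does, as a standalone illustration (not LLamaSharp's implementation, just the idea): every token whose probability is below min_p times the probability of the most likely token is discarded. So min_p = 1 keeps only the top token and sampling becomes effectively greedy, which would explain the identical answers above.

using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: min-p filtering over an already-softmaxed distribution.
static List<(int Token, float Prob)> MinPFilter(IReadOnlyList<float> probs, float minP)
{
    // Discard every token below minP * P(most likely token).
    var threshold = minP * probs.Max();

    var kept = new List<(int, float)>();
    for (var i = 0; i < probs.Count; i++)
        if (probs[i] >= threshold)
            kept.Add((i, probs[i]));
    return kept;
}

var probs = new[] { 0.50f, 0.30f, 0.15f, 0.04f, 0.01f };

// min_p = 0.1: threshold 0.05, keeps tokens 0, 1 and 2.
Console.WriteLine(string.Join(", ", MinPFilter(probs, 0.1f)));

// min_p = 1.0: threshold 0.50, keeps only token 0, i.e. effectively greedy decoding.
Console.WriteLine(string.Join(", ", MinPFilter(probs, 1.0f)));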