Out of range integral type conversion attempted #6511
Comments
I am running into the same issue, also on Debian 12, on an older Intel CPU, while trying to run a Qwen2.5 EXL2 model over the OpenAI-compatible API (with Cline and Aider). In my case a few requests work, then this error occurs, after which the responses contain few or no characters. Unloading and reloading the model doesn't seem to help. I'm running the web UI directly on physical hardware. I tried upgrading all the packages on my system, which brought in a new kernel version, but nothing changed after the upgrade. Logs
lscpu
free -m
nvidia-smi
uname -a
python3 --version
conda --version
cat /etc/debian_version
git rev-parse HEAD
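For anyone trying to reproduce this, the failing requests can be triggered with a minimal script against the OpenAI-compatible endpoint. This is only a sketch under assumptions: host/port 127.0.0.1:5000 is the usual default for the web UI's API, and the model name is a placeholder for whatever you have loaded.

```python
import json
import urllib.request

# Assumed default endpoint for the web UI's OpenAI-compatible API;
# adjust host/port to match your setup.
URL = "http://127.0.0.1:5000/v1/chat/completions"

# Example payload; the model name is a placeholder.
payload = {
    "model": "Qwen2.5-Coder-32B-Instruct-exl2",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 64,
}

def send(url=URL, data=payload):
    """POST a chat-completion request and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# Usage (requires the server to be running):
#   print(send())
```

Sending a handful of these in a row is roughly what Cline/Aider do, so it should surface the error without the editor plugins in the loop.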
Actually, it seems like the ExLlamav2 loader works. Previously I was using the auto-suggested ExLlamav2_HF loader. Logs (prompts were sent from Aider)
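For reference, the loader can also be forced from the command line rather than the UI dropdown. Flag names below are taken from the web UI's `--help`; the model directory name is an example, not a requirement:

```shell
# Force the ExLlamav2 loader instead of the auto-suggested ExLlamav2_HF;
# --api enables the OpenAI-compatible endpoint.
python server.py --model Qwen2.5-Coder-32B-Instruct-exl2 --loader exllamav2 --api
```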
Just wanted to confirm that I have the same issue. Model: bartowski/Qwen2.5-Coder-32B-Instruct-exl2 @ 4.25
Describe the bug
When running inference over the OpenAI-compatible API with Perplexica or avante.nvim, the error sometimes appears; after that happens, it doesn't work anymore until I restart the program. (It worked fine with Open WebUI.)
Is there an existing issue for this?
Reproduction
Screenshot
Logs
System Info