LocalAI version: v3.8.0 (c0d1d02)
Environment, CPU architecture, OS, and Version: Linux openmediavault 6.12.9+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.9-1~bpo12+1 (2025-01-19) x86_64 GNU/Linux
❯ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01 Driver Version: 535.247.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2070 Off | 00000000:02:00.0 Off | N/A |
| 29% 40C P8 15W / 175W | 1437MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1726766 C /usr/bin/python3 1434MiB |
+---------------------------------------------------------------------------------------+

Description
I just installed LocalAI and I am encountering issues when attempting to use a model named qwen3-vl-4b-instruct to analyze small PDF documents. I am seeing two distinct failure modes:
- Context Size Error: When the model loads successfully, attempting to summarize a small PDF results in an immediate error stating the request exceeds the context size. This happens regardless of the context_size defined in the YAML.
- Load Failure after Restart: If I restart the container and try to use the model again, it fails to load entirely with a "Canceled" RPC error.
I have explicitly tried setting the backend to both llama-cpp and cuda12-llama-cpp in the YAML configuration, but the issues persist.
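For reference, this is a sketch of the model YAML I am using; the context_size value and the model file path are illustrative, backend and context_size are the fields I changed between attempts:

```yaml
# models/qwen3-vl-4b-instruct.yaml — illustrative values
name: qwen3-vl-4b-instruct
backend: llama-cpp        # also tried: cuda12-llama-cpp
context_size: 8192        # tried several values; the error is the same
parameters:
  model: <gguf file name>  # placeholder, actual file omitted
```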
Error Messages
- Scenario 1 (Processing PDF):
the request exceeds the available context size, try increasing it
Internal error: rpc error: code = Internal desc = the request exceeds the available context size, try increasing it
- Scenario 2 (After restarting container):
Internal error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
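Scenario 1 can be reproduced with a minimal request against the OpenAI-compatible /v1/chat/completions endpoint. A sketch (port 8079 is the host port mapped in the compose file below; the PDF text is assumed to be pre-extracted, so the payload builder and placeholder text are illustrative):

```python
import json
import urllib.request

def build_chat_request(model: str, text: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"Summarize this document:\n\n{text}"}
        ],
    }

payload = build_chat_request("qwen3-vl-4b-instruct", "...extracted PDF text...")

req = urllib.request.Request(
    "http://localhost:8079/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) returns the "exceeds the available
# context size" error from scenario 1 even for very short text.
```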
Docker Compose:
services:
  localai:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    container_name: localai
    hostname: localai
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    networks:
      - all-services_default
    ports:
      - 8079:8080
    environment:
      - DEBUG=true
      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
    volumes:
      - ${DOCKER_DATA_PATH}/config/localai/models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]