LocalAI version: v3.8.0 (c0d1d02)
Environment, CPU architecture, OS, and Version: Linux openmediavault 6.12.9+bpo-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.9-1~bpo12+1 (2025-01-19) x86_64 GNU/Linux
❯ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.247.01 Driver Version: 535.247.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 2070 Off | 00000000:02:00.0 Off | N/A |
| 29% 40C P8 15W / 175W | 1437MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1726766 C /usr/bin/python3 1434MiB |
+---------------------------------------------------------------------------------------+

Description
I just installed LocalAI and I am encountering issues when attempting to use a model named qwen3-vl-4b-instruct to analyze small PDF documents. I am seeing two distinct failure modes:
- Context Size Error: When the model loads successfully, attempting to summarize a small PDF results in an immediate error stating the request exceeds the context size. This happens regardless of the context_size defined in the YAML.
- Load Failure after Restart: If I restart the container and try to use the model again, it fails to load entirely with a "Canceled" RPC error.
I have explicitly tried setting the backend to both llama-cpp and cuda12-llama-cpp in the YAML configuration, but the issues persist.
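For reference, this is a sketch of the model YAML I am using; the context_size value and the model file path are illustrative, backend and context_size are the fields I changed between attempts:

```yaml
# models/qwen3-vl-4b-instruct.yaml — illustrative values
name: qwen3-vl-4b-instruct
backend: llama-cpp        # also tried: cuda12-llama-cpp
context_size: 8192        # tried several values; the error is the same
parameters:
  model: <gguf file name>  # placeholder, actual file omitted
```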
Error Messages
- Scenario 1 (Processing PDF):
the request exceeds the available context size, try increasing it
Internal error: rpc error: code = Internal desc = the request exceeds the available context size, try increasing it
- Scenario 2 (After restarting container):
Internal error: failed to load model with internal loader: could not load model: rpc error: code = Canceled desc =
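Scenario 1 can be reproduced with a minimal request against the OpenAI-compatible /v1/chat/completions endpoint. A sketch (port 8079 is the host port mapped in the compose file below; the PDF text is assumed to be pre-extracted, so the payload builder and placeholder text are illustrative):

```python
import json
import urllib.request

def build_chat_request(model: str, text: str) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": f"Summarize this document:\n\n{text}"}
        ],
    }

payload = build_chat_request("qwen3-vl-4b-instruct", "...extracted PDF text...")

req = urllib.request.Request(
    "http://localhost:8079/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) returns the "exceeds the available
# context size" error from scenario 1 even for very short text.
```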
Docker Compose:
services:
  localai:
    image: localai/localai:latest-gpu-nvidia-cuda-12
    container_name: localai
    hostname: localai
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    networks:
      - all-services_default
    ports:
      - 8079:8080
    environment:
      - DEBUG=true
      - LOCALAI_SINGLE_ACTIVE_BACKEND=true
    volumes:
      - ${DOCKER_DATA_PATH}/config/localai/models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]