[Bug]: ChatGLM model downloaded from the ModelScope community fails to start with an error #2206

@myysophia

Description

Installation Method

Pip Install (I used latest requirements.txt)

Version

Latest

OS

Linux

Describe the bug

Background

The service is started via docker-compose. Because of network problems it cannot download models from huggingface.co, so the model was downloaded from the ModelScope community instead:

modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir ./models/THUDM/glm-4-9b-chat
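
To rule out a corrupted or incomplete download before wiring the path into docker-compose, the snapshot can be sanity-checked on the host. A minimal sketch, assuming transformers is installed locally and that the path below matches the --local_dir used above (glm-4-9b-chat ships custom modeling code, hence trust_remote_code=True); loading only the config and tokenizer keeps this cheap:

# Hedged sanity check of the ModelScope snapshot; local_path is an
# assumption mirroring the --local_dir target of the download above.
from transformers import AutoConfig, AutoTokenizer

local_path = "./models/THUDM/glm-4-9b-chat"
config = AutoConfig.from_pretrained(local_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
print(config.architectures)   # architecture declared by the bundled modeling file
print(len(tokenizer))         # vocabulary size, confirms tokenizer files are intact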

1. The docker-compose file is as follows:

version: '3'
services:
  gpt_academic_full_capability:
    image: ghcr.io/binary-husky/gpt_academic_with_all_capacity:master
    environment:
      LOCAL_MODEL_DEVICE: 'cuda'  
      LOCAL_MODEL_QUANT: 'FP16'      
      API_KEY: 'sk-xx'
      DASHSCOPE_API_KEY: 'sk-xx'
      USE_PROXY: 'False'
      LLM_MODEL: 'gpt-4o'
      AVAIL_LLM_MODELS: '["gpt-3.5-turbo", "gpt-4o", "qwen-max-latest", "chatglm4","deepseek-r1","deepseek-v3","chatglm3-6b"]'
      ENABLE_AUDIO: 'False'
      DEFAULT_WORKER_NUM: '20'
      WEB_PORT: '18080'
      ADD_WAIFU: 'False'
      ALIYUN_APPKEY: 'RxPlZrM88DnAFkZK'
      THEME: 'Chuanhu-Small-and-Beautiful'
      API_URL_REDIRECT: >
        {
          "https://api.openai.com/v1/chat/completions": "https://api.gptsapi.net/v1/chat/completions",
          "https://api.openai.com/v1/completions": "https://api.gptsapi.net/v1/completions",
          "https://api.openai.com/v1/embeddings": "https://api.gptsapi.net/v1/embeddings"
        }
      CHATGLM_LOCAL_MODEL_PATH: '/models/THUDM/glm-4-9b-chat'  

    volumes:  
      - /root/models/THUDM/glm-4-9b-chat:/models/THUDM/glm-4-9b-chat:ro  
    
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

    # network_mode: "host"

    ports:
      - "18080:18080"

    command: >
      bash -c "python3 -u main.py"
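
Before digging into model-level errors, it is worth confirming that the container started by this compose file actually sees the GPU requested via runtime: nvidia and the deploy block. A small hedged check, meant to be run with python3 inside the running container (for example through docker compose exec on the service above); torch is already part of the gpt_academic_with_all_capacity image:

# Run inside the container to verify the NVIDIA runtime is wired up.
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())   # must be True for LOCAL_MODEL_DEVICE: 'cuda'
print("device count:", torch.cuda.device_count())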

2. Backend service logs:

14:02 | ..v_variable:33  | [ENV_VAR] Trying to load CHATGLM_LOCAL_MODEL_PATH, default: THUDM/glm-4-9b-chat --> override: /models/THUDM/glm-4-9b-chat
14:02 | ..v_variable:60  | [ENV_VAR] Successfully read environment variable CHATGLM_LOCAL_MODEL_PATH
14:02 | ..v_variable:33  | [ENV_VAR] Trying to load LOCAL_MODEL_DEVICE, default: cpu --> override: cuda
14:02 | ..v_variable:60  | [ENV_VAR] Successfully read environment variable LOCAL_MODEL_DEVICE
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards:   0%|          | 0/10 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 10/10 [00:01<00:00,  7.03it/s]
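
All 10 checkpoint shards load cleanly, so the weights downloaded from ModelScope appear intact; the failure below only occurs once generation starts. That pattern usually points at the transformers version baked into the image rather than at the model files, and the installed versions can be checked the same way (a hedged snippet, run inside the container):

# Report the library versions the bridge actually runs against.
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)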

3. Error reported on the frontend page:

Traceback (most recent call last):
  File "./request_llms/local_llm_class.py", line 160, in run
    for response_full in self.llm_stream_generator(**kwargs):
  File "./request_llms/bridge_chatglm4.py", line 63, in llm_stream_generator
    outputs = self._model.generate(**inputs, **gen_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1758, in generate
    raise ImportError(
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 2449, in _sample
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 929, in _update_model_kwargs_for_generation
    cache_name, cache = self._extract_past_from_model_output(outputs)
ValueError: too many values to unpack (expected 2)
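
The unpack failure happens inside the model's own modeling_chatglm.py (fetched into ~/.cache/huggingface/modules by trust_remote_code), which expects the newer transformers behavior where _extract_past_from_model_output returns a (cache_name, cache) pair; on an older transformers the method returns the bare past_key_values tuple, whose many per-layer entries then fail to unpack into two names. The clean fix is to install the transformers version the glm-4-9b-chat model card pins. As a stopgap, a defensive unpack in _update_model_kwargs_for_generation is one possible workaround; this is a sketch under the version-mismatch assumption above, not the upstream fix:

# In modeling_chatglm.py, replace the failing line
#     cache_name, cache = self._extract_past_from_model_output(outputs)
# with a shape-tolerant version handling both transformers behaviors:
extracted = self._extract_past_from_model_output(outputs)
if isinstance(extracted, tuple) and len(extracted) == 2 and isinstance(extracted[0], str):
    cache_name, cache = extracted                       # newer transformers: (name, cache)
else:
    cache_name, cache = "past_key_values", extracted    # older transformers: cache only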

Screen Shot

Shown in the logs above.

Terminal Traceback & Material to Help Reproduce Bugs

Shown in the logs above.
