[Bug]: ChatGLM model downloaded from the ModelScope community fails to start with an error #2206

@myysophia

Description

Installation Method

Pip Install (I used latest requirements.txt)

Version

Latest

OS

Linux

Describe the bug

Background

The service is started via docker-compose. Because of network problems it cannot download models from huggingface.co, so the model was downloaded from the ModelScope community instead:

modelscope download --model ZhipuAI/glm-4-9b-chat --local_dir ./models/THUDM/glm-4-9b-chat
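
To rule out a corrupted or incomplete download before wiring the path into docker-compose, the snapshot can be sanity-checked on the host. A minimal sketch, assuming transformers is installed locally and that the path below matches the --local_dir used above (glm-4-9b-chat ships custom modeling code, hence trust_remote_code=True); loading only the config and tokenizer keeps this cheap:

# Hedged sanity check of the ModelScope snapshot; local_path is an
# assumption mirroring the --local_dir target of the download above.
from transformers import AutoConfig, AutoTokenizer

local_path = "./models/THUDM/glm-4-9b-chat"
config = AutoConfig.from_pretrained(local_path, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(local_path, trust_remote_code=True)
print(config.architectures)   # architecture declared by the bundled modeling file
print(len(tokenizer))         # vocabulary size, confirms tokenizer files are intact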

1. The docker-compose file is as follows:

version: '3'
services:
  gpt_academic_full_capability:
    image: ghcr.io/binary-husky/gpt_academic_with_all_capacity:master
    environment:
      LOCAL_MODEL_DEVICE: 'cuda'  
      LOCAL_MODEL_QUANT: 'FP16'      
      API_KEY: 'sk-xx'
      DASHSCOPE_API_KEY: 'sk-xx'
      USE_PROXY: 'False'
      LLM_MODEL: 'gpt-4o'
      AVAIL_LLM_MODELS: '["gpt-3.5-turbo", "gpt-4o", "qwen-max-latest", "chatglm4","deepseek-r1","deepseek-v3","chatglm3-6b"]'
      ENABLE_AUDIO: 'False'
      DEFAULT_WORKER_NUM: '20'
      WEB_PORT: '18080'
      ADD_WAIFU: 'False'
      ALIYUN_APPKEY: 'RxPlZrM88DnAFkZK'
      THEME: 'Chuanhu-Small-and-Beautiful'
      API_URL_REDIRECT: >
        {
          "https://api.openai.com/v1/chat/completions": "https://api.gptsapi.net/v1/chat/completions",
          "https://api.openai.com/v1/completions": "https://api.gptsapi.net/v1/completions",
          "https://api.openai.com/v1/embeddings": "https://api.gptsapi.net/v1/embeddings"
        }
      CHATGLM_LOCAL_MODEL_PATH: '/models/THUDM/glm-4-9b-chat'  

    volumes:  
      - /root/models/THUDM/glm-4-9b-chat:/models/THUDM/glm-4-9b-chat:ro  
    
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

    # network_mode: "host"

    ports:
      - "18080:18080"

    command: >
      bash -c "python3 -u main.py"
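
Before digging into model-level errors, it is worth confirming that the container started by this compose file actually sees the GPU requested via runtime: nvidia and the deploy block. A small hedged check, meant to be run with python3 inside the running container (for example through docker compose exec on the service above); torch is already part of the gpt_academic_with_all_capacity image:

# Run inside the container to verify the NVIDIA runtime is wired up.
import torch

print("torch:", torch.__version__)
print("cuda available:", torch.cuda.is_available())   # must be True for LOCAL_MODEL_DEVICE: 'cuda'
print("device count:", torch.cuda.device_count())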

2. Backend service logs:

14:02 | ..v_variable:33  | [ENV_VAR] Trying to load CHATGLM_LOCAL_MODEL_PATH, default: THUDM/glm-4-9b-chat --> override: /models/THUDM/glm-4-9b-chat
14:02 | ..v_variable:60  | [ENV_VAR] Successfully read environment variable CHATGLM_LOCAL_MODEL_PATH
14:02 | ..v_variable:33  | [ENV_VAR] Trying to load LOCAL_MODEL_DEVICE, default: cpu --> override: cuda
14:02 | ..v_variable:60  | [ENV_VAR] Successfully read environment variable LOCAL_MODEL_DEVICE
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards:   0%|          | 0/10 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 10/10 [00:01<00:00,  7.03it/s]
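
All 10 checkpoint shards load cleanly, so the weights downloaded from ModelScope appear intact; the failure below only occurs once generation starts. That pattern usually points at the transformers version baked into the image rather than at the model files, and the installed versions can be checked the same way (a hedged snippet, run inside the container):

# Report the library versions the bridge actually runs against.
import torch
import transformers

print("transformers:", transformers.__version__)
print("torch:", torch.__version__)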

3. Error reported on the frontend page:

Traceback (most recent call last):
  File "./request_llms/local_llm_class.py", line 160, in run
    for response_full in self.llm_stream_generator(**kwargs):
  File "./request_llms/bridge_chatglm4.py", line 63, in llm_stream_generator
    outputs = self._model.generate(**inputs, **gen_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 1758, in generate
    raise ImportError(
  File "/usr/local/lib/python3.8/dist-packages/transformers/generation/utils.py", line 2449, in _sample
  File "/root/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 929, in _update_model_kwargs_for_generation
    cache_name, cache = self._extract_past_from_model_output(outputs)
ValueError: too many values to unpack (expected 2)
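
The unpack failure happens inside the model's own modeling_chatglm.py (fetched into ~/.cache/huggingface/modules by trust_remote_code), which expects the newer transformers behavior where _extract_past_from_model_output returns a (cache_name, cache) pair; on an older transformers the method returns the bare past_key_values tuple, whose many per-layer entries then fail to unpack into two names. The clean fix is to install the transformers version the glm-4-9b-chat model card pins. As a stopgap, a defensive unpack in _update_model_kwargs_for_generation is one possible workaround; this is a sketch under the version-mismatch assumption above, not the upstream fix:

# In modeling_chatglm.py, replace the failing line
#     cache_name, cache = self._extract_past_from_model_output(outputs)
# with a shape-tolerant version handling both transformers behaviors:
extracted = self._extract_past_from_model_output(outputs)
if isinstance(extracted, tuple) and len(extracted) == 2 and isinstance(extracted[0], str):
    cache_name, cache = extracted                       # newer transformers: (name, cache)
else:
    cache_name, cache = "past_key_values", extracted    # older transformers: cache only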

Screen Shot

Shown in the logs above.

Terminal Traceback & Material to Help Reproduce Bugs

Shown in the logs above.
