Memory is exhausted and the process is automatically killed #7601

@missTL

Description

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-5.4.225-1.el7.elrepo.x86_64-x86_64-with-glibc2.17
  • Python version: 3.10.16
  • PyTorch version: 2.5.1+cu124 (GPU)
  • Transformers version: 4.49.0
  • Datasets version: 3.1.0
  • Accelerate version: 1.0.1
  • PEFT version: 0.12.0
  • TRL version: 0.9.6
  • GPU type: NVIDIA RTX A6000
  • DeepSpeed version: 0.14.4
  • Bitsandbytes version: 0.45.0
  • vLLM version: 0.7.3

Reproduction

Host memory is used up and the process is automatically killed. Is even 500 GB of RAM not enough? The val_128world_motion_muti dataset has only 5,000 samples.
vLLM inference command:
CUDA_VISIBLE_DEVICES='4,5,6,7' python scripts/vllm_infer.py --model_name_or_path /home/zengshuang.zs/output/llm/v4.3_128 --dataset val_128world_motion_muti --template qwen2_vl --cutoff_len 32768 --max_new_tokens 3500 --max_samples 100000 --image_resolution 524288 --save_name world_v4.3_128.jsonl --temperature 0.1 --top_p 0.1 --top_k 10
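
To put the "is 500 GB not enough?" question in numbers, here is a rough back-of-the-envelope sketch. It is not taken from scripts/vllm_infer.py, and it rests on assumptions that the report does not confirm: that `--image_resolution 524288` acts as a per-image pixel cap, that images are RGB, and that every decoded image is held in host memory at once. The number of images per sample is unknown, so the helper names below are hypothetical and the sketch solves for that count instead of guessing it.

```python
# Rough host-memory estimate under the assumptions stated above.
# All names here are illustrative; none come from LLaMA-Factory or vLLM.

MAX_PIXELS = 524288          # value passed to --image_resolution (assumed per-image pixel cap)
NUM_SAMPLES = 5000           # dataset size stated in the report
RAM_BYTES = 500 * 1024**3    # 500 GiB of host memory


def decoded_image_bytes(max_pixels: int, channels: int = 3, bytes_per_value: int = 1) -> int:
    """Upper bound on the size of one decoded image held in memory."""
    return max_pixels * channels * bytes_per_value


def images_per_sample_to_fill(ram_bytes: int, num_samples: int, per_image_bytes: int) -> float:
    """Average number of images per sample at which the dataset alone fills RAM."""
    return ram_bytes / (num_samples * per_image_bytes)


uint8_image = decoded_image_bytes(MAX_PIXELS)                       # raw RGB, 1 byte per value
float32_image = decoded_image_bytes(MAX_PIXELS, bytes_per_value=4)  # after conversion to float32

print(f"uint8 image:   {uint8_image / 1024**2:.1f} MiB")
print(f"float32 image: {float32_image / 1024**2:.1f} MiB")
print(f"images/sample to fill RAM (uint8):   "
      f"{images_per_sample_to_fill(RAM_BYTES, NUM_SAMPLES, uint8_image):.1f}")
print(f"images/sample to fill RAM (float32): "
      f"{images_per_sample_to_fill(RAM_BYTES, NUM_SAMPLES, float32_image):.1f}")
```

Under these assumptions, a few dozen full-resolution images per sample would be enough to approach 500 GB if everything were resident at once. Whether the script actually keeps all inputs in memory is not established by this report.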

Others

No response

Labels: invalid (This doesn't seem right)
