[DOC] Try to use lora_finetune.py directly and get an error: /bin/bash: line 1: export: `=/usr/bin/supervisord': not a valid identifier #6235


Open
tanghl01 opened this issue Mar 2, 2025 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments


tanghl01 commented Mar 2, 2025

📚 The doc issue

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

Install dependencies:

pip install -r requirements/requirements.txt

Install ColossalAI:

BUILD_EXT=1 pip install .
export CUDA_VISIBLE_DEVICES=0
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_INSTALL_DIR=/usr/local/cuda-12.4/
export CUDA_HOME=/usr/local/cuda-12.4/
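
Before launching, the CUDA-related variables exported above can be sanity-checked. A minimal sketch (the variable names are the ones set in this report; `check_cuda_env` is a hypothetical helper, not part of ColossalAI):

```python
# Verify that the CUDA environment variables point at existing directories.
import os

def check_cuda_env(environ):
    """Return a dict mapping each expected variable name to whether it is
    set and points at an existing directory."""
    results = {}
    for name in ("CUDA_HOME", "CUDA_INSTALL_DIR"):
        path = environ.get(name)
        results[name] = path is not None and os.path.isdir(path)
    return results

print(check_cuda_env(os.environ))
```

If either entry is False, `colossalai check -i` will likely report the system CUDA version as N/A.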

colossalai check -i

Installation Report

------------ Environment ------------
Colossal-AI version: 0.4.8
PyTorch version: 2.5.1
System CUDA version: 12.4
CUDA version required by PyTorch: 12.4

Note:

  1. The table above checks the versions of the libraries/tools in the current environment
  2. If the System CUDA version is N/A, you can set the CUDA_HOME environment variable to locate it
  3. If the CUDA version required by PyTorch is N/A, you probably did not install a CUDA-compatible PyTorch. This value is given by torch.version.cuda; go to https://pytorch.org/get-started/locally/ to download the correct version.

------------ CUDA Extensions AOT Compilation ------------
Found AOT CUDA Extension: ✓
PyTorch version used for AOT compilation: N/A
CUDA version used for AOT compilation: N/A

Note:

  1. AOT (ahead-of-time) compilation of the CUDA kernels occurs during installation when the environment variable BUILD_EXT=1 is set
  2. If AOT compilation is not enabled, stay calm as the CUDA kernels can still be built during runtime

------------ Compatibility ------------
PyTorch version match: N/A
System and PyTorch CUDA version match: ✓
System and Colossal-AI CUDA version match: N/A

Note:

  1. The table above checks the version compatibility of the libraries/tools in the current environment
    • PyTorch version mismatch: whether the PyTorch version in the current environment is compatible with the PyTorch version used for AOT compilation
    • System and PyTorch CUDA version match: whether the CUDA version in the current environment is compatible with the CUDA version required by PyTorch
    • System and Colossal-AI CUDA version match: whether the CUDA version in the current environment is compatible with the CUDA version used for AOT compilation

colossalai run --nproc_per_node 1 lora_finetune.py --pretrained "/root/autodl-tmp/DeepSeeK-R1-7B" --dataset "/root/converted_data.json" --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir "/root/autodl-tmp/DeepSeeK_lora" --grad_ckpt --dtype bf16

/bin/bash: line 1: export: `=/usr/bin/supervisord': not a valid identifier
Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 lora_finetune.py --pretrained /root/autodl-tmp/DeepSeeK-R1-7B --dataset /root/converted_data.json --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir /root/autodl-tmp/DeepSeeK_lora --grad_ckpt --dtype bf16 on 127.0.0.1, is localhost: True, exception: Encountered a bad command exit code!

Command: 'cd /root/ColossalAI/applications/ColossalChat/examples/training_scripts && export ="/usr/bin/supervisord" SHELL="/bin/bash" NV_LIBCUBLAS_VERSION="12.4.5.8-1" NVIDIA_VISIBLE_DEVICES="GPU-866ac0d7-8995-0dd3-9bc5-6de16452ad15" NV_NVML_DEV_VERSION="12.4.127-1" NV_CUDNN_PACKAGE_NAME="libcudnn9-cuda-12" NV_LIBNCCL_DEV_PACKAGE="libnccl-dev=2.21.5-1+cuda12.4" CONDA_EXE="/root/miniconda3/bin/conda" NV_LIBNCCL_DEV_PACKAGE_VERSION="2.21.5-1" HOSTNAME="autodl-container-493b4c87d3-99a9c3d7" NVIDIA_REQUIRE_CUDA="cuda>=12.4 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536" NV_LIBCUBLAS_DEV_PACKAGE="libcublas-dev-12-4=12.4.5.8-1" NV_NVTX_VERSION="12.4.127-1" NV_CUDA_CUDART_DEV_VERSION="12.4.127-1" NV_LIBCUSPARSE_VERSION="12.3.1.170-1" NV_LIBNPP_VERSION="12.2.5.30-1" NCCL_VERSION="2.21.5-1" PWD="/root/ColossalAI/applications/ColossalChat/examples/training_scripts" AutoDLContainerUUID="493b4c87d3-99a9c3d7" 
CONDA_PREFIX="/root/miniconda3/envs/sft" NV_CUDNN_PACKAGE="libcudnn9-cuda-12=9.1.0.70-1" NVIDIA_DRIVER_CAPABILITIES="compute,utility,graphics,video" JUPYTER_SERVER_URL="http://autodl-container-493b4c87d3-99a9c3d7:8888/jupyter/" NV_NVPROF_DEV_PACKAGE="cuda-nvprof-12-4=12.4.127-1" NV_LIBNPP_PACKAGE="libnpp-12-4=12.2.5.30-1" NV_LIBNCCL_DEV_PACKAGE_NAME="libnccl-dev" TZ="Asia/Shanghai" NV_LIBCUBLAS_DEV_VERSION="12.4.5.8-1" NVIDIA_PRODUCT_NAME="CUDA" NV_LIBCUBLAS_DEV_PACKAGE_NAME="libcublas-dev-12-4" LINES="45" NV_CUDA_CUDART_VERSION="12.4.127-1" AutoDLServiceURL="https://u502097-87d3-99a9c3d7.nmb1.seetacloud.com:8443" HOME="/root" LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga
=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:" COLUMNS="176" AutoDLRegion="nm-B1" CUDA_VERSION="12.4.1" AgentHost="172.29.52.64" NV_LIBCUBLAS_PACKAGE="libcublas-12-4=12.4.5.8-1" NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE="cuda-nsight-compute-12-4=12.4.1-1" CONDA_PROMPT_MODIFIER="(sft) " NV_LIBNPP_DEV_PACKAGE="libnpp-dev-12-4=12.2.5.30-1" NV_LIBCUBLAS_PACKAGE_NAME="libcublas-12-4" NV_LIBNPP_DEV_VERSION="12.2.5.30-1" JUPYTER_SERVER_ROOT="/root" TERM="xterm-256color" NV_LIBCUSPARSE_DEV_VERSION="12.3.1.170-1" LIBRARY_PATH="/usr/local/cuda/lib64/stubs" NV_CUDNN_VERSION="9.1.0.70-1" AutodlAutoPanelToken="jupyter-autodl-container-493b4c87d3-99a9c3d7-1f3f70c858d6c46d3975675baf8f3e103263f16190d504cfa848ca726f9077e18" CONDA_SHLVL="2" SHLVL="2" PYXTERM_DIMENSIONS="80x25" CUDA_INSTALL_DIR="/usr/local/cuda-12.4/" NV_CUDA_LIB_VERSION="12.4.1-1" NVARCH="x86_64" NV_CUDNN_PACKAGE_DEV="libcudnn9-dev-cuda-12=9.1.0.70-1" NV_CUDA_COMPAT_PACKAGE="cuda-compat-12-4" CONDA_PYTHON_EXE="/root/miniconda3/bin/python" NV_LIBNCCL_PACKAGE="libnccl2=2.21.5-1+cuda12.4" LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64" LC_CTYPE="C.UTF-8" CONDA_DEFAULT_ENV="sft" NV_CUDA_NSIGHT_COMPUTE_VERSION="12.4.1-1" REQUESTS_CA_BUNDLE="/etc/ssl/certs/ca-certificates.crt" OMP_NUM_THREADS="16" NV_NVPROF_VERSION="12.4.127-1" CUDA_HOME="/usr/local/cuda-12.4/" PATH="/root/miniconda3/envs/sft/bin:/root/miniconda3/condabin:/root/miniconda3/bin:/usr/local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" NV_LIBNCCL_PACKAGE_NAME="libnccl2" NV_LIBNCCL_PACKAGE_VERSION="2.21.5-1" MKL_NUM_THREADS="16" CONDA_PREFIX_1="/root/miniconda3" DEBIAN_FRONTEND="noninteractive" OLDPWD="/root/ColossalAI" AutoDLDataCenter="neimengDC3" _="/root/miniconda3/envs/sft/bin/colossalai" CUDA_DEVICE_MAX_CONNECTIONS="1" && torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 lora_finetune.py --pretrained /root/autodl-tmp/DeepSeeK-R1-7B --dataset 
/root/converted_data.json --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir /root/autodl-tmp/DeepSeeK_lora --grad_ckpt --dtype bf16'
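
The failing fragment is `export ="/usr/bin/supervisord"` at the start of the command: the launcher re-exports every entry of the current environment, and this environment apparently contains an entry whose *name* is empty, which bash rejects as "not a valid identifier". A hedged sketch of the mechanism (this is not ColossalAI's actual launcher code, just an illustration of how such a string gets built and how filtering invalid names avoids it):

```python
# Hypothetical reconstruction: a launcher that serializes os.environ into an
# `export K="V" ...` prefix. An entry with an empty or otherwise invalid name
# produces the broken `export ="..."` seen in the error.
import re

def build_export_command(environ):
    """Build the export prefix, skipping names that are not valid shell
    identifiers (empty names, names containing '-', '.', etc.)."""
    valid_name = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
    parts, skipped = [], []
    for key, value in environ.items():
        if valid_name.match(key):
            parts.append(f'{key}="{value}"')
        else:
            skipped.append(key)
    return "export " + " ".join(parts), skipped

# An environment containing an empty-named entry, as in this report:
env = {"": "/usr/bin/supervisord", "CUDA_HOME": "/usr/local/cuda-12.4/"}
cmd, skipped = build_export_command(env)
print(cmd)      # export CUDA_HOME="/usr/local/cuda-12.4/"
print(skipped)  # ['']
```

As a workaround, run the same filter against `os.environ` in the shell that launches training to find the offending entry, then start a clean shell (or `env -i`) without it.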

@tanghl01 tanghl01 added the documentation Improvements or additions to documentation label Mar 2, 2025
D1026 commented Apr 7, 2025

Have you solved this? I also encountered this issue.
