[DOC] Try to use lora_finetune.py directly and get an error: /bin/bash: line 1: export: `=/usr/bin/supervisord': not a valid identifier #6235


Open
tanghl01 opened this issue Mar 2, 2025 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments


tanghl01 commented Mar 2, 2025

📚 The doc issue

git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI

Install dependencies:

pip install -r requirements/requirements.txt

Install ColossalAI:

BUILD_EXT=1 pip install .
export CUDA_VISIBLE_DEVICES=0
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_INSTALL_DIR=/usr/local/cuda-12.4/
export CUDA_HOME=/usr/local/cuda-12.4/
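
Before launching, the CUDA-related variables exported above can be sanity-checked. A minimal sketch (the variable names are the ones set in this report; `check_cuda_env` is a hypothetical helper, not part of ColossalAI):

```python
# Verify that the CUDA environment variables point at existing directories.
import os

def check_cuda_env(environ):
    """Return a dict mapping each expected variable name to whether it is
    set and points at an existing directory."""
    results = {}
    for name in ("CUDA_HOME", "CUDA_INSTALL_DIR"):
        path = environ.get(name)
        results[name] = path is not None and os.path.isdir(path)
    return results

print(check_cuda_env(os.environ))
```

If either entry is False, `colossalai check -i` will likely report the system CUDA version as N/A.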

colossalai check -i

Installation Report

------------ Environment ------------
Colossal-AI version: 0.4.8
PyTorch version: 2.5.1
System CUDA version: 12.4
CUDA version required by PyTorch: 12.4

Note:

  1. The table above checks the versions of the libraries/tools in the current environment
  2. If the System CUDA version is N/A, you can set the CUDA_HOME environment variable to locate it
  3. If the CUDA version required by PyTorch is N/A, you probably did not install a CUDA-compatible PyTorch. This value is given by torch.version.cuda; go to https://pytorch.org/get-started/locally/ to download the correct version.

------------ CUDA Extensions AOT Compilation ------------
Found AOT CUDA Extension: ✓
PyTorch version used for AOT compilation: N/A
CUDA version used for AOT compilation: N/A

Note:

  1. AOT (ahead-of-time) compilation of the CUDA kernels occurs during installation when the environment variable BUILD_EXT=1 is set
  2. If AOT compilation is not enabled, stay calm as the CUDA kernels can still be built during runtime

------------ Compatibility ------------
PyTorch version match: N/A
System and PyTorch CUDA version match: ✓
System and Colossal-AI CUDA version match: N/A

Note:

  1. The table above checks the version compatibility of the libraries/tools in the current environment
    • PyTorch version mismatch: whether the PyTorch version in the current environment is compatible with the PyTorch version used for AOT compilation
    • System and PyTorch CUDA version match: whether the CUDA version in the current environment is compatible with the CUDA version required by PyTorch
    • System and Colossal-AI CUDA version match: whether the CUDA version in the current environment is compatible with the CUDA version used for AOT compilation

colossalai run --nproc_per_node 1 lora_finetune.py --pretrained "/root/autodl-tmp/DeepSeeK-R1-7B" --dataset "/root/converted_data.json" --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir "/root/autodl-tmp/DeepSeeK_lora" --grad_ckpt --dtype bf16

/bin/bash: line 1: export: `=/usr/bin/supervisord': not a valid identifier
Error: failed to run torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 lora_finetune.py --pretrained /root/autodl-tmp/DeepSeeK-R1-7B --dataset /root/converted_data.json --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir /root/autodl-tmp/DeepSeeK_lora --grad_ckpt --dtype bf16 on 127.0.0.1, is localhost: True, exception: Encountered a bad command exit code!

Command: 'cd /root/ColossalAI/applications/ColossalChat/examples/training_scripts && export ="/usr/bin/supervisord" SHELL="/bin/bash" NV_LIBCUBLAS_VERSION="12.4.5.8-1" NVIDIA_VISIBLE_DEVICES="GPU-866ac0d7-8995-0dd3-9bc5-6de16452ad15" NV_NVML_DEV_VERSION="12.4.127-1" NV_CUDNN_PACKAGE_NAME="libcudnn9-cuda-12" NV_LIBNCCL_DEV_PACKAGE="libnccl-dev=2.21.5-1+cuda12.4" CONDA_EXE="/root/miniconda3/bin/conda" NV_LIBNCCL_DEV_PACKAGE_VERSION="2.21.5-1" HOSTNAME="autodl-container-493b4c87d3-99a9c3d7" NVIDIA_REQUIRE_CUDA="cuda>=12.4 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 brand=tesla,driver>=525,driver<526 brand=unknown,driver>=525,driver<526 brand=nvidia,driver>=525,driver<526 brand=nvidiartx,driver>=525,driver<526 brand=geforce,driver>=525,driver<526 brand=geforcertx,driver>=525,driver<526 brand=quadro,driver>=525,driver<526 brand=quadrortx,driver>=525,driver<526 brand=titan,driver>=525,driver<526 brand=titanrtx,driver>=525,driver<526 brand=tesla,driver>=535,driver<536 brand=unknown,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=geforce,driver>=535,driver<536 brand=geforcertx,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=titan,driver>=535,driver<536 brand=titanrtx,driver>=535,driver<536" NV_LIBCUBLAS_DEV_PACKAGE="libcublas-dev-12-4=12.4.5.8-1" NV_NVTX_VERSION="12.4.127-1" NV_CUDA_CUDART_DEV_VERSION="12.4.127-1" NV_LIBCUSPARSE_VERSION="12.3.1.170-1" NV_LIBNPP_VERSION="12.2.5.30-1" NCCL_VERSION="2.21.5-1" PWD="/root/ColossalAI/applications/ColossalChat/examples/training_scripts" AutoDLContainerUUID="493b4c87d3-99a9c3d7" 
CONDA_PREFIX="/root/miniconda3/envs/sft" NV_CUDNN_PACKAGE="libcudnn9-cuda-12=9.1.0.70-1" NVIDIA_DRIVER_CAPABILITIES="compute,utility,graphics,video" JUPYTER_SERVER_URL="http://autodl-container-493b4c87d3-99a9c3d7:8888/jupyter/" NV_NVPROF_DEV_PACKAGE="cuda-nvprof-12-4=12.4.127-1" NV_LIBNPP_PACKAGE="libnpp-12-4=12.2.5.30-1" NV_LIBNCCL_DEV_PACKAGE_NAME="libnccl-dev" TZ="Asia/Shanghai" NV_LIBCUBLAS_DEV_VERSION="12.4.5.8-1" NVIDIA_PRODUCT_NAME="CUDA" NV_LIBCUBLAS_DEV_PACKAGE_NAME="libcublas-dev-12-4" LINES="45" NV_CUDA_CUDART_VERSION="12.4.127-1" AutoDLServiceURL="https://u502097-87d3-99a9c3d7.nmb1.seetacloud.com:8443" HOME="/root" LS_COLORS="rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:.tar=01;31:.tgz=01;31:.arc=01;31:.arj=01;31:.taz=01;31:.lha=01;31:.lz4=01;31:.lzh=01;31:.lzma=01;31:.tlz=01;31:.txz=01;31:.tzo=01;31:.t7z=01;31:.zip=01;31:.z=01;31:.dz=01;31:.gz=01;31:.lrz=01;31:.lz=01;31:.lzo=01;31:.xz=01;31:.zst=01;31:.tzst=01;31:.bz2=01;31:.bz=01;31:.tbz=01;31:.tbz2=01;31:.tz=01;31:.deb=01;31:.rpm=01;31:.jar=01;31:.war=01;31:.ear=01;31:.sar=01;31:.rar=01;31:.alz=01;31:.ace=01;31:.zoo=01;31:.cpio=01;31:.7z=01;31:.rz=01;31:.cab=01;31:.wim=01;31:.swm=01;31:.dwm=01;31:.esd=01;31:.jpg=01;35:.jpeg=01;35:.mjpg=01;35:.mjpeg=01;35:.gif=01;35:.bmp=01;35:.pbm=01;35:.pgm=01;35:.ppm=01;35:.tga=01;35:.xbm=01;35:.xpm=01;35:.tif=01;35:.tiff=01;35:.png=01;35:.svg=01;35:.svgz=01;35:.mng=01;35:.pcx=01;35:.mov=01;35:.mpg=01;35:.mpeg=01;35:.m2v=01;35:.mkv=01;35:.webm=01;35:.webp=01;35:.ogm=01;35:.mp4=01;35:.m4v=01;35:.mp4v=01;35:.vob=01;35:.qt=01;35:.nuv=01;35:.wmv=01;35:.asf=01;35:.rm=01;35:.rmvb=01;35:.flc=01;35:.avi=01;35:.fli=01;35:.flv=01;35:.gl=01;35:.dl=01;35:.xcf=01;35:.xwd=01;35:.yuv=01;35:.cgm=01;35:.emf=01;35:.ogv=01;35:.ogx=01;35:.aac=00;36:.au=00;36:.flac=00;36:.m4a=00;36:.mid=00;36:.midi=00;36:.mka=00;36:.mp3=00;36:.mpc=00;36:.ogg=00;36:.ra=00;36:.wav=00;36:.oga
=00;36:.opus=00;36:.spx=00;36:.xspf=00;36:" COLUMNS="176" AutoDLRegion="nm-B1" CUDA_VERSION="12.4.1" AgentHost="172.29.52.64" NV_LIBCUBLAS_PACKAGE="libcublas-12-4=12.4.5.8-1" NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE="cuda-nsight-compute-12-4=12.4.1-1" CONDA_PROMPT_MODIFIER="(sft) " NV_LIBNPP_DEV_PACKAGE="libnpp-dev-12-4=12.2.5.30-1" NV_LIBCUBLAS_PACKAGE_NAME="libcublas-12-4" NV_LIBNPP_DEV_VERSION="12.2.5.30-1" JUPYTER_SERVER_ROOT="/root" TERM="xterm-256color" NV_LIBCUSPARSE_DEV_VERSION="12.3.1.170-1" LIBRARY_PATH="/usr/local/cuda/lib64/stubs" NV_CUDNN_VERSION="9.1.0.70-1" AutodlAutoPanelToken="jupyter-autodl-container-493b4c87d3-99a9c3d7-1f3f70c858d6c46d3975675baf8f3e103263f16190d504cfa848ca726f9077e18" CONDA_SHLVL="2" SHLVL="2" PYXTERM_DIMENSIONS="80x25" CUDA_INSTALL_DIR="/usr/local/cuda-12.4/" NV_CUDA_LIB_VERSION="12.4.1-1" NVARCH="x86_64" NV_CUDNN_PACKAGE_DEV="libcudnn9-dev-cuda-12=9.1.0.70-1" NV_CUDA_COMPAT_PACKAGE="cuda-compat-12-4" CONDA_PYTHON_EXE="/root/miniconda3/bin/python" NV_LIBNCCL_PACKAGE="libnccl2=2.21.5-1+cuda12.4" LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64" LC_CTYPE="C.UTF-8" CONDA_DEFAULT_ENV="sft" NV_CUDA_NSIGHT_COMPUTE_VERSION="12.4.1-1" REQUESTS_CA_BUNDLE="/etc/ssl/certs/ca-certificates.crt" OMP_NUM_THREADS="16" NV_NVPROF_VERSION="12.4.127-1" CUDA_HOME="/usr/local/cuda-12.4/" PATH="/root/miniconda3/envs/sft/bin:/root/miniconda3/condabin:/root/miniconda3/bin:/usr/local/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" NV_LIBNCCL_PACKAGE_NAME="libnccl2" NV_LIBNCCL_PACKAGE_VERSION="2.21.5-1" MKL_NUM_THREADS="16" CONDA_PREFIX_1="/root/miniconda3" DEBIAN_FRONTEND="noninteractive" OLDPWD="/root/ColossalAI" AutoDLDataCenter="neimengDC3" _="/root/miniconda3/envs/sft/bin/colossalai" CUDA_DEVICE_MAX_CONNECTIONS="1" && torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=29500 lora_finetune.py --pretrained /root/autodl-tmp/DeepSeeK-R1-7B --dataset 
/root/converted_data.json --quant 4 --lora_rank 32 --lora_alpha 64 --batch_size 8 --gradient_accumulation 2 --max_length 1024 --lr 1.5e-4 --warmup_steps 50 --num_epochs 3 --save_dir /root/autodl-tmp/DeepSeeK_lora --grad_ckpt --dtype bf16'
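
The failing fragment is `export ="/usr/bin/supervisord"` at the start of the command: the launcher re-exports every entry of the current environment, and this environment apparently contains an entry whose *name* is empty, which bash rejects as "not a valid identifier". A hedged sketch of the mechanism (this is not ColossalAI's actual launcher code, just an illustration of how such a string gets built and how filtering invalid names avoids it):

```python
# Hypothetical reconstruction: a launcher that serializes os.environ into an
# `export K="V" ...` prefix. An entry with an empty or otherwise invalid name
# produces the broken `export ="..."` seen in the error.
import re

def build_export_command(environ):
    """Build the export prefix, skipping names that are not valid shell
    identifiers (empty names, names containing '-', '.', etc.)."""
    valid_name = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
    parts, skipped = [], []
    for key, value in environ.items():
        if valid_name.match(key):
            parts.append(f'{key}="{value}"')
        else:
            skipped.append(key)
    return "export " + " ".join(parts), skipped

# An environment containing an empty-named entry, as in this report:
env = {"": "/usr/bin/supervisord", "CUDA_HOME": "/usr/local/cuda-12.4/"}
cmd, skipped = build_export_command(env)
print(cmd)      # export CUDA_HOME="/usr/local/cuda-12.4/"
print(skipped)  # ['']
```

As a workaround, run the same filter against `os.environ` in the shell that launches training to find the offending entry, then start a clean shell (or `env -i`) without it.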

@tanghl01 tanghl01 added the documentation Improvements or additions to documentation label Mar 2, 2025
D1026 commented Apr 7, 2025

Have you solved this? I also encountered this issue.
