I want to use 8 RTX 4090 GPUs (24 GB each) to run W4A4 quantization on llama-7b, but the following command fails with the error shown after it.
python main.py \
--model ./llama-7b \
--epochs 1 --output_dir ./log/llama-7b-w4a4 \
--eval_ppl --wbits 4 --abits 4 --lwc --let --multigpu \
--tasks piqa,arc_easy,arc_challenge,boolq,hellaswag,winogrande
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
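
As far as I can tell from the message, an index tensor (for example the token ids passed to an embedding lookup) and the tensor being indexed ended up on different GPUs, cuda:0 and cuda:7. I have not found where in the script this happens, so the following is only a sketch of the kind of guard I would expect to avoid it, not the project's actual fix. The helper name `align_index_device` is hypothetical; it only assumes tensor-like objects that expose `.device` and `.to(device)`, as `torch.Tensor` does.

```python
def align_index_device(weight, index):
    """Return `index` placed on the same device as `weight`.

    Hypothetical helper, not part of the script or of PyTorch.
    Both arguments only need a `.device` attribute and a `.to(device)`
    method, which torch.Tensor provides.
    """
    if index.device != weight.device:
        # Move only the (small) index tensor; the weight stays where it is.
        index = index.to(weight.device)
    return index
```

With PyTorch this would be called right before the lookup that fails, for example `input_ids = align_index_device(embed.weight, input_ids)`, where `embed` stands for whatever embedding module the traceback points to (an assumption on my side). Moving the index rather than the weight keeps the layer placement chosen by --multigpu unchanged.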