Description
I encountered an error when using the --trace option. The run emits a quantization warning and then fails with the following traceback:
/u/.conda/envs/mlora/lib/python3.12/site-packages/bitsandbytes/autograd/functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
File "/mLoRA/mlora_train.py", line 68, in
executor.execute()
File "/mLoRA/mlora/executor/executor.py", line 110, in execute
output = self.model.forward(data.model_data())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mLoRA/mlora/model/llm/model_llama.py", line 174, in forward
data = seq_layer.forward(data)
^^^^^^^^^^^^^^^^^^^^^^^
File "/mLoRA/mlora/model/llm/model_llama.py", line 138, in forward
return forward_func_dict[module_name](data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mLoRA/mlora/model/llm/model_llama.py", line 108, in decoder_forward
set_backward_tracepoint(output.grad_fn, "b_checkpoint")
File "/mLoRA/mlora/profiler/profiler.py", line 139, in set_backward_tracepoint
if TRACEPOINT_KEY in grad_fn.metadata():
^^^^^^^^^^^^^^^^^^
TypeError: 'dict' object is not callable
Generating '/tmp/nsys-report-4fe1.qdstrm'
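For reference, the TypeError itself hints at the cause: on my PyTorch version, grad_fn.metadata already evaluates to a dict, so calling it as a method at profiler.py line 139 fails. Below is a minimal sketch of the workaround I have in mind, assuming set_backward_tracepoint only needs read access to the node's metadata dict; the node_metadata helper name is mine, not part of mLoRA:

```python
import torch

def node_metadata(grad_fn: torch.autograd.graph.Node) -> dict:
    # Hypothetical helper, not part of mLoRA: some PyTorch versions expose
    # Node.metadata as a zero-argument method, while on mine it is already
    # a plain dict, hence "TypeError: 'dict' object is not callable".
    md = grad_fn.metadata
    return md() if callable(md) else md
```

With a helper like this, the check in profiler.py could read `if TRACEPOINT_KEY in node_metadata(grad_fn):` on either PyTorch version. I have not verified whether this matches the maintainers' intent, though.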
I ran the following command:
nsys profile -w true -t cuda,nvtx -s none -o test_report -f true -x true python mlora_train.py --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 --device "cuda:0" --config /projects/bcrn/mLoRA/demo/lora/lora_case_1.yaml --trace
The same error occurs when I simply append --trace to the normal training command.
Could you please help me understand why this error occurs, and how to use the --trace option correctly? Thanks!