Description
I encountered an error when using the --trace option. The run emits a quantization warning and then fails with the following traceback:
/u/.conda/envs/mlora/lib/python3.12/site-packages/bitsandbytes/autograd/functions.py:322: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
Traceback (most recent call last):
File "/mLoRA/mlora_train.py", line 68, in
executor.execute()
File "/mLoRA/mlora/executor/executor.py", line 110, in execute
output = self.model.forward(data.model_data())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mLoRA/mlora/model/llm/model_llama.py", line 174, in forward
data = seq_layer.forward(data)
^^^^^^^^^^^^^^^^^^^^^^^
File "/mLoRA/mlora/model/llm/model_llama.py", line 138, in forward
return forward_func_dict[module_name](data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mLoRA/mlora/model/llm/model_llama.py", line 108, in decoder_forward
set_backward_tracepoint(output.grad_fn, "b_checkpoint")
File "/mLoRA/mlora/profiler/profiler.py", line 139, in set_backward_tracepoint
if TRACEPOINT_KEY in grad_fn.metadata():
^^^^^^^^^^^^^^^^^^
TypeError: 'dict' object is not callable
Generating '/tmp/nsys-report-4fe1.qdstrm'
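For reference, the TypeError itself hints at the cause: on my PyTorch version, grad_fn.metadata already evaluates to a dict, so calling it as a method at profiler.py line 139 fails. Below is a minimal sketch of the workaround I have in mind, assuming set_backward_tracepoint only needs read access to the node's metadata dict; the node_metadata helper name is mine, not part of mLoRA:

```python
import torch

def node_metadata(grad_fn: torch.autograd.graph.Node) -> dict:
    # Hypothetical helper, not part of mLoRA: some PyTorch versions expose
    # Node.metadata as a zero-argument method, while on mine it is already
    # a plain dict, hence "TypeError: 'dict' object is not callable".
    md = grad_fn.metadata
    return md() if callable(md) else md
```

With a helper like this, the check in profiler.py could read `if TRACEPOINT_KEY in node_metadata(grad_fn):` on either PyTorch version. I have not verified whether this matches the maintainers' intent, though.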
I ran the following command:
nsys profile -w true -t cuda,nvtx -s none -o test_report -f true -x true python mlora_train.py --base_model TinyLlama/TinyLlama-1.1B-Chat-v0.4 --device "cuda:0" --config /projects/bcrn/mLoRA/demo/lora/lora_case_1.yaml --trace
The same error occurs when I simply append --trace to the normal training command.
Could you please help me understand why this error occurs, and how to use the --trace option correctly? Thanks!