I was trying to compare LoRA, DoRA, and full finetuning on Llama 3.2 1B, but I found that LoRA and DoRA finetuning produced identical results. I am using the Orca math 10k dataset, like the answer.ai post comparing the two methods did.
Here is the wandb report for the runs.
Here is my config file; the only thing I changed between runs was the use_dora field from True to False. The command I ran was:
tune run lora_finetune_single_device --config benchmark_methods/llama_3_2_1b_lora_adam.yaml
I am running on a single 3090 GPU.
# Config for single device LoRA finetuning in lora_finetune_single_device.py
# using a Llama3.2 1B Instruct model
#
# This config assumes that you've run the following command before launching
# this run:
#   tune download meta-llama/Llama-3.2-1B-Instruct --output-dir ./tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth"
#
# To launch on a single device, run the following command from root:
#   tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
#   tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on single device.

output_dir: ./tmp/torchtune/llama3_2_1B/lora_single_device  # /tmp may be deleted by your system. Change it to your preference.

# Model Arguments
model:
  _component_: torchtune.models.llama3_2.lora_llama3_2_1b
  lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']
  apply_lora_to_mlp: True
  lora_rank: 32  # higher increases accuracy and memory
  lora_alpha: 64  # usually alpha=2*rank
  lora_dropout: 0.0
  use_dora: True

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: ./tmp/Llama-3.2-1B/original/tokenizer.model
  max_seq_len: 2048

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./tmp/Llama-3.2-1B/
  checkpoint_files: [model.safetensors]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: LLAMA3_2
resume_from_checkpoint: False
save_adapter_weights_only: False

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.chat_dataset
  packed: True  # True increases speed
  source: qnguyen3/orca_math_10k
  conversation_column: conversations
  conversation_style: sharegpt
  split: train
seed: 42
shuffle: True
batch_size: 4

# Optimizer and Scheduler
optimizer:
  _component_: torch.optim.AdamW
  fused: True
  weight_decay: 0.01
  lr: 1e-4
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 100

loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss

# Training
epochs: 1
max_steps_per_epoch: null
gradient_accumulation_steps: 2  # Use to increase effective batch size
compile: True  # torch.compile the model + loss, True increases speed + decreases memory

# Logging
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  project: benchmark_torchtune
log_every_n_steps: 1
log_peak_memory_stats: True

# Environment
device: cuda
dtype: bf16

# Activations Memory
enable_activation_checkpointing: False  # True reduces memory
enable_activation_offloading: False  # True reduces memory

# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  # Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  # `torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  # trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
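Beyond the overlapping wandb curves, one way to confirm the two runs really produced identical weights is to diff the saved checkpoints directly. This is a minimal sketch assuming both runs wrote a model.safetensors into their respective output_dir; the paths below are placeholders, not the actual files torchtune names on disk:

```python
# Hypothetical paths; point these at the checkpoint files from the LoRA and DoRA runs.
from safetensors.torch import load_file
import torch

lora_sd = load_file("./tmp/torchtune/llama3_2_1B/lora_run/model.safetensors")
dora_sd = load_file("./tmp/torchtune/llama3_2_1B/dora_run/model.safetensors")

# Compare every shared tensor; truly identical runs report zero diff everywhere.
for name in sorted(set(lora_sd) & set(dora_sd)):
    diff = (lora_sd[name].float() - dora_sd[name].float()).abs().max().item()
    if diff > 0:
        print(f"{name}: max abs diff {diff:.3e}")
print("keys only in one run:", set(lora_sd) ^ set(dora_sd))
```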
Hi @AndrewMead10, thanks for creating the issue. I am able to repro this; I dug in a bit and can see that the DoRA magnitudes are not being updated across iterations. I need to do some further investigation and will keep you posted on my findings, but I think this should be pretty high priority for us to figure out. Also tagging @SLR722 and @calvinpelletier, who are familiar with this code, to possibly take a look.
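For anyone else trying to reproduce the observation, here is a rough standalone sketch of how the magnitude update could be checked outside the recipe. It assumes the DoRA magnitude vectors are registered as parameters whose names contain "magnitude" and that DoRA layers expose an initialize_dora_magnitude() method; the batch and loss are dummies, not the recipe's actual training loop:

```python
# Hypothetical standalone check, not the recipe's training loop.
import torch
from torchtune.models.llama3_2 import lora_llama3_2_1b

# Build the same model as in the config above (random init is fine for this check).
model = lora_llama3_2_1b(
    lora_attn_modules=["q_proj", "v_proj", "output_proj"],
    apply_lora_to_mlp=True,
    lora_rank=32,
    lora_alpha=64,
    lora_dropout=0.0,
    use_dora=True,
)
model.train()

# The recipe initializes DoRA magnitudes from the base weights; do the same here so
# the forward pass is well defined (the method name is an assumption, hence the guard).
for m in model.modules():
    if hasattr(m, "initialize_dora_magnitude"):
        m.initialize_dora_magnitude()

# Assumes the magnitude vectors show up as parameters named "...magnitude".
mags = {n: p for n, p in model.named_parameters() if "magnitude" in n}
print(f"{len(mags)} magnitude params, all trainable:",
      all(p.requires_grad for p in mags.values()))
before = {n: p.detach().clone() for n, p in mags.items()}

optim = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-3)
tokens = torch.randint(0, 128_000, (2, 64))  # dummy batch of token ids
for _ in range(3):
    loss = model(tokens).float().mean()  # dummy loss, just to produce gradients
    loss.backward()
    optim.step()
    optim.zero_grad()

for n, p in mags.items():
    print(n, "UNCHANGED" if torch.equal(before[n], p.detach()) else "changed")
```

If the bug is what it looks like, every magnitude parameter should print UNCHANGED after the optimizer steps, while the LoRA A/B matrices do move.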