LoRA and DoRA finetuning produces identical results #2250

Open · AndrewMead10 opened this issue Jan 10, 2025 · 1 comment
Labels: bug (Something isn't working), high-priority

@AndrewMead10 commented:

I was trying to compare LoRA, DoRA, and full finetuning on Llama 1B, but I found that LoRA and DoRA finetuning produced identical results. I am using the Orca 10k dataset, like they did in the answer.ai post comparing the two methods.

Here is the wandb report for the runs.

Here is my config file; the only thing I changed between runs was the use_dora field, from true to false. The command I ran was tune run lora_finetune_single_device --config benchmark_methods/llama_3_2_1b_lora_adam.yaml.

I am using a 3090 GPU.

```yaml
# Config for single device LoRA finetuning in lora_finetune_single_device.py
# using a Llama3.2 1B Instruct model
#
# This config assumes that you've run the following command before launching
# this run:
#   tune download meta-llama/Llama-3.2-1B-Instruct --output-dir ./tmp/Llama-3.2-1B-Instruct --ignore-patterns "original/consolidated.00.pth"
#
# To launch on a single device, run the following command from root:
#   tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
#   tune run lora_finetune_single_device --config llama3_2/1B_lora_single_device checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works only for training on single device.

output_dir: ./tmp/torchtune/llama3_2_1B/lora_single_device # /tmp may be deleted by your system. Change it to your preference.

# Model Arguments
model:
  _component_: torchtune.models.llama3_2.lora_llama3_2_1b
  lora_attn_modules: ['q_proj', 'v_proj', 'output_proj']
  apply_lora_to_mlp: True
  lora_rank: 32  # higher increases accuracy and memory
  lora_alpha: 64  # usually alpha=2*rank
  lora_dropout: 0.0
  use_dora: True

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: ./tmp/Llama-3.2-1B/original/tokenizer.model
  max_seq_len: 2048

checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: ./tmp/Llama-3.2-1B/
  checkpoint_files: [
     model.safetensors
  ]
  recipe_checkpoint: null
  output_dir: ${output_dir}
  model_type: LLAMA3_2
resume_from_checkpoint: False
save_adapter_weights_only: False

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.chat_dataset
  packed: True  # True increases speed
  source: qnguyen3/orca_math_10k
  conversation_column: conversations
  conversation_style: sharegpt
  split: train
seed: 42
shuffle: True
batch_size: 4

# Optimizer and Scheduler
optimizer:
  _component_: torch.optim.AdamW
  fused: True
  weight_decay: 0.01
  lr: 1e-4
lr_scheduler:
  _component_: torchtune.training.lr_schedulers.get_cosine_schedule_with_warmup
  num_warmup_steps: 100

loss:
  _component_: torchtune.modules.loss.CEWithChunkedOutputLoss

# Training
epochs: 1
max_steps_per_epoch: null
gradient_accumulation_steps: 2  # Use to increase effective batch size
compile: True  # torch.compile the model + loss, True increases speed + decreases memory

# Logging
metric_logger:
  _component_: torchtune.training.metric_logging.WandBLogger
  project: benchmark_torchtune
log_every_n_steps: 1
log_peak_memory_stats: True

# Environment
device: cuda
dtype: bf16

# Activations Memory
enable_activation_checkpointing: False  # True reduces memory
enable_activation_offloading: False  # True reduces memory


# Profiler (disabled)
profiler:
  _component_: torchtune.training.setup_torch_profiler
  enabled: False

  # Output directory of trace artifacts
  output_dir: ${output_dir}/profiling_outputs

  # `torch.profiler.ProfilerActivity` types to trace
  cpu: True
  cuda: True

  # trace options passed to `torch.profiler.profile`
  profile_memory: False
  with_stack: False
  record_shapes: True
  with_flops: False

  # `torch.profiler.schedule` options:
  # wait_steps -> wait, warmup_steps -> warmup, active_steps -> active, num_cycles -> repeat
  wait_steps: 5
  warmup_steps: 3
  active_steps: 2
  num_cycles: 1
```
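
For reference, here is a minimal sketch (not torchtune's actual implementation; the class name, shapes, and initialization are illustrative) of what use_dora: True is meant to add on top of LoRA. DoRA rescales the normalized adapted weight by a trainable per-row magnitude vector, so the LoRA and DoRA runs should diverge once those magnitudes receive gradient updates:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinearSketch(nn.Module):
    """Frozen base weight W0 plus a LoRA update, renormalized per row and
    rescaled by a trainable magnitude vector m:
        W' = m * (W0 + (alpha/r) * B @ A) / ||W0 + (alpha/r) * B @ A||_row
    """

    def __init__(self, in_dim: int, out_dim: int, rank: int = 32, alpha: float = 64.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim), requires_grad=False)
        self.lora_a = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(out_dim, rank))
        self.scaling = alpha / rank
        # The one trainable parameter DoRA adds on top of LoRA. Since lora_b
        # starts at zero, this initializes to the row norms of W0.
        self.magnitude = nn.Parameter(self.weight.norm(p=2, dim=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        adapted = self.weight + self.scaling * (self.lora_b @ self.lora_a)
        row_norm = adapted.norm(p=2, dim=1, keepdim=True)
        return F.linear(x, self.magnitude.unsqueeze(-1) * adapted / row_norm)

# layer = DoRALinearSketch(in_dim=64, out_dim=128)
# out = layer(torch.randn(2, 64))  # shape (2, 128)
```

The magnitude vector is the only trainable parameter DoRA adds relative to LoRA, so if it never updates, most of DoRA's extra capacity goes unused.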

@ebsmothers (Contributor) commented:

Hi @AndrewMead10, thanks for creating the issue. I am able to repro this; I dug in a bit and can see that the DoRA magnitudes are not being updated across iterations. I need to do some further investigation and will keep you posted on my findings, but I think this should be pretty high-priority for us to figure out. Also tagging @SLR722 and @calvinpelletier, who are familiar with this code, to possibly take a look.
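
One quick way to confirm this kind of repro (a minimal sketch; the "magnitude" parameter name is an assumption about how the DoRA layers register it) is to snapshot the magnitude parameters around an optimizer step and check whether they actually change:

```python
import torch

def snapshot_magnitudes(model: torch.nn.Module) -> dict:
    # Clone every parameter whose name mentions "magnitude" (illustrative;
    # depends on how the DoRA layers register the parameter).
    return {
        name: p.detach().clone()
        for name, p in model.named_parameters()
        if "magnitude" in name
    }

# Around a single training step (surrounding loop elided):
# before = snapshot_magnitudes(model)
# loss.backward(); optimizer.step()
# after = snapshot_magnitudes(model)
# for name, prev in before.items():
#     delta = (after[name] - prev).abs().max().item()
#     print(f"{name}: max |delta| = {delta:.3e}")  # all zeros => magnitudes never train
```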

ebsmothers added the bug (Something isn't working) and high-priority labels on Jan 11, 2025