Labels: bug (Something isn't working)
Description
Please check that this issue hasn't been reported before.
- I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Training begins
Current behaviour
I have 4 L4 GPUs with 24 GB of VRAM each on the same node. I am trying to use Cut Cross Entropy (CCE) because without it I get an OOM when using DeepSpeed ZeRO-1. My transformers version is 4.52.3 and I have installed the CCE upstream for axolotl. However, I am getting this error:
AttributeError: 'Gemma3ForConditionalGeneration' object has no attribute 'vocab_size'
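For context: in this transformers version, Gemma3ForConditionalGeneration is a multimodal wrapper, and vocab_size appears to live on the nested text config rather than on the wrapper itself, so any patch that reads model.vocab_size directly would raise exactly this AttributeError. A minimal probe, assuming the sarvamai/sarvam-translate checkpoint ships a Gemma 3 multimodal config (that layout is an assumption, not something verified from this report):

from transformers import AutoConfig

# Assumption: sarvamai/sarvam-translate uses the Gemma 3 multimodal config layout.
cfg = AutoConfig.from_pretrained("sarvamai/sarvam-translate")

# The top-level multimodal config may not define vocab_size at all...
print(getattr(cfg, "vocab_size", "missing"))
# ...while the nested text config does.
print(cfg.text_config.vocab_size)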
Steps to reproduce
I am sharing the config file below; running it reproduces this error with a Gemma 3 fine-tuned model.
Config yaml
base_model: sarvamai/sarvam-translate
plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
strict: true
chat_template: gemma3
# A list of one or more datasets to finetune the model with
datasets:
  - path: /home/random/sanchit/data/train/train_en_to_kn_samples.jsonl
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: value
  - path: /home/random/sanchit/data/train/train_kn_to_en_samples.jsonl
    type: chat_template
    field_messages: messages
    message_property_mappings:
      role: role
      content: value
device: cuda
# Seed for reproducibility
seed: 42
bf16: true
dataset_processes: 45
val_set_size: 0.02
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
# Whether to use gradient checkpointing. Available options are: true, false, 'offload',
# 'offload_disk'.
# https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
gradient_checkpointing: true
# Whether to use flash attention patch https://github.com/Dao-AILab/flash-attention
flash_attention: true
# Add or change special tokens. If you add tokens here, you don't need to add them to
# the `tokens` list.
eot_tokens:
  - <end_of_turn>
train_on_inputs: false
# --- Core training settings ---
num_epochs: 1
gradient_accumulation_steps: 2
micro_batch_size: 1
# --- Warmup settings ---
warmup_ratio: 0.03 # 3% of total steps (small dataset, faster convergence)
# --- Evaluation settings ---
eval_strategy: steps
# --- Saving settings ---
save_strategy: epoch
saves_per_epoch: 5
# --- Logging ---
logging_steps: 50 # Log every 50 steps
# --- Save format ---
save_safetensors: true
# --- Logging tools ---
use_tensorboard: true
# low_cpu_mem_usage: true
wandb_project: "translation-en-kn-ift"
wandb_entity: "translation-adalat-ai"
wandb_name: "lora_linear_run_l4"
# --- Adapter & LoRA settings ---
adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_linear: true
# --- Learning rate & batch size ---
learning_rate: 0.0002
weight_decay: 0.01
# --- Optimizer & scheduler ---
optimizer: adamw_torch_fused
lr_scheduler: linear
# --- Output & Hub settings ---
output_dir: "./model-out/exp1"
dataset_prepared_path: last_run_prepared
trust_remote_code: true
# deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_all.json
deepspeed: deepspeed_configs/zero1.json
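To reproduce, save the config above (e.g. as config.yaml; the filename is just for illustration) and launch training. Either of the usual axolotl entry points should hit the same failure:

accelerate launch -m axolotl.cli.train config.yaml
# or, with the newer CLI:
axolotl train config.yaml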
Possible solution
No response
Which Operating Systems are you using?
- Linux
- macOS
- Windows
Python Version
3.11
axolotl branch-commit
main/5a961ec
Acknowledgements
- My issue title is concise, descriptive, and in title casing.
- I have searched the existing issues to make sure this bug has not been reported yet.
- I am using the latest version of axolotl.
- I have provided enough information for the maintainers to reproduce and diagnose the issue.