Getting AttributeError: 'Gemma3ForConditionalGeneration' object has no attribute 'vocab_size' when using CCE #2874

@sanchit-ahuja

Description

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Training begins normally.

Current behaviour

I have 4 L4 GPUs (24 GB of VRAM each) on the same node. I am trying to use CCE because without it I hit OOM when using DeepSpeed ZeRO-1. My transformers version is 4.52.3, and I have installed the CCE upstream for axolotl. However, I am getting this error:

AttributeError: 'Gemma3ForConditionalGeneration' object has no attribute 'vocab_size'
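
For context, on Gemma3 multimodal checkpoints the vocabulary size seems to live on the nested text config rather than on the wrapper model itself, which is presumably what the CCE patch's attribute lookup trips over. A minimal sketch (my assumption about the config layout in transformers 4.52.x, not the plugin's actual code) of where the value can be read from:

# Sketch only: where vocab_size is exposed for a Gemma3 multimodal checkpoint
# (assumes the transformers 4.52.x composite-config layout).
from transformers import AutoConfig

config = AutoConfig.from_pretrained("sarvamai/sarvam-translate")

# The composite Gemma3Config may not expose vocab_size at the top level...
vocab_size = getattr(config, "vocab_size", None)
if vocab_size is None and hasattr(config, "text_config"):
    # ...but the text sub-config (the language-model part) does.
    vocab_size = config.text_config.vocab_size

print("vocab_size:", vocab_size)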

Steps to reproduce

I am sharing the config file below; running it reproduces this error with a Gemma-based fine-tuned model.

Config yaml

base_model: sarvamai/sarvam-translate

plugins:
- axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

strict: true
chat_template: gemma3
# A list of one or more datasets to finetune the model with
datasets:
- path: /home/random/sanchit/data/train/train_en_to_kn_samples.jsonl
  type: chat_template
  field_messages: messages
  message_property_mappings:
    role: role
    content: value
- path: /home/random/sanchit/data/train/train_kn_to_en_samples.jsonl
  type: chat_template
  field_messages: messages
  message_property_mappings:
    role: role
    content: value


device: cuda
# Seed for reproducibility
seed: 42

bf16: true
dataset_processes: 45
val_set_size: 0.02
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
# Whether to use gradient checkpointing. Available options are: true, false, 'offload',
# 'offload_disk'.
# https://huggingface.co/docs/transformers/v4.18.0/en/performance#gradient-checkpointing
gradient_checkpointing: true

# The maximum length of an input to train with, this should typically be less than 2048
# as most models have a token/context limit of 2048

# Whether to use flash attention patch https://github.com/Dao-AILab/flash-attention
flash_attention: true

# How much of the dataset to set aside as evaluation. 1 = 100%, 0.50 = 50%, etc. 0 for
# no eval.
# val_set_size: 0.02


# Add or change special tokens. If you add tokens here, you don't need to add them to
# the `tokens` list.
eot_tokens:
  - <end_of_turn>

train_on_inputs: false
# Maximum number of iterations to train for. It precedes num_epochs which means that if
# both are set, num_epochs will not be guaranteed. e.g., when 1 epoch is 1000 steps =>
# `num_epochs: 2` and `max_steps: 100` will train for 100 steps
# --- Core training settings ---
num_epochs: 1
gradient_accumulation_steps: 2
micro_batch_size: 1
# --- Warmup settings ---
warmup_ratio: 0.03  # 3% of total steps (small dataset, faster convergence)

# --- Evaluation settings ---
eval_strategy: steps  

# --- Saving settings ---
save_strategy: epoch
saves_per_epoch: 5

# --- Logging ---
logging_steps: 50  # Log every 50 steps


# --- Save format ---
save_safetensors: True

# --- Logging tools ---
use_tensorboard: true
# low_cpu_mem_usage: true
wandb_project: "translation-en-kn-ift"
wandb_entity: "translation-adalat-ai"
wandb_name: "lora_linear_run_l4"

# --- Adapter & LoRA settings ---
adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_linear: true

# --- Learning rate & batch size ---
learning_rate: 0.0002
weight_decay: 0.01

# --- Optimizer & scheduler ---
optimizer: adamw_torch_fused
lr_scheduler: linear

# --- Output & Hub settings ---
output_dir: "./model-out/exp1"
dataset_prepared_path: last_run_prepared
trust_remote_code: True
# deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_all.json
deepspeed: deepspeed_configs/zero1.json
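
For what it's worth, the failure should be reproducible outside the axolotl training loop with transformers alone; the snippet below is a hedged stand-alone sketch that mirrors the attribute access in the traceback (the nested-config fallback at the end is my assumption):

# Stand-alone repro sketch, independent of axolotl/DeepSpeed.
# Assumes transformers 4.52.3 and enough memory to load the checkpoint.
import torch
from transformers import Gemma3ForConditionalGeneration

model = Gemma3ForConditionalGeneration.from_pretrained(
    "sarvamai/sarvam-translate",
    torch_dtype=torch.bfloat16,
)

# Access pattern the CCE patch appears to rely on (raises in my setup):
try:
    print(model.vocab_size)
except AttributeError as e:
    print("Reproduced:", e)

# The value is reachable via the nested text config instead (assumption):
print(model.config.text_config.vocab_size)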

Possible solution

No response

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

3.11

axolotl branch-commit

main/5a961ec

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of axolotl.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
