[help] Weird loss qwen3-vl-8b #9341

Damon-GSY · 2025-10-24T02:47:43Z

Damon-GSY
Oct 24, 2025

PROMPT_TEMPLATE="qwen3_vl_nothink"
MODEL_NAME="Qwen/Qwen3-VL-8B-Instruct"

args="--stage sft \
    --model_name_or_path=$MODEL_NAME \
    --do_train \
    --template=${PROMPT_TEMPLATE} \
    --finetuning_type full \
    --output_dir=local/tmp/ckpt_save_path/ \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --num_train_epochs 2 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 200 \
    --learning_rate=$LR \
    --cutoff_len=2048 \
    --preprocessing_num_workers=8 \
    --dataloader_num_workers=8 \
    --plot_loss \
    --report_to=wandb \
    --deepspeed=scripts/ds_zero3_cpuoffload.json \
    --flash_attn fa2 \
    --dataset_name=seasonality_combined_alpaca \
    --prompt=instruction \
    --query=input \
    --response=output \
    --streaming \
    --max_steps=12000 \
    --image=image \
    --image_resolution 12000 \
    --gradient_checkpointing \
    --bf16"

I used 8 × H100 GPUs.
I recently tried SFT on Qwen3-VL-8B. I prepared a dataset of 200,000 samples (around 160,000 task-specific samples and 40,000 ShareGPT4V samples).

Then I came across this loss curve.

I’m not sure why, between steps 2810 and 2820, the loss suddenly drops from 0.63 to 0.45.
I also don’t understand why the loss curve shows a distinct step-like drop pattern.

I have verified that the training data is being shuffled properly, and one epoch takes roughly 6,000 steps.
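As a sanity check on the step count, here is a small sketch of the arithmetic, assuming all 8 GPUs act as data-parallel ranks and each sample is seen once per epoch. The helper name `steps_per_epoch` is made up for illustration.

```python
import math


def steps_per_epoch(num_samples: int, num_gpus: int,
                    per_device_batch_size: int, grad_accum_steps: int) -> int:
    """Optimizer steps needed to see every sample once."""
    # Samples consumed by one optimizer step across all ranks.
    effective_batch = num_gpus * per_device_batch_size * grad_accum_steps
    return math.ceil(num_samples / effective_batch)


# Values taken from the post: 200,000 samples, 8 GPUs,
# --per_device_train_batch_size 1, --gradient_accumulation_steps 4.
steps = steps_per_epoch(200_000, 8, 1, 4)
print(steps)           # 6250
print(12_000 / steps)  # 1.92 epochs covered by --max_steps=12000
```

Under these assumptions one epoch is 6,250 steps, which agrees with the "roughly 6,000" figure, so the drop between steps 2810 and 2820 falls in the middle of the first epoch rather than at an epoch boundary.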

hiyouga · 2025-10-24T06:11:11Z

hiyouga
Oct 24, 2025
Maintainer

Because you are using streaming, the dataset was not fully shuffled. You can shuffle the data manually before training.
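A minimal sketch of such a manual shuffle, assuming the dataset is a single local JSON file that holds a top-level list of Alpaca-style records. The `shuffle_json_dataset` helper and the file names in the usage note are hypothetical and not part of LLaMA-Factory.

```python
import json
import random
from pathlib import Path


def shuffle_json_dataset(src: str, dst: str, seed: int = 42) -> int:
    """Shuffle a JSON list of records and write the result to a new file.

    Returns the number of records written.
    """
    records = json.loads(Path(src).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list of records")
    # A seeded generator keeps the shuffle reproducible across runs.
    random.Random(seed).shuffle(records)
    Path(dst).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(records)
```

For example, `shuffle_json_dataset("combined.json", "combined_shuffled.json")` would write a shuffled copy that the dataset entry can then point to. Streaming loaders generally shuffle only within a bounded buffer, so shuffling the whole file up front is one way to keep blocks of similar samples (for example task-specific data followed by ShareGPT4V data) from reaching the trainer together.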

4 replies

Damon-GSY Oct 24, 2025
Author

I have double-checked that the data is fully shuffled before I run the training script.

hiyouga Oct 24, 2025
Maintainer

Can you print the data examples from the two different steps?

Damon-GSY Oct 24, 2025
Author

I can't tell whether wandb logs the trace correctly. Are there any LLaMA-Factory arguments to set this up and also to configure a validation set?

hiyouga Oct 24, 2025
Maintainer

You can modify the codebase to print the data examples.
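One possible way to do this without touching the training loop is to wrap whatever data collator the trainer receives. The sketch below is an assumption-laden illustration: `BatchLoggingCollator` is a hypothetical class, not a LLaMA-Factory or Transformers API, and it assumes the collator returns a dict with an `input_ids` entry, that the counter lives in a single process per rank (`--dataloader_num_workers=0`, since each worker would otherwise keep its own copy), and that every collator call is one micro-batch.

```python
from typing import Any, Callable, Dict, List, Sequence


class BatchLoggingCollator:
    """Wrap a collator and log the samples that feed a range of optimizer steps."""

    def __init__(
        self,
        collator: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
        first_step: int,
        last_step: int,
        grad_accum_steps: int,
        decode: Callable[[Sequence[int]], str] = str,
        sink: Callable[[str], None] = print,
    ) -> None:
        self.collator = collator
        self.first_step = first_step
        self.last_step = last_step
        self.grad_accum_steps = grad_accum_steps
        self.decode = decode  # e.g. a tokenizer's decode method
        self.sink = sink      # where the log lines go
        self.calls = 0        # micro-batches seen so far on this rank

    def __call__(self, features: List[Dict[str, Any]]) -> Dict[str, Any]:
        batch = self.collator(features)
        # 1-indexed optimizer step that this micro-batch contributes to.
        step = self.calls // self.grad_accum_steps + 1
        self.calls += 1
        if self.first_step <= step <= self.last_step:
            for row in batch["input_ids"]:
                # Tensors and arrays expose tolist(); plain lists do not.
                ids = row.tolist() if hasattr(row, "tolist") else list(row)
                self.sink(f"step {step}: {self.decode(ids)}")
        return batch
```

With `grad_accum_steps=4` and a window such as `first_step=2801, last_step=2820`, each rank would print the decoded text of the samples behind the two logged loss values, which makes it easy to see whether the data mix changes around the drop. Where the collator is constructed in the codebase is left for the reader to locate.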
