No clip to loss within Qwen2VLGRPOTrainer #21

@Vincex0

Hey, the original paper has a clip term in the loss that I don't see here. During training the grad norm can sometimes get pretty high, which leads to instability.
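
For reference, here is a minimal sketch of the kind of clipped surrogate I mean, in plain Python so it is easy to check by hand. This is not the code in Qwen2VLGRPOTrainer: the function name `clipped_surrogate_loss`, its arguments, and the default `eps=0.2` are assumptions made for illustration, and the KL penalty is left out to keep the example short.

```python
import math

def clipped_surrogate_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO/GRPO-style clipped surrogate loss, averaged over tokens.

    logp_new:   per-token log-probs under the policy being updated
    logp_old:   per-token log-probs under the policy that sampled the tokens
    advantages: per-token advantages (e.g. group-normalized rewards)
    eps:        clip range (hypothetical default, tune as needed)
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                      # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the more pessimistic of the unclipped and clipped objectives.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)                    # negate: we minimize


# Ratio of 2 with a positive advantage is capped at 1 + eps.
loss = clipped_surrogate_loss([math.log(2.0)], [0.0], [1.0])
```

Note that when `logp_new` equals `logp_old` the ratio is exactly 1 and the clip has no effect, so it only matters if the policy is updated more than once on the same batch of samples.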
