@RobertKirk

Previously, the weighting of the confidence loss jumped discontinuously at the end of warmup: `coef` equaled `step_frac` up to the point where `step_frac > warmup_frac`, then switched straight to `1.0`.

For example, with `warmup_frac = 0.1`, at `step_frac = 0.1` we get `coef = step_frac = 0.1`; on the next step `step_frac > 0.1 = warmup_frac`, so `coef = 1.0`.

The paper describes this weighting as increasing smoothly. This change makes it increase linearly from 0 to 1 over the first `warmup_frac` fraction of training steps.

We also change `>` to `>=` so that `warmup_frac = 0.0` is allowed: on the first step `step_frac = 0.0`, the `>=` check passes, and `coef = 1.0`. With `>`, the first step would hit a divide-by-zero error.
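To make the difference concrete, here is a minimal sketch of the two schedules as described above. The function names `old_coef` and `new_coef` are hypothetical, and the division by `warmup_frac` is inferred from the description (linear ramp, divide-by-zero concern) rather than copied from the diff.

```python
def old_coef(step_frac: float, warmup_frac: float) -> float:
    """Behavior described as 'previous': ramps as step_frac, then jumps to 1.0."""
    if step_frac > warmup_frac:
        return 1.0
    return step_frac


def new_coef(step_frac: float, warmup_frac: float) -> float:
    """Assumed new behavior: linear ramp from 0 to 1 over the warmup fraction."""
    # `>=` lets warmup_frac = 0.0 short-circuit before the division below.
    if step_frac >= warmup_frac:
        return 1.0
    return step_frac / warmup_frac


if __name__ == "__main__":
    warmup_frac = 0.1
    for step_frac in (0.0, 0.05, 0.1, 0.11):
        print(step_frac, old_coef(step_frac, warmup_frac), new_coef(step_frac, warmup_frac))
```

Under these assumptions, the old schedule reaches only `0.1` at the end of warmup before jumping to `1.0`, while the new one reaches `1.0` exactly at `step_frac = warmup_frac`.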
