
gradient clipping for numerical stability #98


Merged: 1 commit merged into main on Jul 28, 2025


Conversation

@topepo (Member) commented on Jul 27, 2025

Closes #73

In some cases, the model parameters can cause numerical overflow when the loss function is computed:

library(brulee)
library(modeldata)
#> 
#> Attaching package: 'modeldata'
#> The following object is masked from 'package:datasets':
#> 
#>     penguins

i <- 25691
set.seed(i)
data_tr <- sim_logistic(200, ~ .1 + 2 * A - 3 * B + 1 * A * B, corr = .7)

set.seed(i+1)
mlp_fit <- brulee_mlp(class ~ ., data = data_tr, hidden_units = 10, epochs = 35,
                      stop_iter = Inf, verbose = TRUE)
#> epoch: 1 learn rate 0.01 Loss: 0.581
#> epoch: 2 learn rate 0.01 Loss: 0.492
#> epoch: 3 learn rate 0.01 Loss: 0.482
#> epoch: 4 learn rate 0.01 Loss: 0.474
#> epoch: 5 learn rate 0.01 Loss: 0.47
#> epoch: 6 learn rate 0.01 Loss: 0.465
#> epoch: 7 learn rate 0.01 Loss: 0.466 ✖
#> epoch: 8 learn rate 0.01 Loss: 0.461
#> epoch: 9 learn rate 0.01 Loss: 0.45
#> epoch: 10 learn rate 0.01 Loss: 0.45
#> epoch: 11 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 12 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 13 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 14 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 15 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 16 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 17 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 18 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 19 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 20 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 21 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 22 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 23 learn rate 0.01 Loss: 0.449
#> epoch: 24 learn rate 0.01 Loss: 0.45 ✖
#> epoch: 25 learn rate 0.01 Loss: 0.448
#> epoch: 26 learn rate 0.01 Loss: 0.448 ✖
#> epoch: 27 learn rate 0.01 Loss: 0.449 ✖
#> epoch: 28 learn rate 0.01 Loss: 0.448
#> epoch: 29 learn rate 0.01 Loss: 0.449 ✖
#> epoch: 30 learn rate 0.01 Loss: 0.448 ✖
#> epoch: 31 learn rate 0.01 Loss: 0.448
#> Warning: Loss is NaN at epoch 32. Training is stopped.

Created on 2025-07-27 with reprex v2.1.1
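
As a schematic illustration of the failure mode (not brulee's actual loss code), a naive sigmoid plus cross-entropy computed from an extreme logit overflows in floating point and the loss collapses to NaN:

# exp() of a large logit overflows to Inf, and the Inf / Inf division
# yields NaN, which then propagates into the loss
logit <- 1000
p <- exp(logit) / (1 + exp(logit))
-log(p)
#> [1] NaN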

This PR adds two new arguments that clip either the gradient values or their norms so that they cannot grow too large; the norm-based version, grad_norm_clip, is shown below:

library(brulee)
library(modeldata)
#> 
#> Attaching package: 'modeldata'
#> The following object is masked from 'package:datasets':
#> 
#>     penguins

i <- 25691
set.seed(i)
data_tr <- sim_logistic(200, ~ .1 + 2 * A - 3 * B + 1 * A * B, corr = .7)

set.seed(i+1)
mlp_fit <- brulee_mlp(class ~ ., data = data_tr, hidden_units = 10, epochs = 35, 
                      grad_norm_clip = 1 / 50, stop_iter = Inf, verbose = TRUE)
#> epoch: 1 learn rate 0.01 Loss: 0.783
#> epoch: 2 learn rate 0.01 Loss: 0.78
#> epoch: 3 learn rate 0.01 Loss: 0.777
#> epoch: 4 learn rate 0.01 Loss: 0.775
#> epoch: 5 learn rate 0.01 Loss: 0.772
#> epoch: 6 learn rate 0.01 Loss: 0.769
#> epoch: 7 learn rate 0.01 Loss: 0.767
#> epoch: 8 learn rate 0.01 Loss: 0.764
#> epoch: 9 learn rate 0.01 Loss: 0.761
#> epoch: 10 learn rate 0.01 Loss: 0.759
#> epoch: 11 learn rate 0.01 Loss: 0.756
#> epoch: 12 learn rate 0.01 Loss: 0.753
#> epoch: 13 learn rate 0.01 Loss: 0.751
#> epoch: 14 learn rate 0.01 Loss: 0.748
#> epoch: 15 learn rate 0.01 Loss: 0.746
#> epoch: 16 learn rate 0.01 Loss: 0.743
#> epoch: 17 learn rate 0.01 Loss: 0.74
#> epoch: 18 learn rate 0.01 Loss: 0.738
#> epoch: 19 learn rate 0.01 Loss: 0.735
#> epoch: 20 learn rate 0.01 Loss: 0.733
#> epoch: 21 learn rate 0.01 Loss: 0.73
#> epoch: 22 learn rate 0.01 Loss: 0.728
#> epoch: 23 learn rate 0.01 Loss: 0.725
#> epoch: 24 learn rate 0.01 Loss: 0.723
#> epoch: 25 learn rate 0.01 Loss: 0.721
#> epoch: 26 learn rate 0.01 Loss: 0.718
#> epoch: 27 learn rate 0.01 Loss: 0.716
#> epoch: 28 learn rate 0.01 Loss: 0.713
#> epoch: 29 learn rate 0.01 Loss: 0.711
#> epoch: 30 learn rate 0.01 Loss: 0.709
#> epoch: 31 learn rate 0.01 Loss: 0.706
#> epoch: 32 learn rate 0.01 Loss: 0.704
#> epoch: 33 learn rate 0.01 Loss: 0.702
#> epoch: 34 learn rate 0.01 Loss: 0.7
#> epoch: 35 learn rate 0.01 Loss: 0.697

Created on 2025-07-27 with reprex v2.1.1
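
For intuition, here is a minimal sketch of where gradient-norm clipping typically sits in a torch training step. It assumes the torch package's nn_utils_clip_grad_norm_() helper and a toy linear model; brulee's internal implementation may differ.

library(torch)

# toy model and optimizer
model <- nn_linear(2, 1)
opt   <- optim_sgd(model$parameters, lr = 0.01)

# random data for illustration
x <- torch_randn(100, 2)
y <- torch_randn(100, 1)

loss <- nnf_mse_loss(model(x), y)

opt$zero_grad()
loss$backward()

# Rescale all gradients so their combined L2 norm is at most 1/50,
# analogous to the grad_norm_clip value used in the reprex above
nn_utils_clip_grad_norm_(model$parameters, max_norm = 1 / 50)

opt$step()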

@topepo topepo marked this pull request as ready for review July 28, 2025 10:18
@topepo topepo requested a review from dfalbel July 28, 2025 10:18
@dfalbel (Collaborator) left a comment


LGTM!

@topepo topepo merged commit 450f4a2 into main Jul 28, 2025
9 checks passed
@topepo topepo deleted the clipping branch July 28, 2025 14:30
Successfully merging this pull request may close these issues.

Loss computations fail with SGD