Update default trainer configuration parameters for improved training stability #374

gitttt-1234 · 2025-11-21T01:27:58Z

Summary

This PR updates the default values for learning rate scheduler and optimizer configurations to improve training performance and stability based on empirical testing and best practices.

Configuration Changes

OptimizerConfig

Learning rate: 1e-3 → 1e-4
- More conservative initial learning rate for better convergence

ReduceLROnPlateauConfig

threshold_mode: "rel" → "abs"
- Absolute threshold mode provides more consistent behavior across different loss scales
threshold: 1e-4 → 1e-6
- Finer-grained sensitivity to loss improvements
patience: 10 → 5
- Faster adaptation to plateaus
factor: 0.1 → 0.5
- More gradual learning rate reduction
cooldown: 0 → 3
- Prevents oscillations after LR reduction
min_lr: 0.0 → 1e-8
- Ensures learning rate doesn't drop to zero
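
To make the effect of these values concrete, here is a minimal pure-Python sketch of plateau-based LR reduction. It assumes the monitored metric is a loss that should decrease (mode "min") and follows the semantics of PyTorch's `ReduceLROnPlateau` as I understand them. `is_improvement` and `PlateauScheduler` are hypothetical names for illustration, not sleap_nn code.

```python
from dataclasses import dataclass


def is_improvement(current: float, best: float, threshold: float, threshold_mode: str) -> bool:
    """Return True if `current` beats `best` for a metric that should decrease."""
    if threshold_mode == "rel":
        # Relative mode: the required margin scales with the size of the loss.
        return current < best * (1.0 - threshold)
    if threshold_mode == "abs":
        # Absolute mode: the required margin is the same at every loss scale.
        return current < best - threshold
    raise ValueError(f"unknown threshold_mode: {threshold_mode!r}")


@dataclass
class PlateauScheduler:
    """Illustrative stand-in for a plateau scheduler, using this PR's new defaults."""

    lr: float = 1e-4
    threshold: float = 1e-6
    threshold_mode: str = "abs"
    patience: int = 5
    factor: float = 0.5
    cooldown: int = 3
    min_lr: float = 1e-8
    best: float = float("inf")
    num_bad_epochs: int = 0
    cooldown_counter: int = 0

    def step(self, loss: float) -> float:
        """Record one epoch's loss and return the (possibly reduced) learning rate."""
        if is_improvement(loss, self.best, self.threshold, self.threshold_mode):
            self.best = loss
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1

        # During cooldown, bad epochs are not counted.
        if self.cooldown_counter > 0:
            self.cooldown_counter -= 1
            self.num_bad_epochs = 0

        # Reduce only after more than `patience` consecutive bad epochs.
        if self.num_bad_epochs > self.patience:
            self.lr = max(self.lr * self.factor, self.min_lr)
            self.cooldown_counter = self.cooldown
            self.num_bad_epochs = 0
        return self.lr


if __name__ == "__main__":
    scheduler = PlateauScheduler()
    # A perfectly flat loss: the LR is halved once patience is exceeded.
    print([scheduler.step(1.0) for _ in range(16)])
```

With a flat loss, this sketch halves the learning rate on the seventh epoch, ignores the next three epochs during cooldown, and halves it again six epochs later.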

EarlyStoppingConfig

min_delta: 0.0 → 1e-8
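- Requires the monitored loss to improve by more than 1e-8 before an epoch counts as progress, so numerical noise no longer resets the patience counter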
patience: 1 → 10
- Allows more time for convergence before stopping
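
The interaction between `min_delta` and `patience` can be sketched in a few lines of pure Python. This assumes a loss that should decrease and a stop rule that fires once `patience` consecutive epochs fail to improve by more than `min_delta`. `stop_epoch` is a hypothetical helper for illustration, not the callback sleap_nn actually uses.

```python
from typing import Optional, Sequence


def stop_epoch(losses: Sequence[float], min_delta: float = 1e-8, patience: int = 10) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # consecutive epochs without a sufficient improvement
    for epoch, loss in enumerate(losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


if __name__ == "__main__":
    plateau = [1.0, 0.9] + [0.9] * 10
    # Old defaults: the first non-improving epoch ends training.
    print(stop_epoch(plateau, min_delta=0.0, patience=1))
    # New defaults: ten non-improving epochs in a row are required.
    print(stop_epoch(plateau))
```

Under these assumptions, the old defaults end training at the first epoch that fails to improve, while the new defaults tolerate ten such epochs in a row.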

LRSchedulerConfig

Now defaults to ReduceLROnPlateauConfig instead of None
- Enables learning rate scheduling by default for better training dynamics

Files Updated

✅ sleap_nn/config/trainer_config.py - Updated defaults and documentation
✅ All sample config files in docs/sample_configs/ (11 files)
✅ All test config files in tests/assets/model_ckpts/ (12 files)
✅ Configuration documentation in docs/config.md
✅ Test assertions in tests/config/test_trainer_config.py

Benefits

🎯 More conservative and stable training behavior
📉 Better handling of loss plateaus with absolute threshold mode
⏱️ Improved early stopping behavior with reasonable patience
🔄 Learning rate scheduling enabled by default

Testing

✅ All tests pass (uv run pytest .)
✅ Linter passes (uv run ruff check sleap_nn/)
✅ Updated test assertions to match new defaults

Backwards Compatibility

These changes update default values only. Users with existing configurations will continue to use their specified values. The new defaults provide better out-of-box performance for new users and projects.
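
As a rough illustration of why explicit values are unaffected, the sketch below merges a user configuration over the new defaults listed in this PR. The nesting and key names (`optimizer`, `lr`, `reduce_lr_on_plateau`, `early_stopping`) and the `resolve` helper are hypothetical; the real sleap_nn config layout and loading code may differ.

```python
import copy

# New defaults as listed in this PR. Section and key names are assumptions
# made for this example, not the actual sleap_nn schema.
NEW_DEFAULTS = {
    "optimizer": {"lr": 1e-4},
    "reduce_lr_on_plateau": {
        "threshold_mode": "abs",
        "threshold": 1e-6,
        "patience": 5,
        "factor": 0.5,
        "cooldown": 3,
        "min_lr": 1e-8,
    },
    "early_stopping": {"min_delta": 1e-8, "patience": 10},
}


def resolve(user_config: dict) -> dict:
    """Overlay explicitly specified values on top of the defaults."""
    resolved = copy.deepcopy(NEW_DEFAULTS)
    for section, values in user_config.items():
        resolved.setdefault(section, {}).update(values)
    return resolved


if __name__ == "__main__":
    # A user who pinned the previous learning rate and scheduler factor
    # keeps them; everything left unspecified picks up the new defaults.
    pinned = {"optimizer": {"lr": 1e-3}, "reduce_lr_on_plateau": {"factor": 0.1}}
    print(resolve(pinned))
```

Only keys a user leaves unspecified pick up the new values, so pinning a previous value explicitly is enough to keep it.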

🤖 Generated with Claude Code

This commit updates the default values for learning rate scheduler and optimizer configurations to improve training performance and stability:

**Configuration Changes:**
- OptimizerConfig:
  - Learning rate: 1e-3 → 1e-4
- ReduceLROnPlateauConfig:
  - threshold_mode: "rel" → "abs"
  - threshold: 1e-4 → 1e-6
  - patience: 10 → 5
  - factor: 0.1 → 0.5
  - cooldown: 0 → 3
  - min_lr: 0.0 → 1e-8
- EarlyStoppingConfig:
  - min_delta: 0.0 → 1e-8
  - patience: 1 → 10
- LRSchedulerConfig:
  - Now defaults to ReduceLROnPlateauConfig instead of None

**Files Updated:**
- Updated sleap_nn/config/trainer_config.py with new defaults and documentation
- Updated all sample config files in docs/sample_configs/
- Updated test config files in tests/assets/model_ckpts/
- Updated configuration documentation in docs/config.md
- Updated test assertions in tests/config/test_trainer_config.py

The new defaults provide:
- More conservative learning rate scheduling
- Better threshold sensitivity with absolute mode
- Improved early stopping behavior
- More stable training convergence

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

codecov · 2025-11-21T01:34:55Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.83%. Comparing base (ff91433) to head (d62cd78).
⚠️ Report is 64 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #374      +/-   ##
==========================================
- Coverage   95.28%   93.83%   -1.46%     
==========================================
  Files          49       49              
  Lines        6765     7181     +416     
==========================================
+ Hits         6446     6738     +292     
- Misses        319      443     +124


Changed the default value of stop_training_on_plateau from False to True to enable early stopping by default for better training behavior.

Updates:
- sleap_nn/config/trainer_config.py: Updated default to True and fixed documentation
- docs/config.md: Updated documentation to reflect new default
- tests/config/test_trainer_config.py: Updated test assertions to expect True

This ensures early stopping is enabled by default with the improved patience and min_delta values from the previous commit.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

gitttt-1234 and others added 2 commits November 20, 2025 17:27

Format files

ea2ec8e

gitttt-1234 merged commit b3432ef into main Nov 21, 2025
8 checks passed

gitttt-1234 deleted the update-threshold-mode-defaults branch November 21, 2025 01:54
