Commit ae458ca
Fix CSV logger not capturing learning_rate (#423)
## Summary
Fixes a regression introduced in PR #417 where the `learning_rate`
column in `training_log.csv` was always empty. Also adds model-specific
loss columns to the CSV for better parity with wandb logging.
Fixes #422
## Root Cause
PR #417 made several changes to metrics logging:
1. Removed `LearningRateMonitor` callback (which logged as `lr-Adam`)
2. Added manual learning rate logging as `train/lr`
However, the `CSVLoggerCallback` was only looking for:
- `learning_rate` (direct key - never logged)
- `lr-*` pattern (LearningRateMonitor format - no longer used)
The new `train/lr` key was never checked, resulting in empty
`learning_rate` values.
## Changes
### 1. Fix learning rate lookup (`sleap_nn/training/callbacks.py`)
The CSVLoggerCallback now checks for the learning rate in this order:
1. `learning_rate` (direct key)
2. `train/lr` (current format from lightning modules) ← **NEW**
3. `lr-*` pattern (legacy LearningRateMonitor format)
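
The lookup order above can be sketched as a small standalone function. This is only an illustration of the fallback chain, not the actual code in `CSVLoggerCallback`: the function name `resolve_learning_rate` and the plain-dict `metrics` argument are assumptions made for the example.

```python
from typing import Mapping, Optional


def resolve_learning_rate(metrics: Mapping[str, float]) -> Optional[float]:
    """Return the learning rate found in logged metrics, or None if absent.

    Hypothetical helper: the real lookup lives inside CSVLoggerCallback.
    """
    # 1. Direct key.
    if "learning_rate" in metrics:
        return float(metrics["learning_rate"])
    # 2. Current format logged by the lightning modules.
    if "train/lr" in metrics:
        return float(metrics["train/lr"])
    # 3. Legacy LearningRateMonitor keys such as "lr-Adam".
    for key, value in metrics.items():
        if key.startswith("lr-"):
            return float(value)
    # Nothing matched: the CSV cell would be left empty.
    return None
```

Before this fix, only steps 1 and 3 existed, so a metrics dict containing just `train/lr` fell through to `None`.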
### 2. Add model-specific CSV columns (`sleap_nn/training/model_trainer.py`)
Added loss breakdown columns for different model types to match what's
logged to wandb:
| Model Type | New CSV Columns |
|------------|-----------------|
| `bottomup` | `train/confmaps_loss`, `train/paf_loss`, `val/confmaps_loss`, `val/paf_loss` |
| `multi_class_bottomup` | `train/confmaps_loss`, `train/classmap_loss`, `train/class_accuracy`, `val/confmaps_loss`, `val/classmap_loss`, `val/class_accuracy` |
| `multi_class_topdown` | `train/confmaps_loss`, `train/classvector_loss`, `train/class_accuracy`, `val/confmaps_loss`, `val/classvector_loss`, `val/class_accuracy` |
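
One way to picture the table is as a mapping from model type to extra columns. The sketch below is an assumption-laden illustration, not the code in `model_trainer.py`: the names `BASE_COLUMNS`, `MODEL_SPECIFIC_METRICS`, and `csv_columns_for` are hypothetical, the base columns are taken from the example CSV header in this description, and appending the new columns after the base ones is a guess about ordering.

```python
# Base columns, copied from the example CSV header in this description.
BASE_COLUMNS = [
    "epoch",
    "train/loss",
    "val/loss",
    "learning_rate",
    "train/time",
    "val/time",
]

# Per-model metric names from the table; each is logged for train and val.
MODEL_SPECIFIC_METRICS = {
    "bottomup": ["confmaps_loss", "paf_loss"],
    "multi_class_bottomup": ["confmaps_loss", "classmap_loss", "class_accuracy"],
    "multi_class_topdown": ["confmaps_loss", "classvector_loss", "class_accuracy"],
}


def csv_columns_for(model_type):
    """Return the CSV header for a model type (hypothetical helper)."""
    metrics = MODEL_SPECIFIC_METRICS.get(model_type, [])
    # All train/* columns first, then all val/* columns, as in the table.
    extra = [f"{split}/{name}" for split in ("train", "val") for name in metrics]
    return BASE_COLUMNS + extra
```

Model types that are not in the mapping get only the base columns, which is how the sketch avoids adding columns that would always be empty.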
### 3. Add test (`tests/training/test_callbacks.py`)
Added `test_on_validation_epoch_end_logs_train_lr_format` to verify the
new `train/lr` key lookup works correctly.
## Example Output
**Before (broken):**
```csv
epoch,train/loss,val/loss,learning_rate,train/time,val/time
0,,0.006371453870087862,,,
1,0.0006624094676226377,0.0002221532049588859,,32.815,6.364
```
**After (fixed):**
```csv
epoch,train/loss,val/loss,learning_rate,train/time,val/time
0,,0.006371453870087862,,,
1,0.0006624094676226377,0.0002221532049588859,0.0001,32.815,6.364
```
## API Changes
### CSV Column Additions
The `training_log.csv` file will now include additional columns
depending on the model type. This is a non-breaking change for code
that reads columns by name: existing column names are unchanged, and
the new columns only add information.
**Note:** The CSV column name remains `learning_rate` (not `train/lr`)
for backward compatibility with existing analysis scripts.
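
As a rough illustration of why name-based readers are unaffected, the sketch below parses the "After (fixed)" example with the standard-library `csv` module. The function `read_learning_rates` and the in-memory `SAMPLE` string are assumptions for the example, not part of `sleap_nn`. Empty cells, such as the `learning_rate` value in the epoch 0 row of the example, are mapped to `None`.

```python
import csv
import io

# The "After (fixed)" example from this description.
SAMPLE = """epoch,train/loss,val/loss,learning_rate,train/time,val/time
0,,0.006371453870087862,,,
1,0.0006624094676226377,0.0002221532049588859,0.0001,32.815,6.364
"""


def read_learning_rates(csv_file):
    """Map epoch -> learning rate, using None for empty cells."""
    rates = {}
    for row in csv.DictReader(csv_file):
        # Columns are looked up by name, so extra columns are ignored.
        cell = row.get("learning_rate") or ""
        rates[int(row["epoch"])] = float(cell) if cell else None
    return rates


rates = read_learning_rates(io.StringIO(SAMPLE))
```

To read a real log, pass an open file handle for `training_log.csv` instead of the `io.StringIO` wrapper.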
## Design Decisions
1. **Backward compatible column name**: We kept `learning_rate` as the
CSV column name rather than changing to `train/lr` to avoid breaking
existing analysis pipelines that expect the old name.
2. **Fallback chain for LR lookup**: The callback checks multiple key
formats in order, maintaining compatibility with:
   - Direct `learning_rate` logging (if someone uses it)
   - New `train/lr` format (current)
   - Legacy `lr-*` format (LearningRateMonitor)
3. **Model-specific columns**: Rather than logging all possible columns
for all models (which would result in many empty columns), we only add
columns relevant to each model type.
## Test Plan
- [x] `pytest tests/training/test_callbacks.py::TestCSVLoggerCallbackFileOps` - Unit tests for CSV logger
- [x] `pytest tests/training/test_model_trainer.py::test_model_trainer_centered_instance` - Integration test verifying `learning_rate` is logged correctly
---
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.5 <[email protected]>

1 parent df72b51 commit ae458ca
3 files changed (+71, -2) in `sleap_nn/training` and `tests/training`