
Integrate wandb logging & sweep for ACTPolicy #28


Open · nqtabokado wants to merge 7 commits into master

Conversation


@nqtabokado nqtabokado commented Jun 20, 2025

Summary

This PR integrates Weights & Biases (wandb) into the ACTPolicy training pipeline of the RoboManipBaselines project, adding:

  • Training metrics logging
  • Hyperparameter sweep (tuning) support
  • Model saving with best checkpoint tracking

This is an initial integration targeting TrainAct.py and the CLI entrypoint (Train.py) for ACTPolicy.

Changes

  • Added wandb logging into:
    • Training loop (TrainAct.py)
    • Per-step and per-epoch metrics
  • Implemented support for wandb sweeps (via CLI args in Train.py)
  • Save the best and the last checkpoint during training (a rough sketch of the logging/checkpoint flow is shown after this list)
  • Minor code formatting and cleanup (via pre-commit hooks)
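
For readers unfamiliar with wandb, the pattern added here looks roughly like the sketch below. It is a minimal illustration only: `policy`, `optimizer`, `train_loader`, `val_loader`, `compute_loss`, and `evaluate` are placeholder names, not the actual RoboManipBaselines API, and the real TrainAct.py code differs in detail.

    import torch
    import wandb

    def train_with_wandb(policy, optimizer, train_loader, val_loader,
                         compute_loss, evaluate, num_epochs):
        """Rough shape of the added logging/checkpoint logic (placeholder names)."""
        wandb.init(project="robo-manip-baselines")
        best_val_loss = float("inf")

        for epoch in range(num_epochs):
            for batch in train_loader:
                loss = compute_loss(policy, batch)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                wandb.log({"train/loss": loss.item()})  # per-step metric

            val_loss = evaluate(policy, val_loader)
            wandb.log({"val/loss": val_loss, "epoch": epoch})  # per-epoch metric

            # Overwrite the last checkpoint every epoch; keep the best one separately
            torch.save(policy.state_dict(), "output/ckpt/policy_last.ckpt")
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                torch.save(policy.state_dict(), "output/ckpt/policy_best.ckpt")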

Notes

  • ⚠️ Users must log in to a wandb account (wandb login) before training, even if not using --sweep
    • This is required because all logs are pushed to the online wandb dashboard
    • 📌 This integration uses the cloud (wandb.ai), not local-only logging
  • Users should install wandb (pip install wandb) if not already present
  • Sweep config and CLI example are provided in Train.py (a rough sketch of the sweep flow follows this list)
  • Currently integrated for ACTPolicy — other policies can be added in the future
  • 🐳 I attempted to use the wandb/local Docker solution for offline/local logging, but encountered unresolved errors — currently defaulting to online (cloud-based) usage.
  • ✅ No breaking changes — training still works as expected if wandb is installed and logged in
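
To make the sweep support concrete, the flow in Train.py is roughly the following. This is a minimal sketch assuming wandb's standard sweep API; the search method, value ranges, and the `train_act_policy` helper are placeholders, not the defaults actually hard-coded in Train.py.

    import wandb

    def train_act_policy(**hparams):
        """Placeholder for the actual ACTPolicy training routine."""
        ...

    def run_training():
        # wandb.init() inside the agent receives the sampled hyperparameters
        with wandb.init() as run:
            train_act_policy(chunk_size=run.config.chunk_size,
                             kl_weight=run.config.kl_weight,
                             hidden_dim=run.config.hidden_dim)

    # Illustrative search space; the method and value ranges are assumptions
    sweep_config = {
        "method": "bayes",
        "metric": {"name": "val/loss", "goal": "minimize"},
        "parameters": {
            "chunk_size": {"values": [50, 100, 200]},
            "kl_weight": {"values": [1, 10, 100]},
            "hidden_dim": {"values": [256, 512]},
        },
    }

    sweep_id = wandb.sweep(sweep_config, project="robo-manip-baselines")
    wandb.agent(sweep_id, function=run_training, count=10)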

How to test

  1. Run normal training for ACTPolicy:

    python ./bin/Train.py Act --dataset_dir ./dataset/MujocoUR5eCable_20250609/
  2. Run sweep training for ACTPolicy:

    python ./bin/Train.py Act --sweep --sweep_count 10 --dataset_dir ./dataset/MujocoUR5eCable_20250609/
  3. Verify:

    • wandb dashboard shows metrics
    • checkpoints saved under output/ckpt/
  4. Results:
    [wandb dashboard screenshots attached in the original PR]

Checklist

  • wandb logging integrated for ACTPolicy
  • sweep config working
  • save best checkpoint working
  • code formatting passes pre-commit
  • verified no breaking changes to training

Future work

  • Extend wandb integration to other policies (Mlp, Sarnn, MtAct, DiffusionPolicy)
  • Add config option to enable/disable wandb from CLI
  • Add support for logging additional metrics (e.g., success rate, reward)
  • Document example sweep config files (yaml/json)

Known limitations

  • Currently only integrated for ACTPolicy
  • Sweep uses default config in code — no external sweep yaml yet
  • No automatic resume — wandb runs start fresh each time (a hypothetical resume sketch is shown below)
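
For context on the last point: wandb can reattach to an existing run if a stable run id is passed to wandb.init. The snippet below is a hypothetical sketch of what resume support could look like; it is not part of this PR, and the id handling is invented for illustration.

    import wandb

    # Hypothetical resume support (not in this PR): persist a run id next to the
    # checkpoints and pass it back to wandb.init on restart.
    run = wandb.init(
        project="robo-manip-baselines",
        id="act-ur5e-cable-001",  # placeholder id
        resume="allow",           # reattach if the run exists, otherwise start fresh
    )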

@nqtabokado nqtabokado changed the title Integrate wandb Integrate wandb logging & sweep for ACTPolicy Jun 20, 2025

# Sweep entrypoint
@classmethod
def sweep_entrypoint(cls):
Member

Is it necessary to define this method in a per-policy class?
If it is sufficient to define it in the if args.sweep: block of Train.py, that would be simpler.

Author

Thanks! You're right — I’ve moved the sweep logic to Train.py's if args.sweep: block as suggested.
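
For reference, the moved logic presumably takes roughly the shape below. This is a hypothetical sketch that reuses the placeholder `sweep_config` and `run_training` from the description above; the exact code in Train.py may differ.

    import argparse
    import wandb

    parser = argparse.ArgumentParser()
    parser.add_argument("--sweep", action="store_true")
    parser.add_argument("--sweep_count", type=int, default=10)
    args, _ = parser.parse_known_args()

    if args.sweep:
        # Launch a sweep and let the agent call the training function
        sweep_id = wandb.sweep(sweep_config, project="robo-manip-baselines")
        wandb.agent(sweep_id, function=run_training, count=args.sweep_count)
    else:
        run_training()  # plain single training run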

@mmurooka
Member

Thanks for the contribution! We're not currently using WandB, but if it becomes truly necessary, we'd be happy to merge it. (Before merging, we'd like to clean it up—for example, by moving as much of the code added to TrainAct that is common across all policies into TrainBase.)

I'm curious whether sweeps are practically useful. Did using sweeps help you find better hyperparameters?

@nqtabokado
Author

> Thanks for the contribution! We're not currently using WandB, but if it becomes truly necessary, we'd be happy to merge it. (Before merging, we'd like to clean it up—for example, by moving as much of the code added to TrainAct that is common across all policies into TrainBase.)
>
> I'm curious whether sweeps are practically useful. Did using sweeps help you find better hyperparameters?

Thanks again for the feedback!

I ran a sweep with 10 trials and observed that certain hyperparameter combinations (especially chunk_size, kl_weight, and hidden_dim) made a noticeable difference in validation loss.
[wandb sweep dashboard screenshots comparing the runs]
You can see that trial eager-sweep-1 yielded the best result — achieving the lowest validation loss among all runs.

@mmurooka
Member

Thank you for the explanation. However, a difficulty in imitation learning is that a small validation loss does not necessarily mean a high task success rate when the policy is rolled out. This is discussed in Appendix G of https://arxiv.org/abs/2108.03298
