
Integrate wandb logging & sweep for ACTPolicy #28


Open · nqtabokado wants to merge 7 commits into master

Conversation


@nqtabokado nqtabokado commented Jun 20, 2025

Summary

This PR integrates Weights & Biases (wandb) into the ACTPolicy training pipeline of the RoboManipBaselines project, adding:

  • Training metrics logging
  • Hyperparameter sweep (tuning) support
  • Model saving with best checkpoint tracking

This is an initial integration targeting TrainAct.py and the CLI entrypoint (Train.py) for ACTPolicy.

Changes

  • Added wandb logging into:
    • Training loop (TrainAct.py)
    • Per-step and per-epoch metrics
  • Implemented support for wandb sweeps (via CLI args in Train.py)
  • Save the best and the last checkpoint during training (a rough sketch of the logging/checkpoint flow is shown after this list)
  • Minor code formatting and cleanup (via pre-commit hooks)
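
For readers unfamiliar with wandb, the pattern added here looks roughly like the sketch below. It is a minimal illustration only: `policy`, `optimizer`, `train_loader`, `val_loader`, `compute_loss`, and `evaluate` are placeholder names, not the actual RoboManipBaselines API, and the real TrainAct.py code differs in detail.

    import torch
    import wandb

    def train_with_wandb(policy, optimizer, train_loader, val_loader,
                         compute_loss, evaluate, num_epochs):
        """Rough shape of the added logging/checkpoint logic (placeholder names)."""
        wandb.init(project="robo-manip-baselines")
        best_val_loss = float("inf")

        for epoch in range(num_epochs):
            for batch in train_loader:
                loss = compute_loss(policy, batch)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                wandb.log({"train/loss": loss.item()})  # per-step metric

            val_loss = evaluate(policy, val_loader)
            wandb.log({"val/loss": val_loss, "epoch": epoch})  # per-epoch metric

            # Overwrite the last checkpoint every epoch; keep the best one separately
            torch.save(policy.state_dict(), "output/ckpt/policy_last.ckpt")
            if val_loss < best_val_loss:
                best_val_loss = val_loss
                torch.save(policy.state_dict(), "output/ckpt/policy_best.ckpt")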

Notes

  • ⚠️ Users must log in to a wandb account (wandb login) before training, even if not using --sweep
    • This is required because all logs are pushed to the online wandb dashboard
    • 📌 This integration uses the cloud (wandb.ai), not local-only logging
  • Users should install wandb (pip install wandb) if not already present
  • Sweep config and CLI example are provided in Train.py (a rough sketch of the sweep flow follows this list)
  • Currently integrated for ACTPolicy — other policies can be added in the future
  • 🐳 I attempted to use the wandb/local Docker solution for offline/local logging, but encountered unresolved errors — currently defaulting to online (cloud-based) usage.
  • ✅ No breaking changes — training still works as expected if wandb is installed and logged in
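
To make the sweep support concrete, the flow in Train.py is roughly the following. This is a minimal sketch assuming wandb's standard sweep API; the search method, value ranges, and the `train_act_policy` helper are placeholders, not the defaults actually hard-coded in Train.py.

    import wandb

    def train_act_policy(**hparams):
        """Placeholder for the actual ACTPolicy training routine."""
        ...

    def run_training():
        # wandb.init() inside the agent receives the sampled hyperparameters
        with wandb.init() as run:
            train_act_policy(chunk_size=run.config.chunk_size,
                             kl_weight=run.config.kl_weight,
                             hidden_dim=run.config.hidden_dim)

    # Illustrative search space; the method and value ranges are assumptions
    sweep_config = {
        "method": "bayes",
        "metric": {"name": "val/loss", "goal": "minimize"},
        "parameters": {
            "chunk_size": {"values": [50, 100, 200]},
            "kl_weight": {"values": [1, 10, 100]},
            "hidden_dim": {"values": [256, 512]},
        },
    }

    sweep_id = wandb.sweep(sweep_config, project="robo-manip-baselines")
    wandb.agent(sweep_id, function=run_training, count=10)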

How to test

  1. Run normal training for ACTPolicy:

    python ./bin/Train.py Act --dataset_dir ./dataset/MujocoUR5eCable_20250609/
  2. Run sweep training for ACTPolicy:

    python ./bin/Train.py Act --sweep --sweep_count 10 --dataset_dir ./dataset/MujocoUR5eCable_20250609/
  3. Verify:

    • wandb dashboard shows metrics
    • checkpoints saved under output/ckpt/
  4. Results:
    [wandb dashboard screenshots attached in the original PR]

Checklist

  • wandb logging integrated for ACTPolicy
  • sweep config working
  • save best checkpoint working
  • code formatting passes pre-commit
  • verified no breaking changes to training

Future work

  • Extend wandb integration to other policies (Mlp, Sarnn, MtAct, DiffusionPolicy)
  • Add config option to enable/disable wandb from CLI
  • Add support for logging additional metrics (e.g., success rate, reward)
  • Document example sweep config files (yaml/json)

Known limitations

  • Currently only integrated for ACTPolicy
  • Sweep uses default config in code — no external sweep yaml yet
  • No automatic resume — wandb runs start fresh each time (a hypothetical resume sketch is shown below)
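
For context on the last point: wandb can reattach to an existing run if a stable run id is passed to wandb.init. The snippet below is a hypothetical sketch of what resume support could look like; it is not part of this PR, and the id handling is invented for illustration.

    import wandb

    # Hypothetical resume support (not in this PR): persist a run id next to the
    # checkpoints and pass it back to wandb.init on restart.
    run = wandb.init(
        project="robo-manip-baselines",
        id="act-ur5e-cable-001",  # placeholder id
        resume="allow",           # reattach if the run exists, otherwise start fresh
    )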

@nqtabokado nqtabokado changed the title Integrate wandb Integrate wandb logging & sweep for ACTPolicy Jun 20, 2025

# Sweep entrypoint
@classmethod
def sweep_entrypoint(cls):
Member

Is it necessary to define this method in a per-policy class?
If it is sufficient to define it in the if args.sweep: block of Train.py, that would be simpler.

Author

Thanks! You're right — I’ve moved the sweep logic to Train.py's if args.sweep: block as suggested.
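
For reference, the moved logic presumably takes roughly the shape below. This is a hypothetical sketch that reuses the placeholder `sweep_config` and `run_training` from the description above; the exact code in Train.py may differ.

    import argparse
    import wandb

    parser = argparse.ArgumentParser()
    parser.add_argument("--sweep", action="store_true")
    parser.add_argument("--sweep_count", type=int, default=10)
    args, _ = parser.parse_known_args()

    if args.sweep:
        # Launch a sweep and let the agent call the training function
        sweep_id = wandb.sweep(sweep_config, project="robo-manip-baselines")
        wandb.agent(sweep_id, function=run_training, count=args.sweep_count)
    else:
        run_training()  # plain single training run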

@mmurooka
Member

Thanks for the contribution! We're not currently using WandB, but if it becomes truly necessary, we'd be happy to merge it. (Before merging, we'd like to clean it up—for example, by moving as much of the code added to TrainAct that is common across all policies into TrainBase.)

I'm curious whether sweeps are practically useful. Did using sweeps help you find better hyperparameters?

@nqtabokado
Author

> Thanks for the contribution! We're not currently using WandB, but if it becomes truly necessary, we'd be happy to merge it. (Before merging, we'd like to clean it up—for example, by moving as much of the code added to TrainAct that is common across all policies into TrainBase.)
>
> I'm curious whether sweeps are practically useful. Did using sweeps help you find better hyperparameters?

Thanks again for the feedback!

I ran a sweep with 10 trials and observed that certain hyperparameter combinations (especially chunk_size, kl_weight, and hidden_dim) made a noticeable difference in validation loss.
[wandb sweep dashboard screenshots comparing the runs]
You can see that trial eager-sweep-1 yielded the best result — achieving the lowest validation loss among all runs.

@mmurooka
Member

Thank you for the explanation. However, a difficulty in imitation learning is that a small validation loss does not necessarily mean a high task success rate when the policy is rolled out. This is discussed in Appendix G of https://arxiv.org/abs/2108.03298
