-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Add wandb play and resume option #3426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -38,6 +39,30 @@ def add_rsl_rl_args(parser: argparse.ArgumentParser): | |||
"--log_project_name", type=str, default=None, help="Name of the logging project when using wandb or neptune." | |||
) | |||
|
|||
arg_group.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make a separate group for wandb instead of adding it to rsl-rl group?
# SPDX-License-Identifier: BSD-3-Clause | ||
|
||
|
||
import contextlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be moved to isaaclab_rl
directory. Core has nothing to do with logging :)
@@ -196,6 +196,9 @@ class RslRlBaseRunnerCfg: | |||
``{time-stamp}_{run_name}``. | |||
""" | |||
|
|||
run_id: str | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm why does this need to be part of rl config? It isn't used by rsl-rl wrapper.
help="Select which wandb checkpoint iteration to load. If not provided, the latest checkpoint will be used.", | ||
) | ||
arg_group.add_argument( | ||
"--wandb_username", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be wandb entity? Or that's the same concept?
Example: | ||
model_path = get_model_checkpoint(run_id="my_run_id", project="my_project", checkpoint=100, wandb_username="my_username") | ||
This will download the model checkpoint from https://wandb.ai/my_username/my_project/runs/my_run_id and save it | ||
to models_tmp/my_project/my_run_id/model_100.pt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would put this in python code-block
|
||
|
||
def get_model_checkpoint( | ||
run_id: str, project="isaaclab", checkpoint: int = -1, wandb_username=None, tmp_folder_dir: str = "models_tmp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run_id: str, project="isaaclab", checkpoint: int = -1, wandb_username=None, tmp_folder_dir: str = "models_tmp" | |
run_id: str, project: str = "isaaclab", checkpoint: int = -1, wandb_username=None, tmp_folder_dir: str = "models_tmp" |
models = [] | ||
# List all available files in the run | ||
for file in wdb_run.files(): | ||
if "model" in file.name and file.name.endswith(".pt"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very specific to rsl-rl naming of checkpoints. If plan is to only keep support for rsl-rl, then I suggest moving this to rsl-rl sub-package in isaaclab_rl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would suggest putting this print after the model is found so you also get to know which model iteration you downloaded.
if wandb_username is None: | ||
wandb_username = os.environ.get("WANDB_USERNAME") | ||
|
||
print("Downloading model from wandb...", f"{wandb_username}/{project}/{run_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
print("Downloading model from wandb...", f"{wandb_username}/{project}/{run_id}") | |
print(f"Downloading model from wandb: {wandb_username}/{project}/{run_id}") |
arg_group.add_argument( | ||
"--wandb_username", | ||
type=str, | ||
default=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this make sense?
default=None, | |
default=os.environ["WANDB_USERNAME"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this feature. Added a few comments.
Also is this using the latest wandb version? Or this works with the default wandb version in IsaacLab (which is 0.12). I think the minimum version I needed was 0.19.
This seems a nice feature!! Thanks for the PR, I agree with Mayank: Unless we can make it general to all rl-library, it makes more sense to put in rsl-rl sub package in isaaclab_rl/rsl_rl, or directly pr to rsl_rl if eth folks like it. |
Description
Porting over my old orbit PR for wandb loading and resuming of checkpoints. This allows the user to directly play a specified run from wandb.
Usage:
To e.g. resume (or play) the training from a run located at
https://wandb.ai/<user_name>/<project_name>/runs/<run_id>
This will immediately download the newest checkpoint (or the one with iteration --wandb_checkpoint_iteration XX) and use it for play or resuming.
If you are using an extension template, make sure to carry over the changes to your cli_args.py
Checklist
pre-commit
checks with./isaaclab.sh --format
config/extension.toml
fileCONTRIBUTORS.md
or my name already exists there