Aloha handover #29

Merged

copybara-service[bot] merged 5 commits into google-deepmind:main on Jan 21, 2025

Conversation
…ted stay-in-place
…ewards. Patch franka randomization bug
kevinzakka (Collaborator) approved these changes on Jan 21, 2025:

> Amazing job @Andrew-Luo1 and excellent PR summary, thank you!
Merged commit 54f8081 into google-deepmind:main (5 of 6 checks passed)
mohamed-ashrafff pushed a commit to mohamed-ashrafff/mujoco_playground that referenced this pull request on Jul 24, 2025.

PiperOrigin-RevId: 718018875
Change-Id: If3ab6a67b96d435a89b96cae44200b1031dfdfcb
Bi-arm handover task. Original reward design by Guy Lever.
2025-01-20.07-29-41.mp4
As seen in the video (played at 50% speed for easier viewing), the shaping rewards consist of three terms that create a mostly monotonic "reward potential field", increasing as the robot progresses through the desired motion.
- `gripper_box` drives the left hand to the box.
- `box_handover` rewards moving the box to a pre-assigned handover point.
- `handover_target` rewards the right hand for bringing the box to the target point.

With this formulation alone, the policy takes 30 minutes to 1 hour to train and gets stuck in local minima for about half the seeds. The difficulty is the handover itself: because the rewards plummet when the hands fumble during the transfer, the policy settles into a local minimum where both hands clasp the box, unwilling to let go. Two tricks get around this.
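The three shaping terms can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the exponential distance kernel and its length scale are assumptions, and the real environment computes these from MuJoCo state.

```python
import jax.numpy as jp


def shaping_reward(left_gripper_pos, box_pos, handover_pos, target_pos):
    """Sum of the three shaping terms described above.

    All arguments are 3-vectors. The kernel and scale are illustrative
    assumptions; only the structure (three increasing terms) follows the PR.
    """
    def closeness(a, b, scale=0.3):
        # In (0, 1]; equals 1 when the two points coincide.
        return jp.exp(-jp.linalg.norm(a - b) / scale)

    gripper_box = closeness(left_gripper_pos, box_pos)    # left hand reaches box
    box_handover = closeness(box_pos, handover_pos)       # box reaches handover point
    handover_target = closeness(box_pos, target_pos)      # box reaches final target
    return gripper_box + box_handover + handover_target
```

Because each term saturates smoothly toward 1, the sum acts as the "potential field" described above: it increases as the motion progresses and peaks when the box reaches the target.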
First, don't penalize regression during an episode. If $r_{\text{raw}}$ is the sum of the above three terms, we use:

$$r_{t+1} = \max\left( r_{\text{raw},\, t+1} - \max_{\tau \in [0,\, t]} r_{\text{raw},\, \tau},\ 0 \right)$$
Second, reset the episode whenever the box is dropped. Together, these tricks give the robot many attempts at the transfer procedure while leaving it unafraid of failure.
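The drop-reset can be expressed as a termination flag. A sketch only: detecting the drop via the box's height against a threshold, and the threshold value itself, are assumptions, not the PR's actual check.

```python
import jax.numpy as jp


def drop_termination(box_pos_z, drop_height=0.1):
    """Terminate the episode when the box falls below a height threshold.

    Returning done=1.0 triggers an environment reset, which also clears
    per-episode bookkeeping such as the running reward maximum.
    """
    dropped = box_pos_z < drop_height
    return jp.where(dropped, 1.0, 0.0)
```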
On my RTX 4090, this trains stably across seeds in about 10 minutes.
