Aloha handover #29

Merged

copybara-service[bot] merged 5 commits into google-deepmind:main on Jan 21, 2025

Conversation
…ted stay-in-place
…ewards. Patch franka randomization bug
kevinzakka (Collaborator) approved these changes on Jan 21, 2025:

> Amazing job @Andrew-Luo1 and excellent PR summary, thank you!
Merged commit 54f8081 into google-deepmind:main (5 of 6 checks passed)
mohamed-ashrafff pushed a commit to mohamed-ashrafff/mujoco_playground that referenced this pull request on Jul 24, 2025.

PiperOrigin-RevId: 718018875
Change-Id: If3ab6a67b96d435a89b96cae44200b1031dfdfcb
Bi-arm handover task. Original reward design by Guy Lever.
2025-01-20.07-29-41.mp4
As seen in the video (played at 50% speed for easier viewing), the shaping rewards consist of three terms that create a mostly monotonic "reward potential field", increasing as the robot progresses through the desired motion.
- `gripper_box` drives the left hand to the box.
- `box_handover` rewards moving the box to a pre-assigned handover point.
- `handover_target` rewards the right hand for bringing the box to the target point.

With this formulation alone, the policy takes 30 minutes to 1 hour to train and gets stuck in local minima for about half the seeds. The difficulty is the handover itself: because the rewards plummet when the hands fumble during the transfer, the policy settles into a local minimum where both hands clasp the box, unwilling to let go. Two tricks get around this.
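The three shaping terms can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the exponential distance kernel and its length scale are assumptions, and the real environment computes these from MuJoCo state.

```python
import jax.numpy as jp


def shaping_reward(left_gripper_pos, box_pos, handover_pos, target_pos):
    """Sum of the three shaping terms described above.

    All arguments are 3-vectors. The kernel and scale are illustrative
    assumptions; only the structure (three increasing terms) follows the PR.
    """
    def closeness(a, b, scale=0.3):
        # In (0, 1]; equals 1 when the two points coincide.
        return jp.exp(-jp.linalg.norm(a - b) / scale)

    gripper_box = closeness(left_gripper_pos, box_pos)    # left hand reaches box
    box_handover = closeness(box_pos, handover_pos)       # box reaches handover point
    handover_target = closeness(box_pos, target_pos)      # box reaches final target
    return gripper_box + box_handover + handover_target
```

Because each term saturates smoothly toward 1, the sum acts as the "potential field" described above: it increases as the motion progresses and peaks when the box reaches the target.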
First, don't penalize regression during an episode. If $r_{\text{raw}}$ is the sum of the above three terms, we use:

$$r_{t+1} = \max\left( r_{\text{raw},\, t+1} - \max_{\tau \in [0,\, t]} r_{\text{raw},\, \tau},\ 0 \right)$$
Second, reset the episode whenever the box is dropped. Together, these tricks give the robot many attempts at the transfer procedure while leaving it unafraid of failure.
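The drop-reset can be expressed as a termination flag. A sketch only: detecting the drop via the box's height against a threshold, and the threshold value itself, are assumptions, not the PR's actual check.

```python
import jax.numpy as jp


def drop_termination(box_pos_z, drop_height=0.1):
    """Terminate the episode when the box falls below a height threshold.

    Returning done=1.0 triggers an environment reset, which also clears
    per-episode bookkeeping such as the running reward maximum.
    """
    dropped = box_pos_z < drop_height
    return jp.where(dropped, 1.0, 0.0)
```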
On my RTX 4090, this trains stably across seeds in about 10 minutes.
