[Question] Sim2Real -- The joint position (qpos) in the simulation does not align with the applied actions #1429
Replies: 3 comments
-
Hello @WangYCheng23 ! This is an EXCELLENT question! Consequently, I have some answers and more questions :D

Closing the sim-to-real gap can be approached from two directions: one is to make the simulation and environment more realistic, while the other is to make the policy more robust. Isaac Lab and Isaac Sim both provide tools for achieving these goals, and depending on your goals for this project, you may need one or both.

In the former case, where we want to make the simulation more realistic, the first step is tuning. I don't know what robot you are using, but from your plots it appears to be at least partially tuned. Is this one of ours? A demo environment or something? If not, what is the robot? What are you trying to do with it? How have you tuned it? Also, what are the vertical units, degrees or radians? (I hope degrees.) If it IS already tuned, then the next step would be to examine various actuator models. By default we supply an "Ideal" PD actuator model to control the joints of an articulation (check here for details), but this can be further modified to include noise, though you would need to define a custom actuator in Python. Noise can be added to many parts of the simulation in this fashion, and this kind of modification is going to be necessary if absolute precision is the ultimate goal.

This takes us to the latter case, because if you introduce noise to your simulation, then you need to retrain your policy. However, in doing so, you will find the policy becomes slightly more robust to inaccuracies in joint position. By adding noise to the joints, we have effectively increased the domain in which the policy must function, and thus "taught it" how to handle noise at deployment. This process is called Domain Randomization (DR) and is one of the go-to tools for making transferable policies. Adding "noise" doesn't need to be literal at all levels; all that is required is an expansion of the domain of the training environment. If you are using cameras to identify the pose and location of objects, this might include randomizing objects and lighting. If you are picking and placing things with a robot arm, this might involve randomizing the weights and types of objects being picked, and so on. Isaac Sim provides a suite of tools in the form of Replicator, including OmniGraph nodes for randomizing the stage on a specific event. It may even be possible to trigger these events off of the training results, effectively making an automated DR curriculum.

I can go on, but I need to know more about what you are trying to do :3 I hope this helps!
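Regarding the custom actuator mentioned above, here is a minimal sketch of what a noisy PD actuator could look like. It assumes the `IdealPDActuator` / `configclass` API of a recent Isaac Lab release; module paths, the `ArticulationActions` fields, and the `class_type` pattern may differ in your version, so treat it as an outline rather than a drop-in file:

```python
import torch

from omni.isaac.lab.actuators import IdealPDActuator, IdealPDActuatorCfg
from omni.isaac.lab.utils import configclass


class NoisyPDActuator(IdealPDActuator):
    """Ideal PD actuator that perturbs the commanded joint positions with Gaussian noise."""

    cfg: "NoisyPDActuatorCfg"

    def compute(self, control_action, joint_pos: torch.Tensor, joint_vel: torch.Tensor):
        # Corrupt the position targets before the PD law is applied.
        if control_action.joint_positions is not None:
            noise = torch.randn_like(control_action.joint_positions) * self.cfg.pos_noise_std
            control_action.joint_positions = control_action.joint_positions + noise
        return super().compute(control_action, joint_pos, joint_vel)


@configclass
class NoisyPDActuatorCfg(IdealPDActuatorCfg):
    """Configuration for the noisy PD actuator (used in place of IdealPDActuatorCfg)."""

    class_type: type = NoisyPDActuator
    pos_noise_std: float = 0.01  # rad; hypothetical default, tune per joint group
```

You would reference `NoisyPDActuatorCfg` in your articulation's `actuators` dictionary exactly where you currently reference the ideal or implicit actuator config.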
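And for the Replicator-based randomization mentioned above, a rough sketch of a light randomizer might look like the following. The exact `rep.trigger` and `rep.randomizer` signatures vary between Replicator versions, so take the parameter names as assumptions to verify against your install:

```python
import omni.replicator.core as rep


def randomize_scene_lights():
    # Lights whose intensity and position are re-sampled every time the randomizer fires.
    lights = rep.create.light(
        light_type="Sphere",
        intensity=rep.distribution.uniform(1000.0, 5000.0),
        position=rep.distribution.uniform((-2.0, -2.0, 1.0), (2.0, 2.0, 3.0)),
        count=2,
    )
    return lights.node


rep.randomizer.register(randomize_scene_lights)

# Re-randomize the lights every 50 rendered frames.
with rep.trigger.on_frame(interval=50):
    rep.randomizer.randomize_scene_lights()
```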
-
Hi @mpgussert, thank you so much for your reply! I apologize for my late response; the holiday kept me occupied.

First, I'm using the ALOHA robot arm, for which I created a URDF and converted it into a USD file to build my own robot asset. Second, I have designed my own environment to perform open and close tasks with various pieces of furniture. Also, the vertical units are in radians (sadly :( ). One of my rewards includes a penalty for the robot being "alive," which encourages it to open or close as quickly as possible. However, I would like the rate of change in radians to be slower, which contradicts the task requirements. This makes tuning quite challenging. Do you have any advice for managing this type of task?

I haven't explored different actuators extensively; I'm currently using implicit actuators. I also don't have the correct stiffness and damping values for the ALOHA robot, as I haven't been able to find them online (if anyone knows these values, please email me, thanks!!). I read your suggestion about adding noise to the actuator, which is definitely necessary. However, in RSL-RL, it seems they already use a mean and std to add noise to actions by building an action distribution and sampling from it.

Also, I am currently achieving excellent performance in the simulation when running play_xx.py. However, it appears that the actions are changing too drastically. As shown in the figure, some actions exceed 1 rad in a single step, despite my efforts to impose penalties on the action rate and joint velocity. Even though the actions change rapidly, we can implement interpolation for the actual robot arm. The major issue I'm facing is that when I set an action (absolute joint angles), the robot arm's position (qpos) does not align with those angles! This has been quite frustrating for me, haha!
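For the deployment side, the interpolation I have in mind is roughly the sketch below (a hypothetical helper I haven't validated on the real arm yet):

```python
import numpy as np


def interpolate_action(qpos_current, qpos_target, max_step_rad=0.05):
    """Split a large policy action into small joint-space steps for the real arm.

    Clamps the per-tick change so that no joint moves more than `max_step_rad`
    per low-level control tick, then returns the intermediate targets to stream
    to the robot at its native control rate.
    """
    delta = qpos_target - qpos_current
    n_steps = max(1, int(np.ceil(np.abs(delta).max() / max_step_rad)))
    # Linearly spaced waypoints from the current pose to the policy's target.
    return [qpos_current + delta * (i + 1) / n_steps for i in range(n_steps)]


# Example: a 1 rad jump on one joint becomes 20 waypoints of at most 0.05 rad each.
waypoints = interpolate_action(np.zeros(6), np.array([1.0, 0.2, 0.0, 0.0, 0.0, 0.0]))
```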
-
This is a great topic. Thank you for posting these questions and comments. I will move this issue into Discussions for the team to follow up.
-
Question
After successfully training a policy network, I applied it to a real robot. However, in Isaac Lab, the joint positions (qpos) do not fully align with the actions I provide, as shown in Figure 1. In contrast, the real robot accurately follows my commands. This discrepancy creates a sim-to-real gap.
Could someone explain how actions influence qpos in Isaac Lab? Additionally, could the Isaac Lab team provide documentation on best practices for ensuring that a trained policy network performs effectively on a real robot? It would be helpful to include key considerations, such as the settings for 'decimation' and 'dt'.
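For reference, my rough mental model of the action-to-qpos pipeline is the toy single-joint loop below (illustrative numbers and update rule, not the actual Isaac Lab code); please correct me if this picture is wrong:

```python
# Toy model: one policy step holds a single position target for `decimation`
# physics steps, and a PD law only pulls qpos toward that target each `dt`.
dt, decimation = 0.005, 4
stiffness, damping, inertia = 80.0, 4.0, 0.05
action_scale, default_pos = 0.5, 0.0

qpos, qvel = 0.0, 0.0
action = 1.0  # raw policy output for this joint

for _ in range(decimation):
    target = default_pos + action_scale * action           # processed position target
    torque = stiffness * (target - qpos) - damping * qvel  # PD law
    qvel += (torque / inertia) * dt                        # integrate one physics step
    qpos += qvel * dt

# qpos has moved toward `target` but has not reached it; the gap depends on
# stiffness, damping, dt, and decimation.
print(qpos, target)
```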
Thank you!