This is the code for deploy Fusion-Perception-to-Action Transformer (FP2AT), PerAct, Act3D, RVT2, RVT on ROS machine. Including data collection, and robot experiments for UR robots.
Author: Marco Yangjun Liu,
Affiliation: Centre for Artificial Intelligence and Robotics, University of Macau; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
If you are using this code, please add the following citation to your publication:
@ARTICLE{10874177,
author={Liu, Yangjun and Liu, Sheng and Chen, Binghan and Yang, Zhi-Xin and Xu, Sheng},
journal={IEEE Transactions on Robotics},
title={Fusion-Perception-to-Action Transformer: Enhancing Robotic Manipulation With 3-D Visual Fusion Attention and Proprioception},
year={2025},
volume={41},
pages={1553-1567},
Hardware: UR5, realsense camera, robotiq gripper (or RobustMotion) Software: RLBench
- franka_htc_teleop.zip
- move_group_python_interface_tutorial
- [MoveItjade API](http://docs.ros.org/en/jade/api/moveit_commander/html/- classmoveit__commander_1_1move__group_1_1MoveGroupCommander.html)
- MoveIt jade source code
- MoveIt Noetic Source Code: /opt/ros/noetic/lib/python3/dist-packages/moveit_commander/move_group.py
peract_demo_interface.pyis for collecting dataperact_agent_interface.pyis for executing trained models. See issue18 for more details on the setup, and issue2 for real-world data.
- may change arm.franka_c2farm_perceiver_lang_bc to agents.peract_bc
- ur_calibrate at first
roslaunch ur_calibration calibration_correction.launch robot_ip:=172.20.172.102 target_filename:="/home/marco/robotic_sorting/src/Universal_Robots_ROS_Driver/ur_robot_driver/config/my_robot_calibration.yaml"
- check the IP of 3090, robot1 and robot2
- ROS communication between two computers
In the first computer with force sensor and gripper
$ roscore
$ export ROS_IP=172.20.172.103 # inet IP of the first computer
# Connect to force sensor
cd ~/ur5_ws
source ./devel/setup.bash
sudo chmod +777 /dev/ttyUSB1
rosrun force_sensor get_sensor_data.py
# Connect to the gripper
cd ~/ur5_ws
source ./devel/setup.bash
rosrun robotiq_2f_gripper_control Robotiq2FGripperRtuNode.py /dev/ttyUSB0
In the second computer with vision
$ export ROS_MASTER_URI=http://172.20.172.103:11311
$ export ROS_IP=172.20.172.101 # inet IP of the second computer
$ rostopic list # verify
- Modify and roslaunch the eye_to_hand_calibration_official_driver.launch
$ roslaunch easy_handeye fp2a_front_eye_to_hand_calibration.launch
$ roslaunch easy_handeye fp2a_overhead_eye_to_hand_calibration.launch
- set the correct ip and port for external control as 172.20.172.101:50002 (PC_IP, 50002 as default), load a program on PolyScope which contains the external_control (e.g., in E204 lab: 为机器人编程 -> 加载 ros_control (-> 安装设置 -> 设置external control))
- calibrate the hand and eye following the workflow: Check starting pose -> (next pose -> plan -> execute -> taske sample) x17 -> compute -> save (aumatically save into ~/.ros/easy_handeye/xxx.yaml)
- Modify the publish.launch accordingly.Then Check calibration results
$ roslaunch easy_handeye fp2a_two_cam_publish.launch
$ rosrun tf tf_echo /base /camera_color_optical_frame
$ rosrun tf tf_echo /base /camera_overhead_color_optical_frame
-
results for the front camera:
-
Tips for accuracy
- Maximize rotation between poses.
- Minimize the distance from the target to the camera of the tracking system.
- Minimize the translation between poses.
- Use redundant poses.
- Calibrate the camera intrinsics if necessary / applicable.
- Calibrate the robot if necessary / applicable.
- Start the UR driver, and check the node for publishing joint states
roslaunch ur_robot_driver ur5_bringup.launch robot_ip:=172.20.172.102 kinematics_config:=/home/marco/robotic_sorting/src/Universal_Robots_ROS_Driver/ur_robot_driver/config/my_robot_calibration.yaml
rostopic list
rostopic echo /joint_states
- check the names of robot links
rosrun tf tf_echo /base /tool0 # to check the tf tree and frames
# check if the x,y,z of tf are the same as socket communication
rostopic echo tcp_pose # published by learning joy (ur_joy_controlnew.py), in translation + 3D rot vec format (the data acquired by socket)
rosrun two_ur first_arm.py
# the tf results are the same as: $ rosrun ur_robot_driver test_get_tf.py
$ rosrun rqt_tf_tree rqt_tf_tree # more intuitively to observe the tf tree
- rqt, rqt_graph, catkin init, catkin build, catkin build pkg, catkin clean my_package
- Start the camera
$ roslaunch mecheye_ros_interface start_camera.launch
$ roslaunch realsense2_camera rs_camera.launch align_depth:=true camera:=camera serial_no:=135222250811 enable_infra:=true enable_infra1:=true enable_infra2:=true initial_reset:=true filters:=colorizer
$ roslaunch realsense2_camera rs_camera.launch align_depth:=true camera:=camera_overhead serial_no:=135222251592 filters:=pointcloud,spatial,temporal,hole_filling
# Other arguments
- filters:=pointcloud,spatial,temporal,hole_filling #colorizer, ; according to https://github.com/IntelRealSense/realsense-ros/tree/ros1-legacy#launch-parameters
- initial_reset:=true
# config and start multiple cameras according to https://github.com/IntelRealSense/realsense-ros/tree/ros1-legacy#work-with-multiple-cameras
$ rs-enumerate-devices | grep Serial # # find the serial numbers of cameras
# launch multiple cameras
$ roslaunch realsense2_camera rs_two_devices.launch camera1:=camera serial_no_camera1:=135222250811 camera2:=camera_overhead serial_no_camera2:=135222251592 filters:=spatial,temporal
pointcloud,hole_filling
- $ roslaunch easy_handeye publish.launch
- Start the UR driver, Camera and transformation publisher together
$ roslaunch ur_robot_driver ur5_fp2a_demo.launch
- publish the extrinsics and cameras
roslaunch easy_handeye fp2a_two_cam_publish.launch
- set the correct ip and port for external control as 172.20.172.101:50002 (PC_IP, 50002 as default), load a program on PolyScope which contains the external_control (e.g., in E204 lab: 为机器人编程 -> 加载 ros_control (-> 安装设置 -> 设置external control))
- close the ur_joy and peract_demo_interface.py, then goto initial pose
rosrun fp2a_ros goto_initial_pose.py
4. build learning_joy and Start the joy driver
if any error of lacking of Ur_move.h happens when catkin_make or catkin build, comment away the "add_executable" in CMaKeList.txt to catkin build the msg at first, then catkin build the entire package
$ roslaunch learning_joy ur_joy.launch
watch joy topics: $ rostopic echo joy
5.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
# check the port of RM gripper (e.g., '/dev/ttyUSB1'), and modify the corresponding parts in peract_demo_interface.py
rosrun fp2a_ros peract_demo_interface.py
# or for robotiq: peract_demo_interface_robotiq.py
( roslaunch fp2a_ros fp2a_demo.launch )
- Start the UR driver and MoveIt,
$ roslaunch ur_robot_driver ur5_fp2a_agent.launch # including ur5_bringup.launch
- Start the Cameras
- connect two ROS machine
- set the correct ip and port for external control as 172.20.172.101:50002 (PC_IP, 50002 as default), load a program on PolyScope which contains the external_control (e.g., in E204 lab: 为机器人编程 -> 加载 ros_control (-> 安装设置 -> 设置external control))
- set ROS_MASTER_URI and ROS_IP in the computer with vision, set ROS_IP in the computer with force sensor and gripper (5. roslaunch easy_handeye publish.launch)
- rosrun fp2a_ros peract_agent_interface.py
- use joy stick to control the workflow including execution, visualization and setting new language
- goto initial pose after each task
roslaunch ur_robot_driver ur5_fp2a_agent.launch
roslaunch easy_handeye fp2a_two_cam_publish.launch
rosrun fp2a_ros goto_initial_pose.py
roslaunch learning_joy agent_joy.launch # for testing
roslaunch learning_joy ur_joy.launch # for remote joystick control
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$COPPELIASIM_ROOT
export QT_QPA_PLATFORM_PLUGIN_PATH=$COPPELIASIM_ROOT
(
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
)
# check the port of RM gripper (e.g., '/dev/ttyUSB1'), and modify the corresponding parts in peract_agent_interface.py
rosrun fp2a_ros peract_agent_interface.py
# or for robotiq: peract_agent_interface_robotiq.py
rosrun robotiq_2f_gripper_control Robotiq2FGripperSimpleController.py
roslaunch learning_joy agent_joy.launch
rosrun fp2a_ros peract_replay_interface.py
Option 1: (recommended)
- modify the urdf, (especially end-effector length, gripper length, 0.214m for robotiq, 0.260m for RM Gripper)
- set the offset for socket communication, especially for the movement when collecting data
- directly get the gripper_pose via get_tf
- directly go to gripper_pose via MoveIt when testing without extra apply_gripper_offset
Option 2:
- keep the original urdf without force sensor and gripper
- use the original pose for socket communication without force sensor and gripper
- add offset when collecting data, considering the consistency between voxelgrid and 3D space
- when testing: add extra apply_gripper_offset for inputting and remove offset for execution, since the flange end are controlled
Option 3: Doesn't work due to limited planning ability of urscript
- set the offset for socket communication, use urscript to move the robot when collecting and executing
- the rotation acquired by $ rosrun tf tf_echo /base /tool0, get_pose() via socket and the PolyScope (teach pad) are different from each other
- To use ROS tf to get the pose of the TCP (tool0), we simply modified the xyz value (x=0.214) of tool0 in /opt/ros/noetic/share/ur_description/urdf/inc/ur_macro.xacro. The original ur_macro.xacro was backed up as ur_macro.xacro.backup. Pull it back when u wanna use a new setting of UR robots (end-effector length, gripper length, 0.214m for robotiq, 0.260m for RM Gripper)
- The MoveIt itself can achieve collision avoidance via the default motion planner OMPL (Open Motion Planning Library), avoiding both collision with the robot arm itself (see ur_description/config/ur5/physical_parameters.yaml) and the environment [move_group.compute_cartesian_path(self, waypoints, eef_step, jump_threshold, avoid_collisions = True, path_constraint=None,)]. Check details in https://ros-planning.github.io/moveit_tutorials/doc/move_group_python_interface/move_group_python_interface_tutorial.html#attaching-objects-to-the-robot , https://ros-planning.github.io/moveit_tutorials/doc/bullet_collision_checker/bullet_collision_checker.html , https://ros-planning.github.io/moveit_tutorials/doc/visualizing_collisions/visualizing_collisions_tutorial.html , https://ros-planning.github.io/moveit_tutorials/doc/pick_place/pick_place_tutorial.html
- The code below will be modified when deploying on real-robot
_check_add_types was canceled in add_final() and add() of YARR/yarr/replay_buffer/uniform_replay_buffer.py
- record after waiting for a second since the real-time capablity of subscribing 'joint_states' topic by ROS is inperfect
- check if the number of collected frames and thoes acquired by keyframe dicovery are the same
- if any error of tf tree occurs, re-publish calibration info and then re-collecting.
- To make full use of depth information, increase the layering of the object surface by labeling, using different colors, try training to check the voxelization, etc.
- collect demos in scene bounds and proper observing range of cameras
- accurate changing of the gripper status, or change the gripper status with slight movement ahead.
- vary in grasp pose, intial position/pose, scene layout, (disturbance, color, size)?
- goto intial_pose and open the gripper before collecting one demo
-
choose proper scene bounding box for different tasks
-
method.keypoint_method='all' instead of 'heuristic' to simply recognize all the actions as key actions
-
do not visualise the second param (q_attention) of visualise_voxel() to avoid strange red voxels
-
must finetune grip_and_rot
- Goto initial state at first,
- May try to visualise the second param (q_attention) of visualise_voxel() abserve the q predictions in the 3D space
- Things to record when testing
- the voxelization results at each step, for paper
- the videos of tasks for each tasks, for supply materials
- save the test dataset at the same time for later voxelization, re-set the joystick bottons
- The recordings should present the scene & target variance, and may show the other capabilities including close-loop control, human disturbance, different initial 7D pose, and collision avoidance when execution
- Maybe screenshot the trimesh.scene.show and link the videos with trimesh.scene.show
- modify the saving directoty of voxelization .pngs at visualise_voxel() of peract/helpers/utils.py
- the value range of mecheye's depth is 200-4000. The farther pixels have higher pixel values
- the value range of realsense and kinect's depth is 0 - 65535 (2**16-1). As for realsense, The farther pixels have higher pixel values.
- the depth images of RLBench is tranfer in an RGB images within 0-255, 3 channels, relative depth (near-far)
Steps to collect depth images: Sample depth images: episode94 set the proper near and far value of the camera clip and normalization to 0-1 fill in the pixels with the value of 0
- image resolution
utils.float_array_to_rgb_image(img_real_norm, 2**24-1)
- sudo apt-get install ros--joy
- if any error of lacking of Ur_move.h happens when catkin_make or catkin build, comment away the "add_executable" in CMaKeList.txt to catkin build the msg at fisrt, then catkin build the entire package
- roscore
- rosrun joy joy_node
- rostopic echo /joy
cd ~/rosproject/ur_joy/ source devel/setup.bash roslaunch learning_joy ur_joy.launch motor_control.launch and ur_motor_start.launch are for control the motor of guidewire
- adapt the direction of XY translation
- change the speed of XY translation
- the translation limitation is 1m
- the rotation bottons controls the rotations w.r.t. the TCP, i.e., roll, pitch, yaw
- change the rotation limitation
- joy pakcage and joy_node: https://index.ros.org/p/joy/
- set the functions in joy_get.py for every bottons and axes on the joy (our case uses 5.3 Microsoft Xbox 360 Wired Controller for Linux): http://wiki.ros.org/joy
Table of index number of /joy.buttons:
Index Button name on the actual controller
0 A
1 B
2 X
3 Y
4 LB
5 RB
6 back # open gripper; visualize the prediction at first
7 start # close gripper; execute
8 power # record;
6+7 #; set new language goals when execution
9 Button stick left
10 Button stick right
Table of index number of /joy.axes:
Index Axis name on the actual controller
0 Left/Right Axis stick left
1 Up/Down Axis stick left
2 LT
3 Left/Right Axis stick right
4 Up/Down Axis stick right
5 RT
6 cross key left/right
7 cross key up/down
