Description
System Info
- lerobot version: 0.3.4
- Platform: macOS-15.6.1-arm64-arm-64bit
- Python version: 3.10.13
- Huggingface Hub version: 0.34.3
- Datasets version: 4.1.1
- Numpy version: 2.2.6
- PyTorch version: 2.7.1
- Is PyTorch built with CUDA support?: False
- Cuda version: N/A
- GPU model: N/A
Information
- One of the scripts in the examples/ folder of LeRobot
- My own task or dataset (give details below)
Reproduction
Say you want to run inference with one of the most common models supported by the library (e.g., ACT). Ideally, you should be able to do something like:
```python
obs = robot.get_observation()
obs = preprocess(obs)  # for pipeline operations
action = model(obs)
action = postprocess(action)  # to denormalize the action into actual joint values
robot.send_action(action)
```
However, this is not possible, because a crucial operation interfacing the output of robot.get_observation() with the model's forward pass is missing (FYI, the processing pipeline is also mostly transparent to this mismatch, which perhaps indicates a need to scrutinize how such bugs can be caught before the forward pass). Indeed, the outputs of robot.get_observation() do not match the (1) names and (2) types expected by the model: (1) they have been stored in a flat dictionary since #777, and (2) they are arrays, while the model expects tensors.
Crucially, this prevents clean API examples, since the observation and the model have to be interfaced manually!
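To make the naming half of the mismatch concrete, here is a minimal sketch of the regrouping a user currently has to write by hand. It assumes the flat dictionary maps each motor reading (e.g. "shoulder_pan.pos") to a float and each camera name (e.g. "front") to an HxWx3 uint8 array, and that the policy expects dataset-style keys observation.state and observation.images.<camera>. The helper name group_flat_observation and the example key names are hypothetical, and the order of state_keys is assumed to have to match the order used when the training data was recorded.

```python
import numpy as np


def group_flat_observation(obs, state_keys, camera_keys):
    """Hypothetical helper: regroup a flat robot observation into policy-style keys.

    Assumes scalar motor readings live under state_keys and HxWx3 camera
    frames live under camera_keys in the flat `obs` dictionary.
    """
    grouped = {}
    # Stack the scalar joint readings, in a fixed order, into one state vector.
    grouped["observation.state"] = np.array(
        [obs[key] for key in state_keys], dtype=np.float32
    )
    # Each camera frame becomes its own "observation.images.<name>" entry.
    for cam in camera_keys:
        grouped[f"observation.images.{cam}"] = np.asarray(obs[cam])
    return grouped


# Example with hypothetical key names and a dummy 480x640 RGB frame.
flat = {
    "shoulder_pan.pos": 0.1,
    "gripper.pos": 0.5,
    "front": np.zeros((480, 640, 3), dtype=np.uint8),
}
batch = group_flat_observation(flat, ["shoulder_pan.pos", "gripper.pos"], ["front"])
# batch["observation.state"].shape == (2,)
# batch["observation.images.front"].shape == (480, 640, 3)
```

Note that this only fixes the names; the values are still NumPy arrays in channel-last layout, which is the type half of the problem.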
This issue does not arise when running inference through the lerobot-record CLI command, because of the following scaffolding:
- lerobot-record uses predict_action from the control utils;
- predict_action implements a normalization step, as in https://github.com/huggingface/lerobot/blob/6c28ef894af215bfaaf665aa3015c5645d91e53f/src/lerobot/utils/control_utils.py#L105:L114
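As we read the linked lines, predict_action converts each NumPy array to a tensor, casts images to float32 in [0, 1], moves them from channel-last (HWC) to channel-first (CHW) layout, adds a batch dimension, and moves the result to the target device. A rough sketch of the same conversion follows, assuming the keys are already named as the policy expects. The function names to_model_layout and to_torch are hypothetical, the "image"-in-key test mirrors our reading of that code rather than a documented contract, and PyTorch is imported lazily so the layout logic can be checked with NumPy alone.

```python
import numpy as np


def to_model_layout(observation):
    """Hypothetical sketch of the array layout conversion done before the forward pass."""
    out = {}
    for name, value in observation.items():
        arr = np.asarray(value)
        if "image" in name:
            # uint8 HWC in [0, 255] -> float32 CHW in [0, 1]
            arr = arr.astype(np.float32) / 255.0
            arr = np.ascontiguousarray(np.transpose(arr, (2, 0, 1)))
        # Add a leading batch dimension of size 1.
        out[name] = arr[np.newaxis, ...]
    return out


def to_torch(batch, device="cpu"):
    """Hypothetical final step: wrap each array as a tensor on the target device."""
    import torch  # imported lazily so the NumPy part runs without PyTorch

    return {name: torch.from_numpy(arr).to(device) for name, arr in batch.items()}
```

Exposing a helper along these lines (or making the preprocessing pipeline perform it) is what would let the five-line loop in the Reproduction section work outside lerobot-record.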
Thus, inference can currently only be run through lerobot-record, not independently. I would argue this is a problem, especially given how little effort a fix supporting both CLI- and API-based control would take. API-based control is arguably the best way to showcase the library's functionality, particularly since the CLI is fairly intricate.
Expected behavior
See above ⬆️