
Conversation

@fracapuano (Collaborator) commented Oct 8, 2025

What this does

Solves #2142 by introducing build_inference_frame, an encapsulator that interfaces robot observations with models.

Crucially, this PR enables a block of code like the following to run without any issues:

# Raw observation straight from the robot, e.g. {"motor1": ..., "camera1": ...}
obs = robot.get_observation()
# Convert it into the frame format the model expects
obs_frame = build_inference_frame(obs, dataset_metadata.features, device)

# Run the policy's preprocessing pipeline
obs = preprocess(obs_frame)

# Query the policy for an action
action = model.select_action(obs)

# Convert the policy output into a robot action, postprocess, and send it
action = make_robot_action(action)
action = postprocess(action)
robot.send_action(action)

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR refactors the preprocessing steps for policy inference by introducing a new build_inference_frame function that encapsulates the conversion of robot observations into the format expected by machine learning models. This addresses issue #2142 by providing a clean interface between robot observations and model inference.

Key changes:

  • Added build_inference_frame function to centralize observation preprocessing logic
  • Replaced inline preprocessing code in predict_action with a call to the new function
  • The new function handles tensor conversion, image normalization, device transfer, and metadata addition
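
As a rough illustration of those responsibilities, such a function might look like the sketch below. This is an assumption-laden reconstruction from the summary above, not the PR's actual implementation, and it glosses over key matching between raw observation keys and dataset features:

import numpy as np
import torch

def build_inference_frame(
    observation: dict[str, np.ndarray],
    ds_features: dict,
    device: torch.device,
    task: str = "",
    robot_type: str = "",
) -> dict:
    # Sketch only: assumes observation keys already match ds_features
    frame = {}
    for key in ds_features:
        value = torch.from_numpy(np.asarray(observation[key]))  # tensor conversion
        if "image" in key:
            # Image normalization: uint8 HWC -> float32 CHW in [0, 1]
            value = value.permute(2, 0, 1).float() / 255.0
        frame[key] = value.unsqueeze(0).to(device)  # batch dim + device transfer
    # Metadata addition
    frame["task"] = task
    frame["robot_type"] = robot_type
    return frame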

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files changed:

  • src/lerobot/policies/utils.py: Adds the new build_inference_frame function with observation preprocessing logic
  • src/lerobot/utils/control_utils.py: Replaces inline preprocessing with a call to the new build_inference_frame function




def build_inference_frame(
    observation: dict[str, torch.Tensor],

Copilot AI commented Oct 8, 2025


The type annotation suggests the input observation is already torch.Tensor, but line 107 calls torch.from_numpy(), indicating the input should be numpy arrays. The type annotation should be dict[str, np.ndarray] to accurately reflect the expected input type.
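
In other words, the signature would presumably become something like this (the remaining parameters are assumed from the rest of the review):

import numpy as np
import torch

def build_inference_frame(
    observation: dict[str, np.ndarray],  # numpy in; converted to tensors internally
    ds_features: dict,
    device: torch.device,
) -> dict[str, torch.Tensor]: ...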



observation["task"] = task if task else ""
observation["robot_type"] = robot_type if robot_type else ""
observation = build_inference_frame(observation, device)

Copilot AI commented Oct 8, 2025


The function call is missing the required ds_features parameter. Based on the function signature in utils.py, it should be build_inference_frame(observation, ds_features, device) where ds_features likely comes from dataset_metadata.features.
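
That is, the corrected call would presumably become:

observation = build_inference_frame(observation, dataset_metadata.features, device)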


@fracapuano changed the title from "Fix/encapsulate preprocessing steps for policy inference" to "(improve api) Add the Build-Inference-Frame Util to Allow API-based Inference" on Oct 8, 2025
@imstevenpmwork added the enhancement (Suggestions for new features or improvements), policies (Items related to robot policies), and processor (Issue related to processor) labels on Oct 8, 2025
@imstevenpmwork (Collaborator) left a comment

I agree with the linked issue and this PR's goal. However, instead of just refactoring into a function, I suggest a more robust solution: implement the linked code as a formal processor step within the policy's preprocessor.

This would make the main control loop much cleaner:

obs = robot.get_observation()
obs = preprocess(obs)  # Now includes the new step
action = model.select_action(obs)
action = postprocess(action)
# robot.send_action(action)

This approach also addresses the root cause mentioned in point #7 of the previous review: the flawed assumptions in batch_to_transition. By moving this logic to a processor, we make progress on point #1 of that same comment, which is to migrate policy boilerplate into proper processor steps.
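
For illustration, such a step could simply wrap the conversion this PR factors out. The step interface below is an assumption, not lerobot's actual processor API, and build_inference_frame is assumed importable:

import torch

class BuildInferenceFrameStep:
    # Hypothetical preprocessor step: raw robot observation -> model-ready frame
    def __init__(self, ds_features: dict, device: torch.device):
        self.ds_features = ds_features
        self.device = device

    def __call__(self, observation: dict) -> dict:
        # Reuse the PR's conversion logic, packaged as a pipeline step
        return build_inference_frame(observation, self.ds_features, self.device)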

Shall we proceed with this processor-based approach, or would you prefer to stick with the current PR's function-based refactor? (I'm good either way)

Thanks anyway for looking into this!

@imstevenpmwork linked an issue on Oct 8, 2025 that may be closed by this pull request
@imstevenpmwork changed the title from "(improve api) Add the Build-Inference-Frame Util to Allow API-based Inference" to "feat(scripts): Introduce build_inference_frame util to easily allow API-based Inference" on Oct 8, 2025
@fracapuano (Collaborator, Author) commented:

> Shall we proceed with this processor-based approach, or would you prefer to stick with the current PR's function-based refactor? (I'm good either way)

I would stick with the current PR and open a "good first issue" for the community to implement this as part of the pipeline system. The main rationale for this choice is that it is the option that lets us ship the tutorial the earliest (i.e., tomorrow) without delaying the release further.

Also, I think there is a rather big CI blocker for the pipeline system: the need to constantly update all the models on the Hub whenever we change the pipeline used. That is why I'd argue we should choose carefully when to roll out changes to pipelines already uploaded, to avoid disruptions. For instance, updates to the pipelines used by models (such as the one you're describing here) could be introduced within a future 0.x.0 release. Happy to discuss this further though!

@fracapuano (Collaborator, Author) commented Oct 9, 2025

@imstevenpmwork just flagging that 06654e1 adds a very similar modification on the policy's outputs too

I added this modification to this PR just to move fast; both changes conceptually derive from the same problem.
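
For context, the output-side counterpart might look roughly like the sketch below. Motor names and the exact signature are illustrative assumptions, not the code from 06654e1:

import torch

def make_robot_action(action: torch.Tensor, motor_names: list[str]) -> dict[str, float]:
    # Sketch: policy action tensor -> {motor_name: value} dict for robot.send_action
    values = action.squeeze(0).detach().cpu().numpy()  # drop batch dim, back to numpy
    return {name: float(v) for name, v in zip(motor_names, values)}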

@fracapuano changed the title from "feat(scripts): Introduce build_inference_frame util to easily allow API-based Inference" to "feat(scripts): Introduce build_inference_frame/make_robot_action util to easily allow API-based Inference" on Oct 9, 2025
@fracapuano removed the processor (Issue related to processor) label on Oct 10, 2025
fracapuano and others added 2 commits October 10, 2025 16:22
…on to only perform data type handling (whole conversion is: keys matching + data type conversion)
@fracapuano (Collaborator, Author) commented:

Hey @imstevenpmwork 👋 I wanted to let you know that (1) I successfully tested these last changes using the lerobot-record CLI, and (2) I found and fixed a (tiny) bug in record_loop.

As for (2), I was unnecessarily calling build_dataset_frame twice on the same, already-converted object. Not a big deal, but not ideal either. Here are my thoughts:

  1. Within record_loop, we need to convert a raw observation into a frame, either for inference or to append it to a dataset. At a high-level, converting a raw observation {motor1: ..., motor2: ..., ..., camera1: ...} into a frame {observation.state: ..., observation.images.camera1: ...} consists of (1) turning a dictionary into another dictionary and (2) turning arrays into tensors with specific data types.
  2. We can do both things using build_inference_frame. However, in lerobot-record the code's logic is such that we need to split these two operations. In particular, you can see that we first convert the raw observation into a useful frame here:
    observation_frame = build_dataset_frame(dataset.features, obs_processed, prefix=OBS_STR)
  3. Then, we call predict_action on the frame:
    action_values = predict_action(
        observation=observation_frame,
        policy=policy,
        device=get_safe_torch_device(policy.config.device),
        preprocessor=preprocessor,
        postprocessor=postprocessor,
        use_amp=policy.config.use_amp,
    )
  4. Inside of predict_action, we don't need to reconvert the frame; we only need to take care of changing the data types accordingly. This means we need a third util encapsulating the conversion logic currently implemented on main, i.e., handling the image dtype int -> float conversion and turning arrays into tensors where necessary.

The new small function build_frame_for_inference thus groups these two operations, so that clean API examples can simply call build_frame_for_inference (which under the hood calls build_dataset_frame for the dict -> dict operation and prepare_frame_for_inference for the data type handling).
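
In code, that composition might read as follows (a sketch; it assumes build_dataset_frame, prepare_frame_for_inference, and OBS_STR are importable, and the real signatures may differ):

import torch

def build_frame_for_inference(observation: dict, ds_features: dict, device: torch.device) -> dict:
    # (1) dict -> dict: {"motor1": ..., "camera1": ...} -> {"observation.state": ..., ...}
    frame = build_dataset_frame(ds_features, observation, prefix=OBS_STR)
    # (2) data type handling: numpy -> torch, uint8 images -> float, device transfer
    return prepare_frame_for_inference(frame, device)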


Labels

  • enhancement: Suggestions for new features or improvements
  • policies: Items related to robot policies


Development

Successfully merging this pull request may close these issues.

(improving api) Over-reliance on lerobot-record for inference
