Fix Multi File Aggregate Datasets #2126

nicholas-maselli · 2025-10-06T20:59:00Z

There is a bug in the src\lerobot\datasets\aggregate.py file where the latest_duration key is updated incorrectly, which causes training on newly aggregated datasets to fail.

This occurs when aggregating three or more datasets, and also when the video data exceeds DEFAULT_VIDEO_FILE_SIZE_IN_MB.

This PR fixes this by removing the unused episode duration and updating the latest_duration key correctly: it is now set equal to timestamps_shift_s instead of having the ever-increasing timestamps_shift_s added to it.
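To illustrate why the old update only broke once three or more datasets were merged, here is a minimal, hypothetical model of the offset bookkeeping. It is not the actual code in aggregate.py: `episode_offsets` and its arguments are invented for illustration, and the model assumes each source's timestamps are shifted by the duration already written to the destination file.

```python
from typing import List


def episode_offsets(src_durations_s: List[float], buggy: bool = False) -> List[float]:
    """Timestamp offset applied to each source appended to one destination file.

    Hypothetical model for illustration only, not lerobot's implementation.
    """
    latest_duration = 0.0  # duration already present in the destination file
    offsets = []
    for duration in src_durations_s:
        offsets.append(latest_duration)
        # Cumulative end time of the destination file after appending this source.
        timestamps_shift_s = latest_duration + duration
        if buggy:
            # Old behaviour: adds an already-cumulative value, double counting earlier sources.
            latest_duration += timestamps_shift_s
        else:
            # Fixed behaviour: the file's duration is the cumulative shift itself.
            latest_duration = timestamps_shift_s
    return offsets


print(episode_offsets([10.0, 10.0, 10.0], buggy=True))  # [0.0, 10.0, 30.0]
print(episode_offsets([10.0, 10.0, 10.0]))              # [0.0, 10.0, 20.0]
```

With two sources both variants agree, which is consistent with the bug only surfacing from the third dataset onward.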

We also reset latest_duration to 0 when rotating to a new file/chunk, because the episode metadata parquet file requires the timestamps in a new file to start at 0.
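The rotation rule can be sketched the same way. This is again a hypothetical model: `assign_to_files` and `max_file_size_mb` (a stand-in for DEFAULT_VIDEO_FILE_SIZE_IN_MB, whose actual value is not assumed here) are illustrative names, and per-source durations and sizes are passed in rather than read from real video files.

```python
from typing import List, Tuple


def assign_to_files(
    sources: List[Tuple[float, float]], max_file_size_mb: float
) -> List[Tuple[int, float]]:
    """Map each (duration_s, size_mb) source to (file_index, timestamp_offset).

    Hypothetical model for illustration only, not lerobot's implementation.
    """
    file_idx = 0
    file_size_mb = 0.0
    latest_duration = 0.0
    placements = []
    for duration_s, size_mb in sources:
        # Rotate when appending would push a non-empty file past the size limit.
        if file_size_mb > 0 and file_size_mb + size_mb > max_file_size_mb:
            file_idx += 1
            file_size_mb = 0.0
            latest_duration = 0.0  # timestamps in a new file restart at 0
        placements.append((file_idx, latest_duration))
        file_size_mb += size_mb
        latest_duration += duration_s  # per-source duration, not a cumulative value
    return placements


print(assign_to_files([(10.0, 40.0)] * 3, max_file_size_mb=100.0))
# [(0, 0.0), (0, 10.0), (1, 0.0)]
```

Without the reset, the third source in this example would land in file 1 with an offset of 20.0 instead of 0.0.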

The existing tests pass:

pytest tests/datasets/test_aggregate.py

A robust way to confirm this fix is to aggregate three datasets and check that the standard training script trains properly on the result.

A second test is to use datasets whose video data pushes past DEFAULT_VIDEO_FILE_SIZE_IN_MB.
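For either test, a quick sanity check on the aggregated episode metadata is that, within each file, the first episode starts at 0 and, assuming episodes are stored back to back, each following episode starts where the previous one ended. The sketch below assumes each episode row is a dict with `episode_index`, `chunk_index`, `file_index`, `from_timestamp` and `to_timestamp` keys; these names are assumptions about the metadata layout, so adapt them to the actual parquet columns (for example per-video-key columns) before relying on it.

```python
from collections import defaultdict
from typing import Dict, List


def check_episode_timestamps(episodes: List[Dict], tol: float = 1e-6) -> List[str]:
    """Return human-readable problems found in per-episode timestamp ranges.

    Hypothetical checker: the dict keys used here are assumed, not taken from lerobot.
    """
    by_file = defaultdict(list)
    for ep in episodes:
        by_file[(ep["chunk_index"], ep["file_index"])].append(ep)

    problems = []
    for key, eps in sorted(by_file.items()):
        eps.sort(key=lambda e: e["episode_index"])  # episodes are appended in order
        expected_start = 0.0  # every file must begin at timestamp 0
        for ep in eps:
            if abs(ep["from_timestamp"] - expected_start) > tol:
                problems.append(
                    f"file {key}: episode {ep['episode_index']} starts at "
                    f"{ep['from_timestamp']}, expected {expected_start}"
                )
            expected_start = ep["to_timestamp"]  # next episode should follow directly
    return problems
```

An empty list means every file starts at 0 and its episodes are contiguous; any entry points at the first episode whose offset drifted.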

michel-aractingi · 2025-10-07T08:56:20Z

Hey @nicholas-maselli, you're right, there is a bug in aggregate, and we have a fix in #2100 that is similar to what you do in this PR.


nicholas-maselli · 2025-10-08T06:03:31Z

> Hey @nicholas-maselli, you're right, there is a bug in aggregate, and we have a fix in #2100 that is similar to what you do in this PR.

Oh, excellent! I just pushed an additional commit here that fixes another episode-rotation bug.

Do you need help with any dataset tools? I would love to help if there is a timeline for the release. I have several extremely large datasets I can test all your tools on, if you would like =)

michel-aractingi · 2025-10-08T20:34:10Z

That would be great, @nicholas-maselli! We're planning to release it tomorrow, but it would still be great if you could test it and report or push any fixes you find.

fix aggregate datasets with latest duration changes

749a921

pkooij requested a review from michel-aractingi October 7, 2025 07:44

pkooij assigned nicholas-maselli Oct 7, 2025

pkooij added the dataset label (Issues regarding data inputs, processing, or datasets) Oct 7, 2025

fixing the rotating episode bug where the first episode back correctly starts at 0 but the frames after go back to starting at large numbers (rather than being properly offset by the total episode duration)

2d020a9
