@dtronmans
Contributor

Purpose

Halves the parsing time for COCO and Native parsers

Specification

  • Path(k).resolve() performs full path normalization and symlink resolution, which requires filesystem access and was one of the bottlenecks in parsing. Replacing it with Path(k).absolute(), which makes the path absolute without normalizing it or resolving symlinks, more than halves the parsing time.
  • Synthetic datasets of 30, 300, 3,000 and 30,000 images were built from COCO_people_subset; in every case, parsing time scaled linearly with the number of annotations.
  • Turning every annotation into a DatasetRecord instance takes up 99% of the parsing time, so any further optimization should target this step.

Dependencies & Potential Impact

Symlinked images in a dataset are no longer supported, because paths are no longer resolved to their targets. A note about this has been added to README.md.
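To illustrate the behavioral change, here is a minimal sketch of how the two calls treat a symlinked image. The directory layout and the helper name `compare_path_methods` are hypothetical and not taken from the parsers; the sketch assumes a platform where creating symlinks is permitted.

```python
import os
import tempfile
from pathlib import Path


def compare_path_methods(root: Path) -> dict:
    """Create a real file and a symlink to it, then apply both calls to the link.

    Hypothetical helper for illustration only; not part of luxonis_ml.
    """
    real = root / "images" / "real.jpg"
    real.parent.mkdir(parents=True)
    real.write_bytes(b"")

    link = root / "link.jpg"
    os.symlink(real, link)

    return {
        # resolve() follows the symlink and returns the target's path.
        "resolve": Path(link).resolve(),
        # absolute() only makes the path absolute; the link name is kept.
        "absolute": Path(link).absolute(),
        # Canonical path of the real file, for comparison.
        "real": real.resolve(),
    }


with tempfile.TemporaryDirectory() as tmp:
    result = compare_path_methods(Path(tmp))

print("resolve :", result["resolve"])
print("absolute:", result["absolute"])
```

With resolve(), the link and the real file map to the same path; with absolute(), they remain two distinct paths, so any logic that relies on the canonical location of an image would see them as different files.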

Deployment Plan

Testing & Validation

  • Measured the parsing speed of the NATIVE and COCO parsers at different dataset sizes, once with Path(k).resolve() and once with Path(k).absolute(). For example, with 30,000 images the NATIVE parser takes 654 seconds using Path(k).resolve() and 273 seconds using Path(k).absolute().
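A rough way to reproduce the comparison in isolation is sketched below. It times only the two path calls on a list of made-up relative file names, not the full parser, so its numbers will not match the figures above; the function name `time_path_calls` and the `images/…` paths are assumptions made for the example.

```python
import timeit
from pathlib import Path


def time_path_calls(paths, repeat=1):
    """Return (resolve_seconds, absolute_seconds) for converting every path.

    Hypothetical micro-benchmark; it does not invoke any luxonis_ml parser.
    """
    resolve_s = timeit.timeit(
        lambda: [Path(p).resolve() for p in paths], number=repeat
    )
    absolute_s = timeit.timeit(
        lambda: [Path(p).absolute() for p in paths], number=repeat
    )
    return resolve_s, absolute_s


# Made-up relative paths; the files do not need to exist for either call.
paths = [f"images/{i:06d}.jpg" for i in range(3000)]
resolve_s, absolute_s = time_path_calls(paths)

print(f"resolve():  {resolve_s:.4f} s")
print(f"absolute(): {absolute_s:.4f} s")
```

Results depend on the operating system, the filesystem and the directory depth, so the ratio between the two timings should be treated as indicative only.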

@dtronmans dtronmans requested a review from a team as a code owner November 28, 2025 11:53
@dtronmans dtronmans requested review from conorsim, klemen1999, kozlov721 and tersekmatija and removed request for a team November 28, 2025 11:53
@github-actions github-actions bot added the documentation, enhancement and data labels Nov 28, 2025
@dtronmans dtronmans removed the documentation label Nov 28, 2025
@dtronmans dtronmans merged commit c504ef5 into main Nov 29, 2025
13 checks passed
@dtronmans dtronmans deleted the feat/parsing-optimization branch November 29, 2025 16:34
Labels

data (Changes affecting luxonis_ml.data subpackage), enhancement (New feature or request)
