@kkeroo kkeroo commented Sep 5, 2025

Purpose

Many customers use the YOLOv8 or Ultralytics format for their datasets, so we need a parser for it.

Specification

The YOLOv8 dataset format is structured like this:

dataset_dir/
        ├── train/
        │   ├── images/
        │   │   ├── img1.jpg
        │   │   ├── img2.jpg
        │   │   └── ...
        │   ├── labels/
        │   │   ├── img1.txt
        │   │   ├── img2.txt
        │   │   └── ...
        ├── val/
        │   ├── images/
        │   │   ├── img1.jpg
        │   │   ├── img2.jpg
        │   │   └── ...
        │   ├── labels/
        │   │   ├── img1.txt
        │   │   ├── img2.txt
        │   │   └── ...
        ├── test/
        │   ├── images/
        │   │   ├── img1.jpg
        │   │   ├── img2.jpg
        │   │   └── ...
        │   ├── labels/
        │   │   ├── img1.txt
        │   │   ├── img2.txt
        │   │   └── ...
        └── *.yaml

while the Ultralytics (Hub) format is structured like this:

dataset_dir/
        ├── images/
        │   ├── train/
        │   │   ├── img1.jpg
        │   │   ├── img2.jpg
        │   │   └── ...
        │   ├── val/
        │   └── test/
        ├── labels/
        │   ├── train/
        │   │   ├── img1.txt
        │   │   ├── img2.txt
        │   │   └── ...
        │   ├── val/
        │   └── test/
        └── *.yaml

For example, you can download coco8 from the Ultralytics GitHub to get the Ultralytics format, or download any dataset from Roboflow Universe in YOLOv8 format.
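
The two layouts differ only in whether the split name or the content type comes first in the path, so they can be told apart by checking which directories exist. The following is a minimal sketch under that assumption; `detect_layout` and its return values are hypothetical names used for illustration, not the API added in this PR:

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def detect_layout(dataset_dir: Path) -> str:
    """Guess the directory layout of a dataset.

    Hypothetical helper for illustration only:
    returns "yolov8" for <split>/images + <split>/labels,
    and "ultralytics" for images/<split> + labels/<split>.
    """
    # YOLOv8 layout: dataset_dir/train/images and dataset_dir/train/labels
    for split in SPLITS:
        split_dir = dataset_dir / split
        if (split_dir / "images").is_dir() and (split_dir / "labels").is_dir():
            return "yolov8"

    # Ultralytics (Hub) layout: dataset_dir/images/train and dataset_dir/labels/train
    for split in SPLITS:
        if (dataset_dir / "images" / split).is_dir() and (
            dataset_dir / "labels" / split
        ).is_dir():
            return "ultralytics"

    raise ValueError(f"Unrecognized dataset layout in {dataset_dir}")
```

Only one split has to be present for a layout to match, since not every dataset ships all three splits.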

This PR introduces an all-in-one Ultralytics parser that handles both formats and the following tasks:

  • Object Detection
  • Instance Segmentation
  • Keypoint Detection

How do we figure out the task? Every row of an annotation file has at least 5 elements, so object detection is the simple case: a row with exactly class_ix, x, y, w, h. Instance segmentation and keypoint rows carry additional elements. To tell those two apart, we look in the yaml file that is present in all Ultralytics/YOLOv8 datasets for the kpt_shape field, which gives the number and dimensionality of the keypoints (2D or 3D). If the keypoints are 2D, we add visibility as the 3rd dimension and set it to 2 (fully visible). 3D keypoints already carry a 3rd dimension of type Literal[0, 1, 2]. So:

  • Object detection: len(elements) == 5
  • Keypoints: len(elements) > 5 and kpt_shape present in yaml file
  • Instance segmentation: otherwise
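
The three rules above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `infer_task` and `parse_keypoints` are hypothetical names, `kpt_shape` is assumed to have already been read from the yaml file as a `(num_keypoints, dim)` pair (or `None` when the field is absent), and keypoint values are assumed to follow the five bounding-box elements in the row.

```python
from typing import List, Optional, Tuple


def infer_task(elements: List[str], kpt_shape: Optional[Tuple[int, int]]) -> str:
    """Classify one annotation row (hypothetical helper)."""
    if len(elements) < 5:
        raise ValueError(f"Expected at least 5 elements, got {len(elements)}")
    if len(elements) == 5:
        # class_ix, x, y, w, h and nothing else
        return "detection"
    if kpt_shape is not None:
        # extra elements and kpt_shape present in the yaml file
        return "keypoints"
    return "segmentation"


def parse_keypoints(
    elements: List[str], kpt_shape: Tuple[int, int]
) -> List[Tuple[float, float, int]]:
    """Return (x, y, visibility) triples from a keypoint row.

    2D keypoints get visibility 2 (fully visible);
    3D keypoints keep their own visibility value (0, 1 or 2).
    """
    n_kpts, dim = kpt_shape
    if dim not in (2, 3):
        raise ValueError(f"Keypoints must be 2D or 3D, got dim={dim}")
    values = [float(v) for v in elements[5:]]  # skip class_ix, x, y, w, h
    if len(values) != n_kpts * dim:
        raise ValueError(f"Expected {n_kpts * dim} keypoint values, got {len(values)}")

    keypoints = []
    for i in range(n_kpts):
        chunk = values[i * dim : (i + 1) * dim]
        visibility = int(chunk[2]) if dim == 3 else 2
        keypoints.append((chunk[0], chunk[1], visibility))
    return keypoints
```

A segmentation row and a keypoint row can have the same length, so the row alone is ambiguous; the presence of kpt_shape is what breaks the tie.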

Dependencies & Potential Impact

None / not applicable

Deployment Plan

None / not applicable

Testing & Validation

Tested on:

  • Coco8 (from ultralytics gh)
  • Coco8-pose (from ultralytics gh)
  • Tiger-pose (from ultralytics gh)
  • Coco8-seg (from ultralytics gh)
  • crack-seg (from ultralytics gh)
  • Padel kpts dataset (from Roboflow)
  • Fire and Smoke instance seg. (from Roboflow)

@kkeroo kkeroo requested a review from a team as a code owner September 5, 2025 09:10
@kkeroo kkeroo requested review from conorsim, klemen1999, kozlov721 and tersekmatija and removed request for a team September 5, 2025 09:10
@github-actions github-actions bot added the enhancement New feature or request label Sep 5, 2025

codecov bot commented Sep 5, 2025

Codecov Report

❌ Patch coverage is 91.36691% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.30%. Comparing base (49a14a0) to head (5161e4d).
⚠️ Report is 1 commits behind head on main.

Files with missing lines                    Patch %   Lines
luxonis_ml/data/parsers/yolov8_parser.py    91.17%    12 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #367    +/-   ##
========================================
  Coverage   95.29%   95.30%            
========================================
  Files         104      105     +1     
  Lines        6401     6540   +139     
========================================
+ Hits         6100     6233   +133     
- Misses        301      307     +6     


@klemen1999 klemen1999 left a comment

Initial integration looks good; I left some comments.
Other comments:

  • Add tests
  • Add parser to the README and docs.luxonis after merge

@github-actions github-actions bot added documentation Improvements or additions to documentation data Changes affecting luxonis_ml.data subpackage labels Sep 8, 2025

kkeroo commented Sep 8, 2025

Tests are added (3 datasets: obj. det., kpt. det., and inst. seg.) and the README is updated. I moved the Ultralytics format up, right after the COCO format, because it's probably very popular.

@klemen1999 klemen1999 merged commit 571af92 into main Sep 8, 2025
35 of 42 checks passed
@klemen1999 klemen1999 deleted the feat/ultralytics-parser branch September 8, 2025 13:58
Labels

data Changes affecting luxonis_ml.data subpackage documentation Improvements or additions to documentation enhancement New feature or request