
Commit 7758853

Feat: Multi-Input Support (#330)

1 parent: 206f063

30 files changed: +1011 / -392 lines

luxonis_ml/data/README.md

42 additions, 13 deletions
````diff
@@ -106,16 +106,49 @@ After creating a dataset, the next step is to populate it with images and their
 
 #### Data Format
 
-Each data entry should be a dictionary with the following structure:
+Each data entry should be a dictionary with one of the following structures, depending on whether you're using a single input or multiple inputs:
+
+##### Single-Input Format
 
 ```python
 {
     "file": str,                 # path to the image file
-    "task_name": Optional[str],  # task type for this annotation
+    "task_name": Optional[str],  # task for this annotation
     "annotation": Optional[dict] # annotation of the instance in the file
 }
 ```
 
+##### Multi-Input Format
+
+```python
+{
+    "files": dict[str, str],     # mapping from input source name to file path
+    "task_name": Optional[str],  # task for this annotation
+    "annotation": Optional[dict] # annotation of the instance in the files
+}
+```
+
+In the multi-input format, the keys in the `files` dictionary are arbitrary strings that describe the role or modality of the input (e.g. `img_rgb`, `img_ir`, or `depth`). These keys are later used to retrieve the corresponding images during data loading.
+
+```python
+{
+    "files": {
+        "img_rgb": "path/to/rgb_image.png",
+        "img_ir": "path/to/infrared_image.png"
+    },
+    "task_name": "detection",
+    "annotation": {
+        "class": "person",
+        "boundingbox": {
+            "x": 0.1,
+            "y": 0.1,
+            "w": 0.3,
+            "h": 0.4
+        }
+    }
+}
+```
+
 Luxonis Data Format supports **annotations optionally structured into different tasks** for improved organization. Tasks can be explicitly named or left unset; if none are specified, all annotations are grouped under a single `task_name`, which defaults to `""`. The [example below](#adding-data-with-a-generator-function) demonstrates this with instance keypoints and segmentation tasks.
 
 The content of the `"annotation"` field depends on the task type and follows the [Annotation Format](#annotation-format) described later in this document.
````
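Not part of the diff itself: a minimal sketch of how a data entry could be checked against the two record shapes above before being added to a dataset. The `validate_record` helper is hypothetical, for illustration only; it is not part of luxonis-ml.

```python
from typing import Any


def validate_record(record: dict[str, Any]) -> str:
    """Classify a data entry as 'single' or 'multi' input, raising on malformed records."""
    has_file = "file" in record
    has_files = "files" in record
    if has_file == has_files:
        # A record must use exactly one of the two formats
        raise ValueError("Record must contain exactly one of 'file' or 'files'")
    if has_files:
        files = record["files"]
        if not isinstance(files, dict) or not files:
            raise ValueError("'files' must be a non-empty dict of source name -> path")
        return "multi"
    if not isinstance(record["file"], str):
        raise ValueError("'file' must be a path string")
    return "single"


single = {"file": "img.png", "task_name": "detection", "annotation": None}
multi = {
    "files": {"img_rgb": "rgb.png", "img_ir": "ir.png"},
    "task_name": "detection",
    "annotation": {"class": "person"},
}
print(validate_record(single))  # single
print(validate_record(multi))   # multi
```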
````diff
@@ -558,33 +591,29 @@ The mask is a binary 2D numpy array.
 
 #### Run-Length Encoding
 
-The mask is described using the [Run-Length Encoding](https://en.wikipedia.org/wiki/Run-length_encoding) compression.
+The mask is represented using [Run-Length Encoding (RLE)](https://en.wikipedia.org/wiki/Run-length_encoding), a lossless compression method that stores alternating counts of background and foreground pixels in **row-major order**, beginning from the top-left pixel. The first count always represents background pixels, even if that count is 0.
 
-Run-length encoding compresses data by reducing the physical size
-of a repeating string of characters.
-This process involves converting the input data into a compressed format
-by identifying and counting consecutive occurrences of each character.
-
-The RLE is composed of the height and width of the mask image and the counts of the pixels belonging to the positive class.
+The `counts` field contains either a **compressed byte string** or an **uncompressed list of integers**. We use the **COCO RLE format** via the `pycocotools` library to encode and decode masks.
 
 ```python
 {
     # name of the class this mask belongs to
     "class": str,
 
-    "segmentation":
-    {
+    "segmentation": {
         # height of the mask
         "height": int,
 
         # width of the mask
         "width": int,
 
-        # counts of the pixels belonging to the positive class
+        # run-length encoded pixel counts in row-major order,
+        # starting with background. Can be a list[int] (uncompressed)
+        # or a compressed byte string
         "counts": list[int] | bytes,
     },
-
 }
 ```
 
 > \[!NOTE\]
````
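As an illustration of the uncompressed `counts` layout the new README text describes (alternating run lengths in row-major order, first count is background), here is a minimal numpy sketch. It is not the library's implementation; the actual pipeline uses the COCO RLE format via `pycocotools`, which also handles the compressed byte-string form.

```python
import numpy as np


def rle_encode(mask: np.ndarray) -> dict:
    """Encode a binary mask as uncompressed RLE: alternating run lengths,
    row-major order, first count is background (may be 0)."""
    flat = mask.flatten()  # row-major (C order)
    change = np.flatnonzero(np.diff(flat)) + 1          # indices where value flips
    boundaries = np.concatenate(([0], change, [flat.size]))
    counts = np.diff(boundaries).tolist()               # run lengths
    if flat[0] == 1:
        counts = [0] + counts                           # zero-length background run first
    return {"height": mask.shape[0], "width": mask.shape[1], "counts": counts}


def rle_decode(rle: dict) -> np.ndarray:
    """Inverse of rle_encode: replay the runs, toggling between 0 and 1."""
    flat = np.zeros(rle["height"] * rle["width"], dtype=np.uint8)
    pos, value = 0, 0
    for count in rle["counts"]:
        if value:
            flat[pos : pos + count] = 1
        pos += count
        value ^= 1
    return flat.reshape(rle["height"], rle["width"])


mask = np.array([[0, 0, 1], [1, 1, 0]], dtype=np.uint8)
rle = rle_encode(mask)
print(rle["counts"])  # [2, 3, 1]
assert np.array_equal(rle_decode(rle), mask)
```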

luxonis_ml/data/__main__.py

78 additions, 42 deletions
```diff
@@ -107,21 +107,24 @@ def print_info(dataset: LuxonisDataset) -> None:
     task_table.add_row(", ".join(task_types))
 
     splits = dataset.get_splits()
+    source_names = dataset.get_source_names()
 
     @group()
     def get_sizes_panel() -> Iterator[RenderableType]:
         if splits is not None:
-            total_files = len(dataset)
-            for split, files in splits.items():
-                split_size = len(files)
+            total_groups = len(dataset) / len(source_names)
+            for split, group in splits.items():
+                split_size = len(group)
                 percentage = (
-                    (split_size / total_files * 100) if total_files > 0 else 0
+                    (split_size / total_groups * 100)
+                    if total_groups > 0
+                    else 0
                 )
                 yield f"[magenta b]{split}: [not b cyan]{split_size:,} [dim]({percentage:.1f}%)[/dim]"
         else:
             yield "[red]No splits found"
         yield Rule()
-        yield f"[magenta b]Total: [not b cyan]{len(dataset)}"
+        yield f"[magenta b]Total: [not b cyan]{int(total_groups)}"
 
     @group()
     def get_panels() -> Iterator[RenderableType]:
@@ -188,11 +191,13 @@ def delete(
     ):
         raise typer.Exit
 
-    dataset = LuxonisDataset(name, bucket_storage=bucket_storage)
-    dataset.delete_dataset(
+    dataset = LuxonisDataset(
+        name,
+        bucket_storage=bucket_storage,
        delete_local=local,
        delete_remote=remote,
    )
+    dataset.delete_dataset(delete_local=local)
 
    print(
        f"Dataset '{name}' deleted from: "
@@ -343,7 +348,14 @@ def inspect(
    )
 
    if aug_config is not None:
-        h, w, _ = loader[0][0].shape
+        sample_img = loader[0][0]
+        img = (
+            next(iter(sample_img.values()))
+            if isinstance(sample_img, dict)
+            else sample_img
+        )
+        h, w = img.shape[:2]
+
        loader.augmentations = loader._init_augmentations(
            augmentation_engine="albumentations",
            augmentation_config=aug_config,
@@ -357,13 +369,18 @@ def inspect(
        raise ValueError(f"Dataset '{name}' is empty.")
 
    classes = dataset.get_classes()
-    for image, labels in loader:
-        image = image.astype(np.uint8)
-        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+    prev_windows = set()
+
+    for img, labels in loader:
+        if isinstance(img, dict):
+            images_dict = img
+        else:
+            images_dict = {"image": img}
+
+        current_windows = set(images_dict.keys())
+        for stale_window in prev_windows - current_windows:
+            cv2.destroyWindow(stale_window)
 
-        h, w, _ = image.shape
-        new_h, new_w = int(h * size_multiplier), int(w * size_multiplier)
-        image = cv2.resize(image, (new_w, new_h))
        instance_keys = [
            "/boundingbox",
            "/keypoints",
@@ -372,35 +389,54 @@ def inspect(
        matched_instance_keys = [
            k for k in labels if any(k.endswith(ik) for ik in instance_keys)
        ]
-        if per_instance and matched_instance_keys:
-            extra_keys = [k for k in labels if k not in matched_instance_keys]
-            if extra_keys:
-                print(
-                    f"[yellow]Warning: Ignoring non-instance keys in labels: {extra_keys}[/yellow]"
-                )
-            n_instances = len(labels[matched_instance_keys[0]])
-            for i in range(n_instances):
-                instance_labels = {
-                    k: np.expand_dims(v[i], axis=0)
-                    for k, v in labels.items()
-                    if k in matched_instance_keys and len(v) > i
-                }
-                instance_image = visualize(
-                    image.copy(), instance_labels, classes, blend_all=blend_all
-                )
-                cv2.imshow("image", instance_image)
-                if cv2.waitKey() == ord("q"):
-                    break
-        else:
-            if per_instance:
-                print(
-                    "[yellow]Warning: Per-instance mode is not supported for this dataset. "
-                    "Showing all labels in one window.[/yellow]"
+
+        for source_name, image in images_dict.items():
+            image = image.astype(np.uint8)
+            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
+            h, w = image.shape[:2]
+            new_h, new_w = int(h * size_multiplier), int(w * size_multiplier)
+            image = cv2.resize(image, (new_w, new_h))
+
+            if per_instance and matched_instance_keys:
+                extra_keys = [
+                    k for k in labels if k not in matched_instance_keys
+                ]
+                if extra_keys:
+                    print(
+                        f"[yellow]Warning: Ignoring non-instance keys in labels: {extra_keys}[/yellow]"
+                    )
+                n_instances = len(labels[matched_instance_keys[0]])
+                for i in range(n_instances):
+                    instance_labels = {
+                        k: np.expand_dims(v[i], axis=0)
+                        for k, v in labels.items()
+                        if k in matched_instance_keys and len(v) > i
+                    }
+                    instance_image = visualize(
+                        image.copy(),
+                        source_name,
+                        instance_labels,
+                        classes,
+                        blend_all=blend_all,
+                    )
+                    cv2.imshow(source_name, instance_image)
+                    if cv2.waitKey() == ord("q"):
+                        break
+            else:
+                if per_instance:
+                    print(
+                        "[yellow]Warning: Per-instance mode is not supported for this dataset. "
+                        f"Showing all labels in one window for '{source_name}'.[/yellow]"
+                    )
+                labeled_image = visualize(
+                    image, source_name, labels, classes, blend_all=blend_all
                )
-            image = visualize(image, labels, classes, blend_all=blend_all)
-            cv2.imshow("image", image)
-            if cv2.waitKey() == ord("q"):
-                break
+                cv2.imshow(source_name, labeled_image)
+
+        prev_windows = current_windows
+
+        if cv2.waitKey() == ord("q"):
+            break
 
 
 @app.command()
```
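The `inspect` changes above funnel both loader output shapes (a bare array or a source-name-to-image dict) through one code path and close windows whose source disappeared between samples. A self-contained sketch of that bookkeeping, with the OpenCV window calls left out and helper names invented for illustration:

```python
import numpy as np


def normalize_images(img) -> dict:
    """Wrap a bare array into a one-entry dict so both loader
    output shapes can be iterated the same way."""
    return img if isinstance(img, dict) else {"image": img}


def stale_windows(prev: set, current: set) -> set:
    """Window names open for the previous sample but absent now
    (these would be passed to cv2.destroyWindow)."""
    return prev - current


single_view = normalize_images(np.zeros((4, 4, 3), dtype=np.uint8))
multi_view = normalize_images(
    {"img_rgb": np.zeros((4, 4, 3), dtype=np.uint8), "img_ir": np.zeros((4, 4), dtype=np.uint8)}
)

print(sorted(single_view))  # ['image']
print(sorted(multi_view))   # ['img_ir', 'img_rgb']
print(stale_windows(set(single_view), set(multi_view)))  # {'image'}
```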

luxonis_ml/data/augmentations/README.md

66 additions, 1 deletion
````diff
@@ -2,7 +2,72 @@
 
 ## `AlbumentationsEngine`
 
-The default engine used with `LuxonisLoader`. It is powered by the [Albumentations](https://albumentations.ai/) library and should be satisfactory for most use cases. Apart from the albumentations transformations, it also supports custom transformations registered in the `TRANSFORMATIONS` registry.
+The default engine used with `LuxonisLoader`. It is powered by the [Albumentations](https://albumentations.ai/) library and should be satisfactory for most use cases. In addition to the built-in Albumentations transformations, it also supports custom transformations registered in the `TRANSFORMATIONS` registry.
+
+### Creating and Registering a Custom Augmentation
+
+The process of creating custom augmentations follows the same principles as described in the [Albumentations custom transform guide](https://albumentations.ai/docs/4-advanced-guides/creating-custom-transforms/#creating-custom-albumentations-transforms). You can subclass their base classes such as `DualTransform` or `ImageOnlyTransform`, depending on the target types you want to support.
+
+The example below shows how to define, register, and use a custom transform:
+
+```python
+import numpy as np
+from typing import Any, Sequence
+from albumentations import DualTransform
+
+from luxonis_ml.data import LuxonisDataset, LuxonisLoader
+from luxonis_ml.data.augmentations.custom import TRANSFORMATIONS
+
+class CustomTransform(DualTransform):
+    def __init__(self, p: float = 1.0):
+        super().__init__(p)
+
+    def apply(self, image: np.ndarray, **_: Any) -> np.ndarray:
+        return image
+
+    def apply_to_mask(self, mask: np.ndarray, **_: Any) -> np.ndarray:
+        return mask
+
+    def apply_to_bboxes(self, bboxes: Sequence[Any], **_: Any) -> Sequence[Any]:
+        return bboxes
+
+    def apply_to_keypoints(self, keypoints: Sequence[Any], **_: Any) -> Sequence[Any]:
+        return keypoints
+
+# Register the transform
+TRANSFORMATIONS.register(module=CustomTransform)
+
+# Use it in the config
+augmentation_config = [{
+    "name": "CustomTransform",
+    "params": {"p": 1},
+}]
+
+loader = LuxonisLoader(
+    LuxonisDataset("coco_test"),
+    augmentation_config=augmentation_config,
+    view="train",
+    height=640,
+    width=640,
+)
+
+for data in loader:
+    pass
+```
+
+### Examples of Custom Augmentations
+
+- [`letterbox_resize.py`](./custom/letterbox_resize.py)
+- [`symetric_keypoints_flip.py`](./custom/symetric_keypoints_flip.py)
+
+### Batch-Level Augmentations
+
+We also support **batch-level transformations**, built on top of the `BatchTransform` base class. These follow the same creation and registration pattern as standard custom transforms but operate on batches of data. This allows you to construct augmentations that combine multiple images and labels.
+
+Examples:
+
+- [`mosaic.py`](./custom/mosaic.py)
+- [`mixup.py`](./custom/mixup.py)
 
 ### Configuration Format
````
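To illustrate what a batch-level transform such as the referenced `mixup.py` combines, here is a minimal numpy sketch of the mixup idea: a convex blend of two images from a batch. This is only a sketch of the concept; the actual implementation subclasses `BatchTransform` and also merges the corresponding labels.

```python
import numpy as np


def mixup_images(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Convex blend of two equally sized images: alpha * a + (1 - alpha) * b."""
    assert img_a.shape == img_b.shape, "mixup needs equally sized inputs"
    mixed = alpha * img_a.astype(np.float32) + (1.0 - alpha) * img_b.astype(np.float32)
    return mixed.astype(img_a.dtype)


a = np.full((2, 2, 3), 200, dtype=np.uint8)
b = np.full((2, 2, 3), 100, dtype=np.uint8)
print(mixup_images(a, b, alpha=0.5)[0, 0, 0])  # 150
```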
