
Commit 7758853

Feat: Multi-Input Support (#330)

1 parent: 206f063

30 files changed: +1011 / -392 lines

luxonis_ml/data/README.md

42 additions, 13 deletions
````diff
@@ -106,16 +106,49 @@ After creating a dataset, the next step is to populate it with images and their
 
 #### Data Format
 
-Each data entry should be a dictionary with the following structure:
+Each data entry should be a dictionary with one of the following structures, depending on whether you're using a single input or multiple inputs:
+
+##### Single-Input Format
 
 ```python
 {
     "file": str,                 # path to the image file
-    "task_name": Optional[str],  # task type for this annotation
+    "task_name": Optional[str],  # task for this annotation
     "annotation": Optional[dict] # annotation of the instance in the file
 }
 ```
 
+##### Multi-Input Format
+
+```python
+{
+    "files": dict[str, str],     # mapping from input source name to file path
+    "task_name": Optional[str],  # task for this annotation
+    "annotation": Optional[dict] # annotation of the instance in the files
+}
+```
+
+In the multi-input format, the keys in the `files` dictionary are arbitrary strings that describe the role or modality of the input (e.g. `img_rgb`, `img_ir`, or `depth`). These keys are later used to retrieve the corresponding images during data loading.
+
+```python
+{
+    "files": {
+        "img_rgb": "path/to/rgb_image.png",
+        "img_ir": "path/to/infrared_image.png"
+    },
+    "task_name": "detection",
+    "annotation": {
+        "class": "person",
+        "boundingbox": {
+            "x": 0.1,
+            "y": 0.1,
+            "w": 0.3,
+            "h": 0.4
+        }
+    }
+}
+```
+
 Luxonis Data Format supports **annotations optionally structured into different tasks** for improved organization. Tasks can be explicitly named or left unset; if none are specified, all annotations are grouped under a single `task_name`, which defaults to `""`. The [example below](#adding-data-with-a-generator-function) demonstrates this with instance keypoints and segmentation tasks.
 
 The content of the `"annotation"` field depends on the task type and follows the [Annotation Format](#annotation-format) described later in this document.
````
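Not part of the diff itself: a minimal sketch of how a data entry could be checked against the two record shapes above before being added to a dataset. The `validate_record` helper is hypothetical, for illustration only; it is not part of luxonis-ml.

```python
from typing import Any


def validate_record(record: dict[str, Any]) -> str:
    """Classify a data entry as 'single' or 'multi' input, raising on malformed records."""
    has_file = "file" in record
    has_files = "files" in record
    if has_file == has_files:
        # A record must use exactly one of the two formats
        raise ValueError("Record must contain exactly one of 'file' or 'files'")
    if has_files:
        files = record["files"]
        if not isinstance(files, dict) or not files:
            raise ValueError("'files' must be a non-empty dict of source name -> path")
        return "multi"
    if not isinstance(record["file"], str):
        raise ValueError("'file' must be a path string")
    return "single"


single = {"file": "img.png", "task_name": "detection", "annotation": None}
multi = {
    "files": {"img_rgb": "rgb.png", "img_ir": "ir.png"},
    "task_name": "detection",
    "annotation": {"class": "person"},
}
print(validate_record(single))  # single
print(validate_record(multi))   # multi
```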
````diff
@@ -558,33 +591,29 @@ The mask is a binary 2D numpy array.
 
 #### Run-Length Encoding
 
-The mask is described using the [Run-Length Encoding](https://en.wikipedia.org/wiki/Run-length_encoding) compression.
+The mask is represented using [Run-Length Encoding (RLE)](https://en.wikipedia.org/wiki/Run-length_encoding), a lossless compression method that stores alternating counts of background and foreground pixels in **row-major order**, beginning from the top-left pixel. The first count always represents background pixels, even if that count is 0.
 
-Run-length encoding compresses data by reducing the physical size
-of a repeating string of characters.
-This process involves converting the input data into a compressed format
-by identifying and counting consecutive occurrences of each character.
-
-The RLE is composed of the height and width of the mask image and the counts of the pixels belonging to the positive class.
+The `counts` field contains either a **compressed byte string** or an **uncompressed list of integers**. We use the **COCO RLE format** via the `pycocotools` library to encode and decode masks.
 
 ```python
 {
     # name of the class this mask belongs to
     "class": str,
 
-    "segmentation":
-    {
+    "segmentation": {
         # height of the mask
         "height": int,
 
         # width of the mask
         "width": int,
 
-        # counts of the pixels belonging to the positive class
+        # run-length encoded pixel counts in row-major order,
+        # starting with background. Can be a list[int] (uncompressed)
+        # or a compressed byte string
         "counts": list[int] | bytes,
     },
-
 }
 ```
 
 > \[!NOTE\]
````
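As an illustration of the uncompressed `counts` layout the new README text describes (alternating run lengths in row-major order, first count is background), here is a minimal numpy sketch. It is not the library's implementation; the actual pipeline uses the COCO RLE format via `pycocotools`, which also handles the compressed byte-string form.

```python
import numpy as np


def rle_encode(mask: np.ndarray) -> dict:
    """Encode a binary mask as uncompressed RLE: alternating run lengths,
    row-major order, first count is background (may be 0)."""
    flat = mask.flatten()  # row-major (C order)
    change = np.flatnonzero(np.diff(flat)) + 1          # indices where value flips
    boundaries = np.concatenate(([0], change, [flat.size]))
    counts = np.diff(boundaries).tolist()               # run lengths
    if flat[0] == 1:
        counts = [0] + counts                           # zero-length background run first
    return {"height": mask.shape[0], "width": mask.shape[1], "counts": counts}


def rle_decode(rle: dict) -> np.ndarray:
    """Inverse of rle_encode: replay the runs, toggling between 0 and 1."""
    flat = np.zeros(rle["height"] * rle["width"], dtype=np.uint8)
    pos, value = 0, 0
    for count in rle["counts"]:
        if value:
            flat[pos : pos + count] = 1
        pos += count
        value ^= 1
    return flat.reshape(rle["height"], rle["width"])


mask = np.array([[0, 0, 1], [1, 1, 0]], dtype=np.uint8)
rle = rle_encode(mask)
print(rle["counts"])  # [2, 3, 1]
assert np.array_equal(rle_decode(rle), mask)
```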

luxonis_ml/data/__main__.py

78 additions, 42 deletions
```diff
@@ -107,21 +107,24 @@ def print_info(dataset: LuxonisDataset) -> None:
     task_table.add_row(", ".join(task_types))
 
     splits = dataset.get_splits()
+    source_names = dataset.get_source_names()
 
     @group()
     def get_sizes_panel() -> Iterator[RenderableType]:
         if splits is not None:
-            total_files = len(dataset)
-            for split, files in splits.items():
-                split_size = len(files)
+            total_groups = len(dataset) / len(source_names)
+            for split, group in splits.items():
+                split_size = len(group)
                 percentage = (
-                    (split_size / total_files * 100) if total_files > 0 else 0
+                    (split_size / total_groups * 100)
+                    if total_groups > 0
+                    else 0
                 )
                 yield f"[magenta b]{split}: [not b cyan]{split_size:,} [dim]({percentage:.1f}%)[/dim]"
         else:
             yield "[red]No splits found"
         yield Rule()
-        yield f"[magenta b]Total: [not b cyan]{len(dataset)}"
+        yield f"[magenta b]Total: [not b cyan]{int(total_groups)}"
 
     @group()
     def get_panels() -> Iterator[RenderableType]:
@@ -188,11 +191,13 @@ def delete(
     ):
         raise typer.Exit
 
-    dataset = LuxonisDataset(name, bucket_storage=bucket_storage)
-    dataset.delete_dataset(
+    dataset = LuxonisDataset(
+        name,
+        bucket_storage=bucket_storage,
        delete_local=local,
        delete_remote=remote,
    )
+    dataset.delete_dataset(delete_local=local)
 
    print(
        f"Dataset '{name}' deleted from: "
@@ -343,7 +348,14 @@ def inspect(
    )
 
    if aug_config is not None:
-        h, w, _ = loader[0][0].shape
+        sample_img = loader[0][0]
+        img = (
+            next(iter(sample_img.values()))
+            if isinstance(sample_img, dict)
+            else sample_img
+        )
+        h, w = img.shape[:2]
+
        loader.augmentations = loader._init_augmentations(
            augmentation_engine="albumentations",
            augmentation_config=aug_config,
@@ -357,13 +369,18 @@ def inspect(
        raise ValueError(f"Dataset '{name}' is empty.")
 
    classes = dataset.get_classes()
-    for image, labels in loader:
-        image = image.astype(np.uint8)
-        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
+    prev_windows = set()
+
+    for img, labels in loader:
+        if isinstance(img, dict):
+            images_dict = img
+        else:
+            images_dict = {"image": img}
+
+        current_windows = set(images_dict.keys())
+        for stale_window in prev_windows - current_windows:
+            cv2.destroyWindow(stale_window)
 
-        h, w, _ = image.shape
-        new_h, new_w = int(h * size_multiplier), int(w * size_multiplier)
-        image = cv2.resize(image, (new_w, new_h))
        instance_keys = [
            "/boundingbox",
            "/keypoints",
@@ -372,35 +389,54 @@ def inspect(
        matched_instance_keys = [
            k for k in labels if any(k.endswith(ik) for ik in instance_keys)
        ]
-        if per_instance and matched_instance_keys:
-            extra_keys = [k for k in labels if k not in matched_instance_keys]
-            if extra_keys:
-                print(
-                    f"[yellow]Warning: Ignoring non-instance keys in labels: {extra_keys}[/yellow]"
-                )
-            n_instances = len(labels[matched_instance_keys[0]])
-            for i in range(n_instances):
-                instance_labels = {
-                    k: np.expand_dims(v[i], axis=0)
-                    for k, v in labels.items()
-                    if k in matched_instance_keys and len(v) > i
-                }
-                instance_image = visualize(
-                    image.copy(), instance_labels, classes, blend_all=blend_all
-                )
-                cv2.imshow("image", instance_image)
-                if cv2.waitKey() == ord("q"):
-                    break
-        else:
-            if per_instance:
-                print(
-                    "[yellow]Warning: Per-instance mode is not supported for this dataset. "
-                    "Showing all labels in one window.[/yellow]"
+
+        for source_name, image in images_dict.items():
+            image = image.astype(np.uint8)
+            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
+            h, w = image.shape[:2]
+            new_h, new_w = int(h * size_multiplier), int(w * size_multiplier)
+            image = cv2.resize(image, (new_w, new_h))
+
+            if per_instance and matched_instance_keys:
+                extra_keys = [
+                    k for k in labels if k not in matched_instance_keys
+                ]
+                if extra_keys:
+                    print(
+                        f"[yellow]Warning: Ignoring non-instance keys in labels: {extra_keys}[/yellow]"
+                    )
+                n_instances = len(labels[matched_instance_keys[0]])
+                for i in range(n_instances):
+                    instance_labels = {
+                        k: np.expand_dims(v[i], axis=0)
+                        for k, v in labels.items()
+                        if k in matched_instance_keys and len(v) > i
+                    }
+                    instance_image = visualize(
+                        image.copy(),
+                        source_name,
+                        instance_labels,
+                        classes,
+                        blend_all=blend_all,
+                    )
+                    cv2.imshow(source_name, instance_image)
+                    if cv2.waitKey() == ord("q"):
+                        break
+            else:
+                if per_instance:
+                    print(
+                        "[yellow]Warning: Per-instance mode is not supported for this dataset. "
+                        f"Showing all labels in one window for '{source_name}'.[/yellow]"
+                    )
+                labeled_image = visualize(
+                    image, source_name, labels, classes, blend_all=blend_all
                )
-            image = visualize(image, labels, classes, blend_all=blend_all)
-            cv2.imshow("image", image)
-            if cv2.waitKey() == ord("q"):
-                break
+                cv2.imshow(source_name, labeled_image)
+
+        prev_windows = current_windows
+
+        if cv2.waitKey() == ord("q"):
+            break
 
 
 @app.command()
```
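The `inspect` changes above funnel both loader output shapes (a bare array or a source-name-to-image dict) through one code path and close windows whose source disappeared between samples. A self-contained sketch of that bookkeeping, with the OpenCV window calls left out and helper names invented for illustration:

```python
import numpy as np


def normalize_images(img) -> dict:
    """Wrap a bare array into a one-entry dict so both loader
    output shapes can be iterated the same way."""
    return img if isinstance(img, dict) else {"image": img}


def stale_windows(prev: set, current: set) -> set:
    """Window names open for the previous sample but absent now
    (these would be passed to cv2.destroyWindow)."""
    return prev - current


single_view = normalize_images(np.zeros((4, 4, 3), dtype=np.uint8))
multi_view = normalize_images(
    {"img_rgb": np.zeros((4, 4, 3), dtype=np.uint8), "img_ir": np.zeros((4, 4), dtype=np.uint8)}
)

print(sorted(single_view))  # ['image']
print(sorted(multi_view))   # ['img_ir', 'img_rgb']
print(stale_windows(set(single_view), set(multi_view)))  # {'image'}
```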

luxonis_ml/data/augmentations/README.md

66 additions, 1 deletion
````diff
@@ -2,7 +2,72 @@
 
 ## `AlbumentationsEngine`
 
-The default engine used with `LuxonisLoader`. It is powered by the [Albumentations](https://albumentations.ai/) library and should be satisfactory for most use cases. Apart from the albumentations transformations, it also supports custom transformations registered in the `TRANSFORMATIONS` registry.
+The default engine used with `LuxonisLoader`. It is powered by the [Albumentations](https://albumentations.ai/) library and should be satisfactory for most use cases. In addition to the built-in Albumentations transformations, it also supports custom transformations registered in the `TRANSFORMATIONS` registry.
+
+### Creating and Registering a Custom Augmentation
+
+The process of creating custom augmentations follows the same principles as described in the [Albumentations custom transform guide](https://albumentations.ai/docs/4-advanced-guides/creating-custom-transforms/#creating-custom-albumentations-transforms). You can subclass their base classes such as `DualTransform` or `ImageOnlyTransform`, depending on the target types you want to support.
+
+The example below shows how to define, register, and use a custom transform:
+
+```python
+import numpy as np
+from typing import Any, Sequence
+from albumentations import DualTransform
+
+from luxonis_ml.data import LuxonisDataset, LuxonisLoader
+from luxonis_ml.data.augmentations.custom import TRANSFORMATIONS
+
+class CustomTransform(DualTransform):
+    def __init__(self, p: float = 1.0):
+        super().__init__(p)
+
+    def apply(self, image: np.ndarray, **_: Any) -> np.ndarray:
+        return image
+
+    def apply_to_mask(self, mask: np.ndarray, **_: Any) -> np.ndarray:
+        return mask
+
+    def apply_to_bboxes(self, bboxes: Sequence[Any], **_: Any) -> Sequence[Any]:
+        return bboxes
+
+    def apply_to_keypoints(self, keypoints: Sequence[Any], **_: Any) -> Sequence[Any]:
+        return keypoints
+
+# Register the transform
+TRANSFORMATIONS.register(module=CustomTransform)
+
+# Use it in the config
+augmentation_config = [{
+    "name": "CustomTransform",
+    "params": {"p": 1},
+}]
+
+loader = LuxonisLoader(
+    LuxonisDataset("coco_test"),
+    augmentation_config=augmentation_config,
+    view="train",
+    height=640,
+    width=640,
+)
+
+for data in loader:
+    pass
+```
+
+### Examples of Custom Augmentations
+
+- [`letterbox_resize.py`](./custom/letterbox_resize.py)
+- [`symetric_keypoints_flip.py`](./custom/symetric_keypoints_flip.py)
+
+### Batch-Level Augmentations
+
+We also support **batch-level transformations**, built on top of the `BatchTransform` base class. These follow the same creation and registration pattern as standard custom transforms but operate on batches of data. This allows you to construct augmentations that combine multiple images and labels.
+
+Examples:
+
+- [`mosaic.py`](./custom/mosaic.py)
+- [`mixup.py`](./custom/mixup.py)
 
 ### Configuration Format
````
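To illustrate what a batch-level transform such as the referenced `mixup.py` combines, here is a minimal numpy sketch of the mixup idea: a convex blend of two images from a batch. This is only a sketch of the concept; the actual implementation subclasses `BatchTransform` and also merges the corresponding labels.

```python
import numpy as np


def mixup_images(img_a: np.ndarray, img_b: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Convex blend of two equally sized images: alpha * a + (1 - alpha) * b."""
    assert img_a.shape == img_b.shape, "mixup needs equally sized inputs"
    mixed = alpha * img_a.astype(np.float32) + (1.0 - alpha) * img_b.astype(np.float32)
    return mixed.astype(img_a.dtype)


a = np.full((2, 2, 3), 200, dtype=np.uint8)
b = np.full((2, 2, 3), 100, dtype=np.uint8)
print(mixup_images(a, b, alpha=0.5)[0, 0, 0])  # 150
```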
