Docs: Augmentations (#310)

kozlov721 · web-flow · commit 4e65d1ecb8a0 · 2025-04-28T11:36:08.000+02:00
diff --git a/luxonis_ml/data/README.md b/luxonis_ml/data/README.md
@@ -714,6 +714,8 @@ On top of that, we provide a handful of custom batch augmentations:
 - `Mosaic4` - Mosaic augmentation with 4 images. Combines crops of 4 images into a single image in a mosaic pattern.
 - `MixUp` - MixUp augmentation. Overlays two images with a random weight.
 
+For more details about the augmentations, see the [Augmentations documentation](./augmentations/README.md).
+
 ### Example
 
 The following example demonstrates a simple augmentation pipeline:
diff --git a/luxonis_ml/data/augmentations/README.md b/luxonis_ml/data/augmentations/README.md
@@ -0,0 +1,74 @@
+# Augmentations
+
+## `AlbumentationsEngine`
+
+The default engine used with `LuxonisLoader`. It is powered by the [Albumentations](https://albumentations.ai/) library and should suffice for most use cases. Apart from the Albumentations transformations, it also supports custom transformations registered in the `TRANSFORMATIONS` registry.
+
+### Configuration Format
+
+The configuration for the `AlbumentationsEngine` is a list of records, where each record contains the fields `name` and `params`, plus an optional `use_for_resizing` flag:
+
+- `name`: The name of the transformation class to be applied (e.g., `HorizontalFlip`, `RandomCrop`, etc.). The name must be either a valid name of an Albumentations transformation (accessible under the `albumentations` namespace), or a name of a custom transformation registered in the `TRANSFORMATIONS` registry.
+- `params`: A dictionary of parameters to be passed to the transformation.
+- `use_for_resizing`: An optional boolean flag that indicates whether the transformation should be used for resizing. If no resizing augmentation is provided, the engine will use either `A.Resize` or `LetterboxResize` depending on the `keep_aspect_ratio` parameter (provided through the `LuxonisLoader`).
+
+**Example:**
+
+```yaml
+- name: Defocus
+  params:
+    p: 1
+- name: Sharpen
+  params:
+    p: 1
+- name: Affine
+  params:
+    p: 1
+- name: RandomCrop
+  params:
+    p: 1
+    width: 512
+    height: 512
+- name: Mosaic4
+  params:
+    p: 1
+    out_width: 256
+    out_height: 256
+```
+
+### Order of Transformations
+
+The order of transformations given in the configuration is not
+guaranteed to be preserved. The transformations are divided into
+the following groups, which are applied in the order listed:
+
+1. Batch transformations - Subclasses of our custom `BatchTransform`
+
+1. Spatial transformations - Subclasses of `A.DualTransform`
+
+1. Custom transformations - Subclasses of `A.BasicTransform`,
+   but not subclasses of more specific base classes above
+
+1. Pixel transformations - Subclasses of `A.ImageOnlyTransform`.
+   These transformations act only on the image
+
+The resize transformation is applied either before or after the pixel transformations, depending on the desired output size. If the output size is smaller than the initial image size, the resize is applied before the pixel transformations to save compute. Otherwise, it is applied last.
+
+## Extensibility
+
+`LuxonisLoader` can work with any subclass of `AugmentationEngine` (defined in `base_engine.py`) that is registered in the `AUGMENTATION_ENGINES` registry.
+
+To implement a custom augmentation engine, create a new class that inherits from `AugmentationEngine` and implements the required methods. Any subclass of `AugmentationEngine` is automatically registered in this registry.
+
+**Required Methods:**
+
+- `__init__`: The constructor method that initializes the engine with the provided configuration. It needs to create a new instance of the augmentation engine from the following arguments:
+  - `height`: The output height of the images
+  - `width`: The output width of the images
+  - `n_classes`: The number of classes in the dataset
+  - `config`: The configuration for the augmentation engine as an iterable of dictionaries. Interpretation of the configuration is left to the engine (it doesn't need to follow the format used in `AlbumentationsEngine`).
+  - `keep_aspect_ratio`: A boolean flag that indicates whether to keep the aspect ratio of the images during resizing. The engine should respect this flag when applying the resizing transformation.
+  - `is_validation_pipeline`: A boolean flag that indicates whether the engine is being used for validation. Typically, the applied transformations differ between training and validation (for example, a validation pipeline would use only resizing and normalization).
+  - `targets`: A dictionary mapping names of individual labels (given in `apply`) to their respective label types. Possible values of the label types are `"boundingbox"`, `"segmentation"`, `"instance_segmentation"`, `"keypoints"`, `"array"`, `"classification"`, and `"metadata"`. Interpretation of the targets is left to the engine.
+- `apply`: This method applies the augmentation engine to the provided batch of images and labels. It should return a tuple containing the augmented images and labels. The method should also handle the resizing of images and targets according to the specified output size and aspect ratio.
+- `batch_size`: An abstract property that returns the expected batch size of the inputs. This is required for the `LuxonisLoader` to properly handle the input data for batched augmentations. For example, if the pipeline contains the `MixUp` augmentation (which requires 2 images) and `Mosaic4` (which requires 4 images), the batch size should be `2 * 4 = 8`.
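
Annotation on the required methods above: the interface can be illustrated with a small, self-contained sketch. It deliberately does not import `luxonis_ml`. `EngineSketch`, `IdentityEngine`, and `IMAGES_PER_TRANSFORM` are hypothetical names, the `apply` signature (a list of `(image, labels)` pairs in, one pair out) is an assumption, and the real base class (`AugmentationEngine` in `base_engine.py`) also takes care of registration, which is omitted here.

```python
from abc import ABC, abstractmethod
from math import prod
from typing import Any, Dict, Iterable, List, Tuple

Sample = Tuple[Any, Dict[str, Any]]  # (image, labels keyed by label name)

# Images consumed per batch augmentation, as stated in the README text.
# Any transformation not listed here is assumed to need a single image.
IMAGES_PER_TRANSFORM = {"MixUp": 2, "Mosaic4": 4}


class EngineSketch(ABC):
    """Local stand-in for the engine base class; not the luxonis_ml API."""

    @abstractmethod
    def apply(self, batch: List[Sample]) -> Sample:
        """Augment a batch of samples and return one output sample."""

    @property
    @abstractmethod
    def batch_size(self) -> int:
        """Number of input samples needed to produce one output sample."""


class IdentityEngine(EngineSketch):
    """Hypothetical engine that validates its input and changes nothing."""

    def __init__(
        self,
        height: int,
        width: int,
        targets: Dict[str, str],
        n_classes: int,
        config: Iterable[Dict[str, Any]],
        keep_aspect_ratio: bool = True,
        is_validation_pipeline: bool = False,
    ) -> None:
        self.height, self.width = height, width
        self.targets = targets
        self.n_classes = n_classes
        self.keep_aspect_ratio = keep_aspect_ratio
        # Assumption of this sketch: validation ignores the configured
        # augmentations entirely.
        self.config = [] if is_validation_pipeline else list(config)

    @property
    def batch_size(self) -> int:
        # Product of the per-transformation requirements (1 if empty).
        return prod(IMAGES_PER_TRANSFORM.get(r["name"], 1) for r in self.config)

    def apply(self, batch: List[Sample]) -> Sample:
        if len(batch) != self.batch_size:
            raise ValueError(f"expected {self.batch_size} samples, got {len(batch)}")
        # A real engine would augment and resize here; this one passes
        # the first sample through unchanged.
        return batch[0]


config = [
    {"name": "MixUp", "params": {"p": 1}},
    {"name": "Mosaic4", "params": {"p": 1, "out_width": 256, "out_height": 256}},
]
engine = IdentityEngine(256, 256, {"boxes": "boundingbox"}, 3, config)
print(engine.batch_size)  # 8
```

Deriving `batch_size` from the configuration, rather than hard-coding it, keeps the value consistent with whichever batch augmentations are actually enabled.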
diff --git a/luxonis_ml/data/augmentations/albumentations_engine.py b/luxonis_ml/data/augmentations/albumentations_engine.py
@@ -95,7 +95,7 @@ class AlbumentationsEngine(AugmentationEngine, register_name="albumentations"):
         2. spatial transformations: Subclasses of `A.DualTransform`.
 
         3. custom transformations: Subclasses of `A.BasicTransform`,
-            but not subclasses of any of more specific base classes above.
+            but not subclasses of more specific base classes above.
 
         4. pixel transformations: Subclasses of `A.ImageOnlyTransform`.
             These transformations act only on the image.
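
Annotation on the docstring above: the grouping it describes can be sketched as a stable sort by group priority. The classes below are local stand-ins that mirror the base classes named in the docstring; nothing is imported from Albumentations or `luxonis_ml`. The `group_priority` function is an assumption about how such an ordering could be computed, not the engine's actual code, and keeping the configured order within a group is a choice made by this sketch only.

```python
from typing import List


# Local stand-ins mirroring the base classes named in the docstring.
class BasicTransform: ...
class DualTransform(BasicTransform): ...
class ImageOnlyTransform(BasicTransform): ...
class BatchTransform(BasicTransform): ...


# Stand-in transformations named after entries in the example config.
class Mosaic4(BatchTransform): ...
class Affine(DualTransform): ...
class Sharpen(ImageOnlyTransform): ...
class Defocus(ImageOnlyTransform): ...
class MyCustom(BasicTransform): ...  # hypothetical custom transformation


def group_priority(transform: BasicTransform) -> int:
    """Return the position of the transformation's group (lower runs first)."""
    if isinstance(transform, BatchTransform):
        return 0  # batch transformations
    if isinstance(transform, DualTransform):
        return 1  # spatial transformations
    if isinstance(transform, ImageOnlyTransform):
        return 3  # pixel transformations
    return 2  # custom: a BasicTransform with no more specific base class


def order_transforms(transforms: List[BasicTransform]) -> List[BasicTransform]:
    # sorted() is stable, so the configured order survives within a group.
    return sorted(transforms, key=group_priority)


configured = [Defocus(), Sharpen(), MyCustom(), Affine(), Mosaic4()]
ordered_names = [type(t).__name__ for t in order_transforms(configured)]
print(ordered_names)
# ['Mosaic4', 'Affine', 'MyCustom', 'Defocus', 'Sharpen']
```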
diff --git a/luxonis_ml/data/augmentations/base_engine.py b/luxonis_ml/data/augmentations/base_engine.py
@@ -9,6 +9,9 @@
 )
 
 
+# TODO: The engine should probably also handle normalization
+# so it doesn't have to be done by injecting a normalization
+# transformation to the config in LuxonisTrain.
 class AugmentationEngine(
     ABC,
     metaclass=AutoRegisterMeta,
diff --git a/luxonis_ml/data/loaders/README.md b/luxonis_ml/data/loaders/README.md
@@ -15,7 +15,7 @@ The `LuxonisLoader` class provides efficient access to dataset samples with conf
 | ----------------------------- | ----------------------------------------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
 | `dataset`                     | `LuxonisDataset`                          | Required           | The dataset to load data from                                                                                                                                                        |
 | `view`                        | `Union[str, List[str]]`                   | `"train"`          | Dataset split to use ("train", "val", "test")                                                                                                                                        |
-| `augmentation_engine`         | `str`                                     | `"albumentations"` | Augmentation engine to use                                                                                                                                                           |
+| `augmentation_engine`         | `str`                                     | `"albumentations"` | [Augmentation engine](../augmentations/README.md) to use.                                                                                                                            |
 | `augmentation_config`         | `Optional[Union[List[Params], PathType]]` | `None`             | Configuration for the augmentations                                                                                                                                                  |
 | `height`                      | `Optional[int]`                           | `None`             | Height of the output images                                                                                                                                                          |
 | `width`                       | `Optional[int]`                           | `None`             | Width of the output images                                                                                                                                                           |
