`luxonis_ml/data/README.md`

Each of these steps will be explained in more detail in the following examples.

We will be using our toy dataset `parking_lot` in all examples. The dataset consists of images of cars and motorcycles in a parking lot. Each image has a corresponding annotation in the form of a bounding box, keypoints, and several segmentation masks.

You can create as many datasets as you want, each with a unique name.

Datasets can be stored locally or in one of the supported cloud storage providers.

> [!NOTE]
> 📚 For a complete list of all parameters and methods of the `LuxonisDataset` class, see the [datasets README.md](datasets/README.md).
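
For example, creating a dataset backed by cloud storage might look like the sketch below; the `bucket_storage` parameter and the `BucketStorage` enum are assumptions here, so check the linked README for the authoritative API.

```python
from luxonis_ml.data import BucketStorage, LuxonisDataset

# assumed API: keep the dataset in Google Cloud Storage instead of locally
dataset = LuxonisDataset("parking_lot", bucket_storage=BucketStorage.GCS)
```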
### Dataset Creation

First, we import `LuxonisDataset` and create a dataset with the name `"parking_lot"`.

```python
from luxonis_ml.data import LuxonisDataset

dataset_name = "parking_lot"
dataset = LuxonisDataset(dataset_name)
```

Each data entry should be a dictionary with the following structure:

```python
{
    "file": str,  # path to the image file
    "task_name": Optional[str],  # task name for this annotation
    "annotation": Optional[dict]  # annotation of the instance in the file
}
```

Luxonis Data Format supports **annotations optionally structured into different tasks** for improved organization. Tasks can be explicitly named or left unset; if none are specified, all annotations are grouped under a single `task_name`, which defaults to `""`. The [example below](#adding-data-with-a-generator-function) demonstrates this with instance keypoints and segmentation tasks.

The content of the `"annotation"` field depends on the task type and follows the [Annotation Format](#annotation-format) described later in this document.
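
For illustration, a single bounding box entry for a car might look like the sketch below; the `"boundingbox"` key and its normalized `xywh` layout are assumed from the Annotation Format section, and the file path is hypothetical.

```python
entry = {
    "file": "data/parking_lot/image_0.jpg",  # hypothetical path
    "task_name": "detection",
    "annotation": {
        "class": "car",
        # assumed normalized xywh bounding box layout
        "boundingbox": {"x": 0.1, "y": 0.2, "w": 0.3, "h": 0.25},
    },
}
```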
#### Adding Data with a Generator Function

The recommended approach for adding data is to create a generator function that yields data entries one by one.

The following example demonstrates how to load **bounding box annotations** along with their corresponding **keypoint annotations**, which are linked via `"instance_id"`.

Additionally, we yield **segmentation masks** while keeping a clear separation between task groups. To achieve this, we use the `"task_name"` field, assigning `"instance_keypoints_car"` and `"instance_keypoints_motorbike"` to the instance-keypoint annotations and `"segmentation"` to the semantic segmentation task.

```python
import json
from pathlib import Path

import cv2
import numpy as np

# path to the dataset, replace it with the actual path on your system
dataset_root = Path("data/parking_lot")


def generator():
    for annotation_dir in dataset_root.iterdir():
        with open(annotation_dir / "annotations.json") as f:
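            # NOTE: sketch only -- the JSON layout assumed below ("annotations"
            # entries with "file", "class", "bbox", "keypoints" and "mask_path"
            # keys) is illustrative; adapt the keys to your annotation files
            data = json.load(f)

        for instance_id, ann in enumerate(data["annotations"]):
            # group instance annotations per class, e.g. "instance_keypoints_car"
            task_name = f"instance_keypoints_{ann['class']}"

            # the bounding box and the keypoints of the same object are
            # linked together through a shared "instance_id"
            yield {
                "file": ann["file"],
                "task_name": task_name,
                "annotation": {
                    "class": ann["class"],
                    "instance_id": instance_id,
                    "boundingbox": ann["bbox"],
                },
            }
            yield {
                "file": ann["file"],
                "task_name": task_name,
                "annotation": {
                    "class": ann["class"],
                    "instance_id": instance_id,
                    "keypoints": {"keypoints": ann["keypoints"]},
                },
            }

            # semantic segmentation masks go into their own task group
            mask = cv2.imread(str(annotation_dir / ann["mask_path"]), cv2.IMREAD_GRAYSCALE)
            yield {
                "file": ann["file"],
                "task_name": "segmentation",
                "annotation": {
                    "class": ann["class"],
                    "segmentation": {"mask": mask > 0},
                },
            }
```

Once the generator is defined, the entries can be added to the dataset and splits created. A minimal usage sketch, assuming the `add` and `make_splits` methods of `LuxonisDataset` (see the [datasets README.md](datasets/README.md)):

```python
dataset.add(generator())
dataset.make_splits()
```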

**Classification Directory** - A directory with subdirectories for each class

```plaintext
dataset_dir/
├── class_1/
│   ├── image_1.jpg
│   └── image_2.jpg
└── class_2/
    └── image_3.jpg
```

The dataset directory can either be a local directory or a directory in one of the supported cloud storage providers.

The directory can also be a zip file containing the dataset.

The `task_name` argument can be specified as a single string or as a dictionary. If a string is provided, it will be used as the task name for all records. Alternatively, you can provide a dictionary that maps class names to task names for better dataset organization. See the example below.

> [!NOTE]
> 📚 For a complete list of all parameters of the `LuxonisParser` class, see the [parsers README.md](parsers/README.md).
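
A sketch of initializing the parser with a `task_name` dictionary; the directory path and class-to-task mapping below are hypothetical, and the import path is assumed to be the package root:

```python
from luxonis_ml.data import LuxonisParser

parser = LuxonisParser(
    dataset_dir="data/parking_lot_raw",  # hypothetical local directory
    dataset_name="parking_lot",
    # map each class name to the task it should belong to
    task_name={
        "car": "car_detection",
        "motorbike": "motorbike_detection",
    },
)
```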
After initializing the parser, you can parse the dataset to create a `LuxonisDataset` instance. The resulting dataset will contain the parsed data with training, validation, and test splits derived from the dataset directory structure.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `dataset_dir` | `str` | Required | Path or URL to the dataset directory (local path, `gcs://`, `s3://`, or `roboflow://`) |
| `dataset_name` | `Optional[str]` | `None` | Name for the dataset (if `None`, derived from the directory name) |
| `save_dir` | `Optional[Union[Path, str]]` | `None` | Where to save downloaded datasets if a remote URL is provided (if `None`, uses the current directory) |
| `dataset_plugin` | `Optional[str]` | `None` | Dataset plugin to use (if `None`, uses `LuxonisDataset`) |
| `dataset_type` | `Optional[DatasetType]` | `None` | Force a specific dataset format type instead of auto-detection |
| `task_name` | `Optional[Union[str, Dict[str, str]]]` | `None` | Task name(s) for the dataset. Used to link classes to the desired tasks, with class names as keys and task names as values |
| `split` | `Optional[str]` | `None` | Split name if parsing a single split |
| `random_split` | `bool` | `True` | Whether to create random splits |
| `split_ratios` | `Optional[Dict[str, float]]` | `None` | Ratios for the train/validation/test splits. If `None`, the default behavior of `LuxonisDataset`'s `make_splits` method is used |
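
Assuming the split-related parameters above are forwarded to the parsing step, a usage sketch with illustrative split names and ratios:

```python
dataset = parser.parse(
    random_split=True,
    split_ratios={"train": 0.8, "val": 0.1, "test": 0.1},
)
```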