active learning loop workflow on custom dataset #1087
I have created a new notebook that does not depend on deepforest in any way; I wrote the code in the same directory only for convenience. The notebook checks whether my model is learning the way I want it to.
I used a custom dataset of daisy flowers with 131 annotated images, exported in the "COCO dataset with images" format. To reproduce this, you need a training image dataset with its annotations in either `.json` or `.csv` format, plus a test dataset. I used a very light model to reduce training time. Replacing it with a stronger model may improve accuracy.
Objective
The goal is to simulate how an object detector's accuracy (mAP) improves as I iteratively label more images. This example currently draws images from the unlabeled pool by random sampling, which I will take as the baseline. The next step is to use an active-learning-specific sampling technique.
How to reproduce this (without shipping the giant flower dataset)
Prepare your own dataset in COCO format (or convert it from Pascal VOC or CSV into COCO). The JSON needs the `images`, `annotations`, and `categories` keys. Each annotation must have `image_id`, `bbox` in `[x, y, w, h]`, and `category_id`.
Three main steps in the workflow
COCO ↔ CSV conversion
The `parse_coco(json_file, img_dir)` function reads a COCO-style annotation JSON and writes out a flat `labels_raw.csv`, with one row per bounding box (`xmin`, `ymin`, `xmax`, `ymax`, `label`, `image_path`). The `build_coco_gt(df, out_json)` utility converts that CSV back into a minimal COCO JSON (images, annotations, categories) so that it can be used later as the "ground truth" for evaluation.
Custom Dataset + DataLoader
A `FlowerDataset` class (subclassing `torch.utils.data.Dataset`) whose `__getitem__` loads an image, retrieves its boxes and labels from my CSV/COCO data, applies resizing, converts everything to tensors, and returns `(image, target_dict)` for TorchVision detection models.
Active-Learning Loop
The loop runs for `ROUNDS` cycles over the index sets `train_idx` and `test_idx`. Each cycle picks `POOL_BATCH` new images from the pool to add to `train_idx`
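The bookkeeping of the loop described above can be sketched without any training code. This is a minimal sketch, not the notebook's implementation: `ROUNDS`, `POOL_BATCH`, `train_idx` and `test_idx` come from the description, while the function name, `pool_idx`, `n_test`, `n_initial`, `seed` and the constant values are assumptions made for illustration. Training and mAP evaluation are left as a placeholder comment.

```python
import random

ROUNDS = 3       # number of labelling rounds (illustrative value)
POOL_BATCH = 10  # images moved from the pool into training per round (illustrative)


def random_sampling_rounds(n_images, n_test, n_initial,
                           rounds=ROUNDS, pool_batch=POOL_BATCH, seed=0):
    """Return test_idx and a per-round history of train_idx snapshots."""
    rng = random.Random(seed)
    indices = list(range(n_images))
    rng.shuffle(indices)

    test_idx = indices[:n_test]                     # held out, never trained on
    train_idx = indices[n_test:n_test + n_initial]  # initial labelled set
    pool_idx = indices[n_test + n_initial:]         # treated as unlabelled

    history = []
    for r in range(rounds):
        # Placeholder: train on train_idx and compute mAP on test_idx here.
        history.append((r, list(train_idx)))

        # Baseline acquisition: pick images from the pool uniformly at random.
        k = min(pool_batch, len(pool_idx))
        picked = rng.sample(pool_idx, k)
        picked_set = set(picked)
        pool_idx = [i for i in pool_idx if i not in picked_set]
        train_idx.extend(picked)
    return test_idx, history


test_idx, history = random_sampling_rounds(n_images=131, n_test=20, n_initial=10)
for r, snapshot in history:
    print(f"round {r}: {len(snapshot)} training images")
```

In this sketch the only line specific to the random baseline is the `rng.sample` call, so an active-learning sampling strategy would replace that selection step and leave the rest of the loop unchanged.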
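For the COCO ↔ CSV conversion step, the core of the work is turning COCO's `[x, y, w, h]` boxes into corner coordinates, one row per box. The sketch below assumes the standard COCO keys `file_name` (in `images`) and `name` (in `categories`). `coco_to_rows` is a hypothetical helper that works on an in-memory dict, not the notebook's `parse_coco`, and the sample file name is made up.

```python
import os


def coco_to_rows(coco, img_dir):
    """Flatten a COCO-style dict into one row per bounding box."""
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    class_names = {cat["id"]: cat["name"] for cat in coco["categories"]}

    rows = []
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores the top-left corner plus size
        rows.append({
            "xmin": x,
            "ymin": y,
            "xmax": x + w,  # convert width to right edge
            "ymax": y + h,  # convert height to bottom edge
            "label": class_names[ann["category_id"]],
            "image_path": os.path.join(img_dir, file_names[ann["image_id"]]),
        })
    return rows


# Tiny in-memory example (made-up values).
sample = {
    "images": [{"id": 1, "file_name": "daisy_001.jpg"}],
    "categories": [{"id": 0, "name": "daisy"}],
    "annotations": [{"image_id": 1, "category_id": 0, "bbox": [10, 20, 30, 40]}],
}
rows = coco_to_rows(sample, "images")
print(rows[0])
```

Writing these rows out with `csv.DictWriter` or a pandas DataFrame would give a flat file with the same columns as the `labels_raw.csv` described above.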