Zero-Shot Semantic Segmentation for Robots in Agriculture (IROS 2025)
Demo (vegetation mask with SAM only)
Our approach segments crop plants and weeds without labels. We leverage the foundation models SAM and BioCLIP's ViT to build a bag of features representing crop plants. During inference, we extract plant features and compare them with this bag of features; plant features with low similarity to the bag of features are inferred as weeds.
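At inference time, the decision per vegetation segment boils down to a nearest-neighbour comparison in feature space. The following is a minimal sketch of that idea; the function name, the class ids (1 = crop, 2 = weed), and the similarity threshold are illustrative placeholders rather than the exact values used by our scripts.

import torch
import torch.nn.functional as F

def classify_vegetation(segment_feats: torch.Tensor,
                        crop_bag: torch.Tensor,
                        sim_threshold: float = 0.8) -> torch.Tensor:
    """Label each vegetation segment as crop (1) or weed (2)."""
    seg = F.normalize(segment_feats, dim=-1)  # (N, D) features of the test-image segments
    bag = F.normalize(crop_bag, dim=-1)       # (M, D) bag of crop features from the train split
    # Highest cosine similarity of each segment to any entry in the bag.
    best_sim = (seg @ bag.T).max(dim=-1).values
    labels = torch.full((segment_feats.shape[0],), 2,
                        dtype=torch.long, device=segment_feats.device)  # default: weed
    labels[best_sim >= sim_threshold] = 1                               # similar enough: crop
    return labels

The bag of features itself comes from the train split; you can download the one from the paper or build your own as described below.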
Qualitative results on different datasets. The top row shows the input image, the second row the ground truth, and the third row our predictions.
pip install -r requirements.txt
cd src/ipb-loaders; pip install -U -e .
wget -P scripts/sam/ https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
The dataset should follow this directory structure:
${DATASET_PARENT_DIR}
├── train
│   ├── images
│   └── semantics
├── val
│   ├── images
│   └── semantics
└── test
    └── images
For further details, see the dataset directory structure of PhenoBench.
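If you want to sanity-check the layout before running anything, a few lines of Python suffice (a minimal sketch; the parent directory path is a placeholder):

from pathlib import Path

dataset = Path("/path/to/dataset_parent_dir")  # placeholder for ${DATASET_PARENT_DIR}
expected = ["train/images", "train/semantics",
            "val/images", "val/semantics",
            "test/images"]
missing = [d for d in expected if not (dataset / d).is_dir()]
if missing:
    raise FileNotFoundError(f"Missing dataset sub-directories: {missing}")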
You can use the bag of features from the paper:
OR
Build your own (e.g., for a different dataset):
- Obtain vegetation segments for train split images with SAM
python3 scripts/get_bb_clips.py \
--input_dir <path to dataset parent dir> \
--output_dir <path of output dir> \
--split train \
--vm_dir <optional, path of output dir with vegetation masks> \
--vis_dir <optional, path of output dir of visualisations> \
--aug_cfg <optional, path to augmentation configuration file. defaults to ./cfgs/augs_clip.cfg> \
--points_per_side <optional, number of point prompts for SAM> \
--is_remove_overlap
This saves the extracted patches as .png files. If specified, the point prompts used to prompt SAM are saved to vis_dir and the resulting vegetation masks of the input images to vm_dir. A rough Python sketch of this step and the next is given after this list.
- Separate out the popular features using BioCLIP's ViT
First, you need to create a .yaml file; see ./cfgs/vote_phenobench.yaml for an example.
python scripts/bioclip_popularity.py \
--yaml_cfg ./cfgs/vote_phenobench.yaml \
--output_dir <output dir for patches of crop plants>
(This might take some time, especially if the number of features is large)
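Conceptually, the two steps above amount to running SAM's automatic mask generator over each training image and embedding the resulting bounding-box patches with BioCLIP's ViT. The sketch below illustrates this for a single image under a few assumptions (BioCLIP loaded through open_clip from the Hugging Face hub as hf-hub:imageomics/bioclip, default mask filtering, no augmentations); the actual scripts additionally handle augmentation, overlap removal, and the popularity vote.

import numpy as np
import torch
import open_clip
from PIL import Image
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Vegetation segments with SAM (checkpoint downloaded above).
sam = sam_model_registry["vit_l"](checkpoint="scripts/sam/sam_vit_l_0b3195.pth").to(device)
mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=32)  # cf. --points_per_side
image = np.array(Image.open("example_train_image.png").convert("RGB"))  # placeholder image
masks = mask_generator.generate(image)  # list of dicts with "segmentation", "bbox", ...

# 2) Embed each segment's bounding-box patch with BioCLIP's ViT.
model, _, preprocess = open_clip.create_model_and_transforms("hf-hub:imageomics/bioclip")
model = model.to(device).eval()

feats = []
with torch.no_grad():
    for m in masks:
        x, y, w, h = (int(v) for v in m["bbox"])
        patch = Image.fromarray(image[y:y + h, x:x + w])
        feats.append(model.encode_image(preprocess(patch).unsqueeze(0).to(device)))
features = torch.cat(feats)  # (num_segments, D): raw material for the bag of features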
First, you need to get the vegetation masks for the test set:
python scripts/get_bb_clips.py \
--input_dir <path to dataset parent dir> \
--output_dir <path of output dir> \
--split test \
--vm_dir <path of output dir with vegetation masks> \
--vis_dir <optional, path of output dir of visualisations> \
--points_per_side <optional, number of point prompts for SAM> \
--aug_cfg ./cfgs/test_augs.cfg
Then, we can get the semantic segmentation:
python scripts/get_predictions_bioclip.py \
--yaml_cfg ./cfgs/vote_phenobench.yaml \
--crop_feats_dir <output dir for patches of crop plants> \
--vis_dir <output vis dir path>;
python scripts/get_cws.py \
--input_dir <output vis dir path> \
--output_dir <predictions dir path> \
--vm_dir <vegetation mask directory> \
--ds_dir <dataset directory> \
--img_size <image size> \
--center_crop \
--is_vis;
python scripts/evaluate.py \
--semantics_dir <predictions dir path> \
--gt_dir <ground truth labels dir path> \
--img_size <image size in px> \
--center_crop;
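For reference, the metric computed per class is the intersection over union; a minimal NumPy sketch is given below. The class ids are assumed to follow the PhenoBench convention of 0 = soil, 1 = crop, 2 = weed, and evaluate.py additionally handles resizing and center cropping as selected via --img_size and --center_crop.

import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray, class_ids=(0, 1, 2)):
    """IoU per class for integer semantic maps (assuming 0 = soil, 1 = crop, 2 = weed)."""
    ious = {}
    for c in class_ids:
        intersection = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious[c] = intersection / union if union > 0 else float("nan")
    return ious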
Note: For the PhenoBench test split, we evaluated using the online CodaLab benchmark.
We developed and tested this code with Python 3.12 on an NVIDIA RTX A6000 GPU.