
Commit a8ddd22

more tuning on bisenetv2
1 parent 81a8885 commit a8ddd22


4 files changed: +23 −5 lines changed

README.md

Lines changed: 18 additions & 2 deletions
````diff
@@ -5,9 +5,25 @@ BiSeNetV2 is faster and requires less memory, you can try BiSeNetV2 on cityscape
 $ export CUDA_VISIBLE_DEVICES=0,1
 $ python -m torch.distributed.launch --nproc_per_node=2 bisenetv2/train.py --fp16
 ```
-This would train the model and then compute the mIOU on eval set.
+This would train the model and then compute the mIOU on the eval set.
+
+~~I barely achieve mIOU of around 71. Though I can boost the performance by adding more regularization and pretraining, as this would be beyond the scope of the paper, let's wait for the official implementation and see how they achieved that mIOU of 73.~~
+
+Here are the tips on how I achieved 74.39 mIOU:
+1. Larger training scale range: in the paper, the images are first resized to the range (0.75, 2), then 1024x2048 patches are cropped and resized to 512x1024, which is equivalent to first resizing to (0.375, 1) and then cropping 512x1024 patches, since the final 2x downscale halves the effective scale range. In my implementation, I first rescale the image by a range of (0.25, 2) and then directly crop 512x1024 patches for training.
+
+2. Original inference scale: in the paper, the image is first rescaled to 512x1024 for inference and the prediction is then rescaled back to the original size of 1024x2048. In my implementation, I run inference directly at the original size of 1024x2048.
+
+3. ColorJitter as augmentation.
+
+Note that, like bisenetv1, bisenetv2 also has a relatively large variance. Here is the mIOU after training 5 times on my platform:
+
+| No. | 1 | 2 | 3 | 4 | 5 |
+|:---|:---|:---|:---|:---|:---|
+| mIOU | 74.28 | 72.96 | 73.73 | 74.39 | 73.77 |
+
+You can download the pretrained model with an mIOU of 74.39 from this [link](https://drive.google.com/file/d/1r_F-KZg-3s2pPcHRIuHZhZ0DQ0wocudk/view?usp=sharing).
 
-I barely achieve mIOU of around 71. Though I can boost the performace by adding more regularizations and pretraining, as this would be beyond the scope of the paper, let's wait for the official implementation and see how they achieved that mIOU of 73.
 
 
 # BiSeNet
````
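The inference-scale change (tip 2 above) simply skips the downscale/upscale round trip at eval time. Here is a minimal sketch of the two settings, assuming a generic segmentation `net` that returns per-pixel class logits for a normalized 1024x2048 input batch; the function names are illustrative and are not this repo's API:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def infer_paper_scale(net, im):
    """Paper setting: downscale to 512x1024, predict, upscale logits back to the input size."""
    small = F.interpolate(im, size=(512, 1024), mode='bilinear', align_corners=False)
    logits = net(small)
    logits = F.interpolate(logits, size=im.shape[2:], mode='bilinear', align_corners=False)
    return logits.argmax(dim=1)

@torch.no_grad()
def infer_original_scale(net, im):
    """This repo's setting: feed the original 1024x2048 image directly."""
    logits = net(im)
    if logits.shape[2:] != im.shape[2:]:  # upsample only if the net returns smaller logits
        logits = F.interpolate(logits, size=im.shape[2:], mode='bilinear', align_corners=False)
    return logits.argmax(dim=1)
```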

bisenetv2/cityscapes_cv2.py

Lines changed: 3 additions & 2 deletions
```diff
@@ -127,7 +127,8 @@ class TransformationTrain(object):
 
     def __init__(self):
         self.trans_func = T.Compose([
-            T.RandomResizedCrop([0.375, 1.], [512, 1024]),
+            # T.RandomResizedCrop([0.375, 1.], [512, 1024]),
+            T.RandomResizedCrop([0.25, 2], [512, 1024]),
             T.RandomHorizontalFlip(),
             T.ColorJitter(
                 brightness=0.4,
@@ -145,7 +146,7 @@ class TransformationVal(object):
 
     def __call__(self, im_lb):
         im, lb = im_lb['im'], im_lb['lb']
-        im = cv2.resize(im, (1024, 512))
+        # im = cv2.resize(im, (1024, 512))
         return dict(im=im, lb=lb)
 
 
```
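The `T.RandomResizedCrop([0.25, 2], [512, 1024])` line is the "wider scale range, direct 512x1024 crop" scheme from the README tips. As a standalone illustration of that operation, here is a generic numpy/cv2 sketch; it is not this repo's `T.RandomResizedCrop` implementation, and the zero padding and the 255 ignore label are assumptions:

```python
import cv2
import numpy as np

def random_rescale_and_crop(im, lb, scales=(0.25, 2.0), cropsize=(512, 1024)):
    """Rescale image and label by one random factor, then take a random crop.

    im: HxWx3 uint8 image, lb: HxW uint8 label map, cropsize: (height, width).
    """
    scale = np.random.uniform(*scales)
    h, w = im.shape[:2]
    new_h, new_w = int(h * scale), int(w * scale)
    im = cv2.resize(im, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    lb = cv2.resize(lb, (new_w, new_h), interpolation=cv2.INTER_NEAREST)

    ch, cw = cropsize
    # pad if the rescaled frame is smaller than the crop window (possible for small scales)
    pad_h, pad_w = max(ch - new_h, 0), max(cw - new_w, 0)
    if pad_h or pad_w:
        im = cv2.copyMakeBorder(im, 0, pad_h, 0, pad_w, cv2.BORDER_CONSTANT, value=0)
        lb = cv2.copyMakeBorder(lb, 0, pad_h, 0, pad_w, cv2.BORDER_CONSTANT, value=255)

    y = np.random.randint(0, im.shape[0] - ch + 1)
    x = np.random.randint(0, im.shape[1] - cw + 1)
    return im[y:y + ch, x:x + cw], lb[y:y + ch, x:x + cw]
```

For a 1024x2048 Cityscapes frame, any scale below 0.5 makes the rescaled frame smaller than the 512x1024 crop, which is why the padding branch exists.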

bisenetv2/evaluatev2.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -91,7 +91,7 @@ def evaluate(weight_pth):
     )
 
     ## evaluator
-    eval_model(net, 4)
+    eval_model(net, 2)
 
 
 def parse_args():
```

train.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -61,6 +61,7 @@ def train():
     n_img_per_gpu = 8
     n_workers = 4
     cropsize = [1024, 1024]
+    # cropsize = [1024, 512]
     ds = CityScapes('./data', cropsize=cropsize, mode='train')
     sampler = torch.utils.data.distributed.DistributedSampler(ds)
     dl = DataLoader(ds,
```
