Commit ea89f6b

update ViT-Adapter-L + HTC++
1 parent 9441aef commit ea89f6b

6 files changed: +561 -78 lines changed

README.md

Lines changed: 55 additions & 18 deletions
@@ -10,10 +10,9 @@
 The official implementation of the paper "[Vision Transformer Adapter for Dense Predictions](https://arxiv.org/abs/2205.08534)".
 
 ## News
-
-(2022/06/04) Segmentation is released.\
-(2022/06/02) Detection is released and segmentation will come soon.\
-(2022/05/17) ViT-Adapter-L yields 60.1 box AP and 52.1 mask AP on COCO test-dev.\
+(2022/06/09) ViT-Adapter-L yields 60.4 box AP and 52.5 mask AP on COCO test-dev.\
+(2022/06/04) Code and models are released.\
+(2022/05/17) ~~ViT-Adapter-L yields 60.1 box AP and 52.1 mask AP on COCO test-dev.~~ \
 (2022/05/12) ViT-Adapter-L reaches 85.2 mIoU on Cityscapes test set without coarse data.\
 (2022/05/05) ViT-Adapter-L achieves the SOTA on ADE20K val set with 60.5 mIoU!
 
@@ -29,35 +28,73 @@ This work investigates a simple yet powerful adapter for Vision Transformer (ViT
 
 ## SOTA Model Zoo
 
-COCO test-dev
-
-| Method | Framework | Pre-train | Lr schd | box AP | mask AP | #Param |
-|:------------------:|:---------:|:---------:|:-------:|:------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|:------:|
-| ViT-Adapter-L | HTC++ | BEiT | 3x | [58.5](https://drive.google.com/file/d/11zpPSvmuAn7aP5brxzHE8naObnOfFxby/view?usp=sharing) | [50.8](https://drive.google.com/file/d/1wIbtzfHfPqkvZaSivzcsh4HWu1oSiun6/view?usp=sharing) | 401M |
-| ViT-Adapter-L (MS) | HTC++ | BEiT | 3x | [60.1](https://drive.google.com/file/d/1i-qjgUK4CMwZcmu5pkndldwfVbdkw5sU/view?usp=sharing) | [52.1](https://drive.google.com/file/d/16mlEOPY7K-Xpx_CL650A-LWbVDm2vl4X/view?usp=sharing) | 401M |
-
-ADE20K val
+**COCO mini-val test-dev**
+
+
+<table>
+<tr align=center>
+<td rowspan="2" align=center><b>Method</b></td>
+<td rowspan="2" align=center><b>Framework</b></td>
+<td rowspan="2" align=center><b>Pre-train</b></td>
+<td rowspan="2" align=center><b>Schd</b></td>
+<td colspan="2" align=center><b>mini-val</b></td>
+<td colspan="2" align=center><b>test-dev</b></td>
+<td rowspan="2" align=center><b>#Param</b></td>
+</tr>
+<tr>
+<td>box AP</td>
+<td>mask AP</td>
+<td>box AP</td>
+<td>mask AP</td>
+</tr>
+<tr align=center>
+<td>ViT-Adapter-L</td>
+<td>HTC++</td>
+<td>BEiT</td>
+<td>3x</td>
+<td>58.4</td>
+<td>50.8</td>
+<td><a href="https://drive.google.com/file/d/1lXQxf5PJ0g0bQNkMMrhG63jal0NsmYjb/view?usp=sharing">58.9</a></td>
+<td><a href="https://drive.google.com/file/d/1nyuONJcHHXki0Cn8dCgbPZ9D_MURh47t/view?usp=sharing">51.3</a></td>
+<td>401M</td>
+</tr>
+<tr align=center>
+<td>ViT-Adapter-L$^\dagger$</td>
+<td>HTC++</td>
+<td>BEiT</td>
+<td>3x</td>
+<td>60.2</td>
+<td>52.2</td>
+<td><a href="https://drive.google.com/file/d/15t2Oc3FiNeLr6RnKOJ-0IbI7b2LalxbX/view?usp=sharing">60.4</a></td>
+<td><a href="https://drive.google.com/file/d/1TIPOJC6ieZS_ZRNCbo_AW4UqYAkQIjyN/view?usp=sharing">52.5</a></td>
+<td>401M</td>
+</tr>
+</table>
+
+$\dagger$ denotes multi-scale testing.
+
+**ADE20K val**
 
 | Method | Framework | Pre-train | Iters | Crop Size | mIoU | +MS | #Param |
 |:-------------:|:-----------:|:---------------:|:-----:|:---------:|:------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|:------:|
 | ViT-Adapter-L | UperNet | BEiT | 160k | 640 | [58.0](https://drive.google.com/file/d/1KsV4QPfoRi5cj2hjCzy8VfWih8xCTrE3/view?usp=sharing) | [58.4](https://drive.google.com/file/d/1haeTUvQhKCM7hunVdK60yxULbRH7YYBK/view?usp=sharing) | 451M |
 | ViT-Adapter-L | Mask2Former | BEiT | 160k | 640 | [58.3](https://drive.google.com/file/d/1jj56lSbc2s4ZNc-Hi-w6o-OSS99oi-_g/view?usp=sharing) | [59.0](https://drive.google.com/file/d/1hgpZB5gsyd7LTS7Aay2CbHmlY10nafCw/view?usp=sharing) | 568M |
 | ViT-Adapter-L | Mask2Former | COCO-Stuff-164k | 80k | 896 | [59.4](https://drive.google.com/file/d/1B_1XSwdnLhjJeUmn1g_nxfvGJpYmYWHa/view?usp=sharing) | [60.5](https://drive.google.com/file/d/1UtjmgcYKR-2h116oQXklUYOVcTw15woM/view?usp=sharing) | 571M |
 
-Cityscapes val/test
+**Cityscapes val/test**
 
 | Method | Framework | Pre-train | Iters | Crop Size | val mIoU | val/test +MS | #Param |
 |:-------------:|:-----------:|:---------:|:-----:|:---------:|:------------------------------------------------------------------------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------:|
 | ViT-Adapter-L | Mask2Former | Mapillary | 80k | 896 | [84.9](https://drive.google.com/file/d/1LKy0zz-brCBbKGmUWquadILaBHdDLR6s/view?usp=sharing) | [85.8](https://drive.google.com/file/d/1LSJvK1BPSbzm9eWpKL8Xo7RmYBrd2xux/view?usp=sharing)/[85.2](https://www.cityscapes-dataset.com/anonymous-results/?id=0ca6821dc3183ff970bd5266f812df2eaa4519ecb1973ca1308d65a3b546bf27) | 571M |
 
-COCO-Stuff-10K
+**COCO-Stuff-10K**
 
 | Method | Framework | Pre-train | Iters | Crop Size | mIoU | +MS | #Param |
 |:-------------:|:-----------:|:---------:|:-----:|:---------:|:------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|:------:|
 | ViT-Adapter-L | UperNet | BEiT | 80k | 512 | [51.0](https://drive.google.com/file/d/1xZodiAvOLGaLtMGx_btYVZIMC2VKrDhI/view?usp=sharing) | [51.4](https://drive.google.com/file/d/1bmFG9GA4bRqOEJfqXcO7nWYPwG3wSk2J/view?usp=sharing) | 451M |
 | ViT-Adapter-L | Mask2Former | BEiT | 40k | 512 | [53.2](https://drive.google.com/file/d/1Buewc1n7GBAcBDXeia-QarujrDZqc_Sx/view?usp=sharing) | [54.2](https://drive.google.com/file/d/1kQgJUHDeQoO3pPY6QoXRKwyF7heT7wCJ/view?usp=sharing) | 568M |
 
-Pascal Context
+**Pascal Context**
 
 | Method | Framework | Pre-train | Iters | Crop Size | mIoU | +MS | #Param |
 |:-------------:|:-----------:|:---------:|:-----:|:---------:|:------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------:|:------:|
@@ -68,7 +105,7 @@ Pascal Context
 
 ### COCO mini-val
 
-Baseline Detectors
+**Baseline Detectors**
 
 | Method | Framework | Pre-train | Lr schd | Aug | box AP | mask AP | #Param |
 |:-------------:|:----------:|:---------:|:-------:|:---:|:------:|:-------:|:------:|
@@ -77,7 +114,7 @@ Baseline Detectors
 | ViT-Adapter-B | Mask R-CNN | DeiT | 3x | Yes | 49.6 | 43.6 | 120M |
 | ViT-Adapter-L | Mask R-CNN | AugReg | 3x | Yes | 50.9 | 44.8 | 348M |
 
-Advanced Detectors
+**Advanced Detectors**
 
 | Method | Framework | Pre-train | Lr schd | Aug | box AP | mask AP | #Param |
 |:-------------:|:-------------------:|:---------:|:-------:|:---:|:------:|:-------:|:------:|
@@ -88,7 +125,7 @@ Advanced Detectors
 | ViT-Adapter-B | Upgraded Mask R-CNN | MAE | 25ep | LSJ | 50.3 | 44.7 | 122M |
 | ViT-Adapter-B | Upgraded Mask R-CNN | MAE | 50ep | LSJ | 50.8 | 45.1 | 122M |
 
-ADE20K val
+**ADE20K val**
 
 | Method | Framework | Pre-train | Iters | Crop Size | mIoU | +MS | #Param |
 |:-------------:|:---------:|:---------:|:-----:|:---------:|:----:|:----:|:------:|
