README.md: 7 additions & 5 deletions
@@ -30,7 +30,7 @@ In this tool, a **describer** is a backend for a family of vision language model
-[Example of a report on test data with various vision language models](https://github.com/kingsdigitallab/kdl-vqa/blob/main/doc/bvqa-tests-2025-03-07.pdf)
+[Example of a report on test data with various vision language models](https://github.com/kingsdigitallab/kdl-vqa/blob/main/doc/bvqa-tests-2025-03-11.pdf)
## Requirements
@@ -107,7 +107,8 @@ A describer is a backend for bvqa that provides support for a family of vision la
@@ -140,7 +141,7 @@ For those describers, the models refer to model names on the Hugging Face hub. I
**Qwen** models can crash as they eat up an extraordinary amount of VRAM. To keep this under control, use the `-o` flag with your `describe` action. It will use flash attention to drastically reduce memory use. However, the flash attention libraries need more recent generations of GPUs. The use of the `-o` flag is documented in the model column of the table above.
-**ovis**: despite being small, fast and using very little VRAM, this model requires more recent GPUs due to its reliance on the flash_attn package, which we found often difficult to install or run on various machines.
+**ovis** also greatly benefits from `-o` (flash attention), reducing VRAM use by 3x.
## Reviewing (`report`)
@@ -205,7 +206,8 @@ You can combine this with the -f option to test on a few images only.
The `-r` option tells the tool to ignore the cache.
When supplied, it will always ask the questions again.
-This is useful in the case where you want to compare the performance between different computing devices (e.g. Nvidia A100 vs L40s GPUs) to estimate the total duration on your entire collection.
+This is useful in the case where you want to compare the performance between different computing devices
+(e.g. Nvidia A100 vs L40s GPUs) to estimate the total duration on your entire collection.
## Parallelism
@@ -262,7 +264,7 @@ After running your questions on a larger proportion of your collection, you migh
As prompt engineering is usually very model-specific, moving to another model can be very disruptive.
It always means reassessing the answers and often means reformulating many questions from scratch.
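Taken together, the flags touched by this diff can be sketched as follows. This is a hedged sketch only: the `bvqa` entry-point spelling and the `-f 5` argument form are assumptions not confirmed by the diff; only the `-o`, `-r`, and `-f` flags and the `describe` action come from the README text above.

```shell
# Hedged sketch: the `bvqa` entry-point name and the `-f 5` argument form
# are assumptions; the -o, -r and -f flags come from the README text.

# Enable flash attention (-o) to curb VRAM use (keeps Qwen models from
# crashing, cuts ovis VRAM use roughly 3x; needs a recent GPU generation):
cmd_flash="bvqa describe -o"

# Timing comparison between devices: bypass the answer cache (-r) and
# limit the run to a few images only (-f):
cmd_benchmark="bvqa describe -r -f 5"

echo "$cmd_flash"
echo "$cmd_benchmark"
```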