Hi there,

I don't see any speed-up from the Visual Resolution Router (ViR) that is supposed to be enabled in the flash model. I set it up following the suggested route here:
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(model_path, **model_kwargs).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```
But I don't see any inference speed boost in my testing. Are there particular workloads where the speed-up is expected to show up? Is there a configuration flag I need to set to enable the ViR?
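For reference, the numbers below were collected with a timing loop roughly along these lines (a minimal sketch, not the exact harness; the `model(**x)` call and the structure of `inputs` are assumptions, so substitute the flash model's actual inference entry point):

```python
import time
import torch

def benchmark(model, inputs, batch_size=16, warmup=2, iters=5):
    """Time a batch of inference calls and report per-item latency.

    `inputs` is assumed to be a list of already-prepared input dicts;
    `model(**x)` stands in for whatever inference call the model exposes.
    """
    torch.cuda.reset_peak_memory_stats()
    with torch.inference_mode():
        # Warm-up passes so CUDA kernel compilation isn't timed.
        for _ in range(warmup):
            for x in inputs[:batch_size]:
                model(**x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            for x in inputs[:batch_size]:
                model(**x)
        torch.cuda.synchronize()
        elapsed = (time.perf_counter() - start) / iters  # seconds per batch

    per_item_ms = elapsed * 1000 / batch_size
    peak_mb = torch.cuda.max_memory_allocated() / 2**20
    print(f"Batch time: {elapsed * 1000:.2f} ms | "
          f"per item: {per_item_ms:.2f} ms | "
          f"throughput: {batch_size / elapsed:.2f} items/s | "
          f"peak mem: {peak_mb:.2f} MB")
```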
================================================================================
TEST 2: Batch Size 16
================================================================================
┌─────────────────────────────────────────┬──────────────────┬──────────────────┐
│ Metric │ Flash Model │ Standard Model │
├─────────────────────────────────────────┼──────────────────┼──────────────────┤
│ Batch Time (ms) │ 4500.43 │ 4478.83 │
│ Time per Item (ms) │ 900.09 │ 895.77 │
│ Throughput (items/sec) │ 1.11 │ 1.12 │
│ Peak Memory (MB) │ 2281.94 │ 2109.76 │
└─────────────────────────────────────────┴──────────────────┴──────────────────┘
Speedup (Flash vs Standard): 1.00x
Throughput Improvement: -0.5%
Memory Reduction: -8.2%
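In case it helps with diagnosis, one generic way to check whether the router is active is to dump the loaded config and look for router-related fields. This is a sketch that assumes nothing about the model's actual config schema; it only uses the standard `PretrainedConfig.to_dict()` API:

```python
# Grep the loaded config for anything router-related; the actual
# field names are unknown, so this just filters the config dump.
cfg = model.config.to_dict()
router_keys = {k: v for k, v in cfg.items()
               if "router" in k.lower() or "vir" in k.lower()}
print(router_keys or "no router-related keys found in model.config")
```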