batch inference in partial model graph #6349

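During model inference, some operators or subgraphs need to run multiple times, once per batch item, while the rest of the graph runs with batch=1.

Here, a stack of GRU layers stands in for the batched portion.

Such a subgraph is often wrapped by reshape operations across the batch axis: one before it that creates the batch, and one after it that restores batch=1.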

```
Input     in0         0 1 in0               \
.....                                        batch=1
XXX       xxx         1 1 xxx a             /
Reshape   reshape_a   1 1 a batch_a        --> the magic that creates the batch
GRU       batch_gru0  1 1 batch_a t0        \
GRU       batch_gru1  1 1 t0 t1              batch=N
GRU       batch_gru2  1 1 t1 batch_b        /
Reshape   reshape_b   1 1 batch_b b        --> the magic that restores batch=1
YYY       yyy         1 1 b yyy             \
.....                                        batch=1
Softmax   out0        1 1 z out0            /
```

```cpp
ex.input("in0", in0);

// partial inference until a
ncnn::Mat a;
ex.extract("a", a);

// split a into batches
std::vector<ncnn::Mat> a_chunks = split_batch(a);

// partial subgraph batch inference
// size b_chunks up front so b_chunks[i] is a valid element to extract into
std::vector<ncnn::Mat> b_chunks(a_chunks.size());
for (size_t i = 0; i < a_chunks.size(); i++)
{
    ex.input("batch_a", a_chunks[i]);
    ex.extract("batch_b", b_chunks[i]);
}

// merge outputs into b
ncnn::Mat b = merge_batch(b_chunks);

// partial inference from b
ex.input("b", b);

ncnn::Mat out0;
ex.extract("out0", out0);
```
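
The snippet above leaves `split_batch` and `merge_batch` undefined. As a rough illustration of what they could do, here is a minimal sketch that does not depend on ncnn. Everything in it is an assumption: `Blob2D` is a hypothetical stand-in for a 2-D blob, the batch is assumed to be formed by cutting the rows into equal consecutive chunks, and `split_batch` takes an explicit chunk count `n` that the call above does not have. The actual reshape layout depends on the model.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a 2-D blob: h rows of w floats, row-major.
struct Blob2D
{
    int w = 0;
    int h = 0;
    std::vector<float> data; // size == w * h
};

// Split `a` into `n` chunks of h / n consecutive rows each (assumed layout).
std::vector<Blob2D> split_batch(const Blob2D& a, int n)
{
    if (n <= 0 || a.h % n != 0)
        throw std::invalid_argument("row count must be divisible by chunk count");

    const int rows = a.h / n;
    const std::ptrdiff_t chunk_size = static_cast<std::ptrdiff_t>(rows) * a.w;

    std::vector<Blob2D> chunks(static_cast<std::size_t>(n));
    for (int i = 0; i < n; i++)
    {
        Blob2D& c = chunks[static_cast<std::size_t>(i)];
        c.w = a.w;
        c.h = rows;
        // copy the i-th group of rows
        auto first = a.data.begin() + i * chunk_size;
        c.data.assign(first, first + chunk_size);
    }
    return chunks;
}

// Stack the chunks back together row-wise, in order.
Blob2D merge_batch(const std::vector<Blob2D>& chunks)
{
    Blob2D b;
    for (const Blob2D& c : chunks)
    {
        // all chunks must share the same width to be stacked
        assert(b.h == 0 || c.w == b.w);
        b.w = c.w;
        b.h += c.h;
        b.data.insert(b.data.end(), c.data.begin(), c.data.end());
    }
    return b;
}
```

With this layout, `merge_batch(split_batch(a, n))` reproduces `a` whenever `a.h` is divisible by `n`. The helpers copy data for clarity; a real implementation would more likely hand out views into the original buffer.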

