During model inference, some operators/subgraphs need to be inferred multiple times, once per batch sample. Here, a stack of GRU layers represents the batched portion. Often there are 'across-batch-axis' Reshape operations right before and after such a subgraph, as in the layer listing below.
Input     in0          0 1 in0             \
.....                                        batch=1
XXX       xxx          1 1 xxx a           /
Reshape   reshape_a    1 1 a batch_a       --> the magic that creates the batch
GRU       batch_gru0   1 1 batch_a t0      \
GRU       batch_gru1   1 1 t0 t1             batch=N
GRU       batch_gru2   1 1 t1 batch_b      /
Reshape   reshape_b    1 1 batch_b b       --> the magic that restores batch=1
YYY       yyy          1 1 b yyy           \
.....                                        batch=1
Softmax   out0         1 1 z out0          /
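
For illustration only (the actual shapes depend on the model), assume each batch sample is a (w, T) sequence and the N samples are stacked along the height axis of a single 2D blob; the two Reshape layers then map between the flat view and the batched view roughly like this, where w, T and N are hypothetical sizes:

#include "mat.h" // ncnn::Mat

void batching_reshape_sketch()
{
    const int w = 64, T = 10, N = 4; // hypothetical feature size, steps per sample, batch size

    // blob a before reshape_a: the N samples stacked along the height axis, batch=1
    ncnn::Mat a(w, N * T);

    // reshape_a: (w, N*T) -> (w, T, N), one channel per sample, batch=N
    ncnn::Mat batch_a = a.reshape(w, T, N);

    // ... the GRU stack consumes batch_a and produces batch_b with the same batch axis ...
    ncnn::Mat batch_b = batch_a; // placeholder for the subgraph output

    // reshape_b: (w, T, N) -> (w, T*N), fold the samples back into one blob, batch=1
    ncnn::Mat b = batch_b.reshape(w, T * N);
    (void)b;
}

With partial extraction, the batched part can then be driven manually from the application side:
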
ex.input("in0", in0);
// partial inference until a
ncnn::Mat a;
ex.extract("a", a);
// split a into batches
std::vector<ncnn::Mat> a_chunks = split_batch(a);
// partial subgraph batch inference
std::vector<ncnn::Mat> b_chunks;
for (int i = 0; i < chunks_count; i++)
{
ex.input("batch_a", a_chunks[i]);
ex.extract("batch_b", b_chunks[i]);
}
// merge outputs into b
ncnn::Mat b = merge_batch(b_chunks);
// partial inference from b
ex.input("b", b);
ex.extract("out0", out0);