Commit b7726e1

src: cpu: fix large padding handling in src_transpose jit
When padding > 16, jit:src_transpose did not handle the padding properly. Padding this large can occur when width dilation is used.
1 parent 4449071 commit b7726e1
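
To illustrate why width dilation can push the right padding past 16, here is a minimal sketch based on the usual convolution output-size relation. The helper names `dilated_kernel_width` and `right_padding` are hypothetical and not taken from the patched file, and the dilation convention (1 means no dilation) is an assumption; some libraries count dilation from 0 instead.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: effective kernel width after dilation.
// Assumes dilation == 1 means "no dilation".
int dilated_kernel_width(int kw, int dilation) {
    assert(kw >= 1 && dilation >= 1);
    return (kw - 1) * dilation + 1;
}

// Hypothetical helper: right padding implied by the usual relation
//   ow = (iw + l_pad + r_pad - ext_kw) / stride + 1
// solved for r_pad and clamped at zero.
int right_padding(int iw, int ow, int l_pad, int stride, int ext_kw) {
    assert(stride >= 1 && ow >= 1);
    return std::max(0, (ow - 1) * stride + ext_kw - iw - l_pad);
}
```

With `iw = ow = 32`, `kw = 3`, `stride = 1` and `l_pad = 20`, a dilation of 20 gives an effective kernel width of 41 and a right padding of 20, which is above 16. The same shape without dilation and with `l_pad = 1` needs a right padding of only 1.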

1 file changed: +11 -2 lines changed

src/cpu/jit_transpose_src_utils.cpp

Lines changed: 11 additions & 2 deletions
@@ -431,8 +431,17 @@ void jit_trans_iw_ic_int16_t::transpose(
     int store_pad = div_up(r_pad, 2);
     int addr_shift = r_pad % 2;
     add(reg_tr_src_tmp, (nrows - addr_shift) * typesize);
-    padding(reg_tr_src_tmp, store_pad);
-    sub(reg_tr_src_tmp, (nrows - addr_shift) * typesize);
+    // note: r_pad can be bigger than 16 because of dilation
+    int tail = store_pad % transpose_size;
+    int pad_rows = store_pad / transpose_size;
+    for (int pad = 0; pad < pad_rows; pad++) {
+        padding(reg_tr_src_tmp, transpose_size);
+        add(reg_tr_src_tmp, 2 * transpose_size * typesize);
+    }
+    if (tail > 0) padding(reg_tr_src_tmp, tail);
+    sub(reg_tr_src_tmp,
+            (pad_rows * 2 * transpose_size + nrows - addr_shift)
+                    * typesize);
 }

 int store_tail = rnd_up(nrows, 2);
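
The chunking in the hunk above can be checked outside the JIT with a small host-side model. This is only a sketch: `PadTrace` and `emit_right_padding` are hypothetical names, the register `reg_tr_src_tmp` is modelled as a plain byte offset, and `transpose_size` and `typesize` are passed in as parameters because their actual values are not shown in this diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the generated code's effect:
// a byte offset instead of a register, plus a log of padding() calls.
struct PadTrace {
    long offset = 0;             // models reg_tr_src_tmp relative to its start
    std::vector<int> pad_calls;  // row count passed to each padding() call
};

// Mirrors the control flow of the patched hunk (assumed parameters, see above).
PadTrace emit_right_padding(int r_pad, int nrows, int transpose_size,
        int typesize) {
    assert(r_pad >= 0 && transpose_size > 0 && typesize > 0);
    PadTrace t;
    int store_pad = (r_pad + 1) / 2; // same result as div_up(r_pad, 2)
    int addr_shift = r_pad % 2;
    t.offset += static_cast<long>(nrows - addr_shift) * typesize;

    // Split the padding into full transpose_size chunks plus a tail.
    int tail = store_pad % transpose_size;
    int pad_rows = store_pad / transpose_size;
    for (int pad = 0; pad < pad_rows; pad++) {
        t.pad_calls.push_back(transpose_size);
        t.offset += 2L * transpose_size * typesize;
    }
    if (tail > 0) t.pad_calls.push_back(tail);

    // Undo every advance so the offset returns to its starting value.
    t.offset -= static_cast<long>(
                        pad_rows * 2 * transpose_size + nrows - addr_shift)
            * typesize;
    return t;
}
```

For example, with `r_pad = 40`, `nrows = 16`, `transpose_size = 16` and `typesize = 2`, the model records one full `padding` call of 16 rows followed by a tail call of 4 rows, and the offset ends at 0. That zero is the property the final `sub` restores: no `padding` call is asked for more than `transpose_size` rows, and the pointer still returns to where it started.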
