SingleGeometryComputer generates different regions in compute and recompute, which may cause errors #2652

shine-xia · 2023-11-07T16:59:54Z

Platform (include the target platform as well if cross-compiling):

Linux-x86_64, Debian 10, CPU backend

GitHub version:

commit 94e1212, tag 2.7.1

Problem description

Cause

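In GeometryShape.cpp, the onRecompute() and onCompute() methods of the SingleGeometryComputer class use different logic to generate the region of outputTensor:
1) onCompute() builds the new region with makeFullSlice(), which uses the actual length of each dimension of the input tensor to compute regions[0].size[2].
2) onRecompute() uses the input tensor's elementSize() method, which rounds each dimension's length up to a multiple of 4 before computing the element count.
If any dimension of an input tensor has a length that is not a multiple of 4, the two results clearly differ.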

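```cpp
bool onRecompute(...) {
    // ...
    // size[2] comes from elementSize(), i.e. the 4-aligned element count
    des->regions[0].size[2] = inputs[0]->elementSize();
    // ...
}
bool onCompute(...) {
    // ...
    // the region comes from makeFullSlice(), i.e. the real dimension lengths
    outputDes->regions = {TensorUtils::makeFullSlice(input)};
    // ...
}
```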
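To make the mismatch concrete, here is a minimal C++ sketch of the two element counts. `actualElementCount` and `paddedElementCount` are hypothetical helpers, not MNN functions. The padded version assumes that only dimension 1 (the channel axis of an NC4HW4-style layout) is rounded up to a multiple of 4, which is enough to reproduce the [1, 22, 1, xxxx] case discussed in this issue.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the count makeFullSlice() is described as using,
// i.e. the product of the real dimension lengths.
int actualElementCount(const std::vector<int>& shape) {
    int count = 1;
    for (int d : shape) {
        count *= d;
    }
    return count;
}

// Hypothetical helper modelling the 4-aligned count attributed to elementSize().
// ASSUMPTION: only dimension 1 (the channel axis) is rounded up to a multiple of 4.
int paddedElementCount(const std::vector<int>& shape) {
    int count = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        int d = shape[i];
        if (i == 1) {
            d = (d + 3) / 4 * 4;  // round up to the next multiple of 4
        }
        count *= d;
    }
    return count;
}
```

With 10 standing in for xxxx, the shape [1, 22, 1, 10] has 220 real elements but a padded count of 240; for a channel count that is already a multiple of 4, such as 24, the two agree.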

Call path

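For models with dynamic shapes, the input shape can change on every inference, so resize() is called frequently, and it in turn calls shapeComputeAndGeometryTransform(). On the first call this function goes through onCompute(); on subsequent calls it goes through onRecompute():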

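```cpp
bool res = false;
// Try the incremental path first, unless the buffer is wrapped
if (!tempBuffer.hasWrap) {
    res = geo->onRecompute(info.op, info.inputs, info.outputs, geoContext, tempBuffer);
}
// Fall back to a full compute if recompute was skipped or failed
if (!res) {
    tempBuffer.command.clear();
    tempBuffer.extras.clear();
    res = geo->onCompute(info.op, info.inputs, info.outputs, geoContext, tempBuffer);
}
```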
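The branch above can be modelled in isolation to show which path each resize() takes. This is a hypothetical sketch: `TempBufferModel`, `onRecomputeModel`, `onComputeModel` and `transformModel` are illustrative names, and the rule that recompute fails until onCompute() has produced a command is an assumed simplification of whatever MNN's real onRecompute() checks.

```cpp
#include <string>

// Hypothetical, simplified stand-in for tempBuffer; not MNN's real type.
struct TempBufferModel {
    bool hasWrap = false;     // mirrors tempBuffer.hasWrap
    bool hasCommand = false;  // ASSUMPTION: set once onCompute() has run
};

// ASSUMPTION: recompute succeeds only if a previous compute left a command behind.
bool onRecomputeModel(TempBufferModel& buf) {
    return buf.hasCommand;
}

bool onComputeModel(TempBufferModel& buf) {
    buf.hasCommand = true;
    return true;
}

// Same control flow as the excerpt; returns the name of the path that ran.
std::string transformModel(TempBufferModel& buf) {
    bool res = false;
    std::string path = "compute";
    if (!buf.hasWrap) {
        res = onRecomputeModel(buf);
        if (res) {
            path = "recompute";
        }
    }
    if (!res) {
        buf.hasCommand = false;  // stands in for command.clear() / extras.clear()
        res = onComputeModel(buf);
    }
    return path;
}
```

Starting from a fresh buffer, the first call reports "compute" and every later call reports "recompute"; with hasWrap set, every call goes through onCompute().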

Resulting problem

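I have a model whose operators are connected as convolution => squeeze => unaryOp.
The input tensor of squeeze has shape [1, 22, 1, xxxx]. Since 22 is not a multiple of 4, elementSize() of this input tensor returns more than the actual number of elements.
After the ops are converted to raster ops, the regions in mFastBlit are recomputed on every resize().
In OpCommonUtils::turnToPackRegion() (OpCommonUtils.cpp, line 151), the region produced by onRecompute() leads to a wrong c4Region, so the raster op copies data incorrectly.
The visible symptom is that, after a module is created, the first inference produces correct results but subsequent inferences do not.
A side note: versions before MNN 2.3.0 do not expose this problem, because they called Session::_clearCache() to clear the old regions of all tensors, so the onRecompute() branch was never entered.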
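Both the symptom and the pre-2.3.0 behaviour can be reproduced with a small simulation of regions[0].size[2] over repeated resizes. Everything here is hypothetical: `RegionCache` and `resizeOnce` are illustrative names, the `clearCache` flag only models the effect described for Session::_clearCache(), and the aligned count assumes that only dimension 1 is padded to a multiple of 4.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for regions[0].size[2], plus "an old region exists".
struct RegionCache {
    bool valid = false;
    int size = 0;
};

// Compute-path count: product of the real dimension lengths.
int realCount(const std::vector<int>& shape) {
    int count = 1;
    for (int d : shape) {
        count *= d;
    }
    return count;
}

// Recompute-path count. ASSUMPTION: only dimension 1 is rounded up to a multiple of 4.
int alignedCount(const std::vector<int>& shape) {
    int count = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        count *= (i == 1) ? (shape[i] + 3) / 4 * 4 : shape[i];
    }
    return count;
}

// One resize(): reuse the old region if present (recompute), else build one (compute).
// clearCache = true models dropping all old regions first, as described for
// Session::_clearCache() before MNN 2.3.0.
int resizeOnce(const std::vector<int>& shape, RegionCache& cache, bool clearCache) {
    if (clearCache) {
        cache.valid = false;
    }
    if (cache.valid) {
        cache.size = alignedCount(shape);  // recompute path
    } else {
        cache.size = realCount(shape);     // compute path
        cache.valid = true;
    }
    return cache.size;
}
```

Resizing from [1, 22, 1, 10] to [1, 22, 1, 12] without clearing yields 220 and then 288, although the second shape holds only 264 elements; with clearing, both calls take the compute path and yield 220 and 264.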

Open question

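My temporary workaround: in onRecompute(), compute the actual element count the same way makeFullSlice() does, without 4-alignment.
However, I suspect this may introduce other problems, so I would prefer that the MNN team fix this in a more appropriate way.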
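The workaround amounts to having both paths share one way of counting elements. A minimal sketch, assuming a simplified region that only carries a size triple; `RegionModel`, `makeFullSliceModel`, `computeRegionModel` and `recomputeRegionModel` are hypothetical names, and this is not the change that was made upstream.

```cpp
#include <array>
#include <vector>

// Hypothetical region model: only the size triple matters here.
struct RegionModel {
    std::array<int, 3> size{1, 1, 1};
};

// Hypothetical stand-in for makeFullSlice(): one flat run over the real
// element count, with no 4-alignment.
RegionModel makeFullSliceModel(const std::vector<int>& shape) {
    RegionModel region;
    int count = 1;
    for (int d : shape) {
        count *= d;
    }
    region.size = {1, 1, count};
    return region;
}

// Compute path: builds a fresh region.
RegionModel computeRegionModel(const std::vector<int>& shape) {
    return makeFullSliceModel(shape);
}

// Patched recompute path (the workaround): refresh size[2] with the same
// real element count instead of a 4-aligned one.
void recomputeRegionModel(RegionModel& cached, const std::vector<int>& shape) {
    cached.size[2] = makeFullSliceModel(shape).size[2];
}
```

Deriving size[2] from the same helper in both paths removes the mismatch by construction; it does not address the other side effects the reporter was concerned about.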


jxt1234 · 2023-11-10T03:42:18Z

Fixed in:
#2659

jxt1234 closed this as completed Nov 10, 2023

jxt1234 added the bug Something isn't working label Nov 10, 2023
