[MNN:Sync] Sync Internal 2.7.1 #2595

Merged
Merged 1 commit on Sep 21, 2023
2 changes: 1 addition & 1 deletion docs/contribute/backend.md
@@ -177,7 +177,7 @@ virtual void onResizeBegin();
/**
* @brief callback after resize ops.
*/
virtual void onResizeEnd();
virtual ErrorCode onResizeEnd();
/**
* @brief callback before executing ops.
*/
129 changes: 101 additions & 28 deletions docs/inference/module.md
@@ -10,34 +10,52 @@
- `VARP` serves as the input and output of `Module`, and is also the basic data structure of the [Expr API](expr.md)

## Workflow
Create Executor (optional) -> create Module -> create input VARP -> run inference with Module::onForward -> use output VARP -> destroy Module -> destroy Executor (optional)
### Create Executor
Configure Executor (optional) -> create RuntimeManager (optional) -> create Module -> create input VARP -> run inference with Module::onForward -> use output VARP -> destroy Module
### (Optional) Configure the Executor
`Executor` provides interfaces for configuring the inference backend, thread count, and similar properties, as well as performance statistics, per-operator execution callbacks, and memory reclamation. A global Executor object is provided, so users can use it directly without creating or holding one themselves.
```cpp
// Create a new Executor
NNForwardType type = MNN_FORWARD_CPU;
MNN::BackendConfig backend_config; // default backend config
std::shared_ptr<MNN::Express::Executor> executor(
MNN::Express::Executor::newExecutor(type, backend_config, 4));
MNN::Express::ExecutorScope scope(executor);
// Use the default global Executor
// Configure the default global Executor
MNN::BackendConfig backend_config; // default backend config
MNN::Express::Executor::getGlobalExecutor()->setGlobalExecutorConfig(type, backend_config, 4);
// Use CPU with 4 threads
MNN::Express::Executor::getGlobalExecutor()->setGlobalExecutorConfig(MNN_FORWARD_CPU, backend_config, 4);
```

### (Optional) Create a RuntimeManager
The Executor configuration affects the backend configuration of both Module inference and expression evaluation.

**Note**: the following example triggers expression evaluation; if the Executor is set to OpenCL, the computation runs on the OpenCL backend.
```cpp
MNN::Express::VARP X;
MNN::Express::VARP Y = MNN::Express::_Sign(X);
float* yPtr = Y->readMap<float>();
```

If you want a particular backend configuration to apply only to a given Module (for example, the Module runs on the GPU while expression evaluation runs on the CPU), create an additional RuntimeManager and pass it in when creating the Module.
```cpp
MNN::ScheduleConfig sConfig;
sConfig.type = MNN_FORWARD_OPENCL;

std::shared_ptr<MNN::Express::Executor::RuntimeManager> rtmgr(MNN::Express::Executor::RuntimeManager::createRuntimeManager(sConfig), MNN::Express::Executor::RuntimeManager::destroy);
rtmgr->setCache(".cachefile");
```

### Create Module
`Module` can be created by specifying the model, the input/output names, and a config file; it can also be `clone`d from an existing `Module` object
`Module` can be created by specifying the model, the input/output names, and a config file
```cpp
// Load from a model file and create a new Module
const std::string model_filename = "/tmp/mymodule.mnn"; // model file with path

// Input names; may be empty. If empty, MNN automatically searches the model for inputs; with multiple inputs the order is not guaranteed and should be checked via the getInfo interface
const std::vector<std::string> input_names{"input_1", "input_2", "input_3"};
// Output names; may be empty. If empty, MNN automatically searches the model for outputs; with multiple outputs the order is not guaranteed and should be checked via the getInfo interface
const std::vector<std::string> output_names{"output_1"};

Module::Config mdconfig; // default module config
std::unique_ptr<Module> module; // module
module.reset(Module::load(input_names, output_names, model_filename.c_str(), &mdconfig));
// Create a new Module from an existing one; can be used for concurrent multi-instance inference
std::unique_ptr<Module> module_shallow_copy;
module_shallow_copy.reset(Module::clone(module.get()));
// If rtMgr is nullptr, the Module uses the Executor's backend configuration
module.reset(Module::load(input_names, output_names, model_filename.c_str(), rtMgr, &mdconfig));
```

### Get Model Information
Call `getInfo` to obtain the `Module` information; see `tools/cpp/GetMNNInfo.cpp` and the [tool](../tools/test.html#getmnninfo) for reference
```cpp
@@ -57,41 +75,96 @@ struct Info {
};
const Info* getInfo() const;
```
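A minimal usage sketch (not part of the PR) follows; it continues from the loading example above and assumes `Info` exposes `inputNames` and `outputNames` as in recent MNN headers — check `tools/cpp/GetMNNInfo.cpp` for the exact fields.
```cpp
// Hedged sketch: print the input/output names discovered by getInfo().
auto info = module->getInfo();
if (nullptr != info) {
    for (size_t i = 0; i < info->inputNames.size(); ++i) {
        MNN_PRINT("input %d: %s\n", (int)i, info->inputNames[i].c_str());
    }
    for (size_t i = 0; i < info->outputNames.size(); ++i) {
        MNN_PRINT("output %d: %s\n", (int)i, info->outputNames[i].c_str());
    }
}
```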

### Run Inference
Call `onForward` to run inference.

**Note: once the `Module` has been destructed, the `VARP`s returned by `onForward` are no longer usable**

```cpp
std::vector<Express::VARP> onForward(const std::vector<Express::VARP>& inputs);
std::vector<MNN::Express::VARP> onForward(const std::vector<MNN::Express::VARP>& inputs);
```

## Model Inference with Module
Inference with Module supports control-flow operators, so Module is commonly used for speech models. Example code:
Example code:

```cpp
int dim = 224;
std::vector<MNN::Express::VARP> inputs(3);
inputs[0] = _Input({1, dim}, NHWC, halide_type_of<int>());
inputs[1] = _Input({1, dim}, NHWC, halide_type_of<int>());
inputs[2] = _Input({1, dim}, NHWC, halide_type_of<int>());
// Use NHWC for models converted from TensorFlow, and NCHW for models converted from ONNX
inputs[0] = MNN::Express::_Input({1, dim}, NHWC, halide_type_of<int>());
inputs[1] = MNN::Express::_Input({1, dim}, NHWC, halide_type_of<int>());
inputs[2] = MNN::Express::_Input({1, dim}, NHWC, halide_type_of<int>());

// Set the input data
std::vector<int*> input_pointer = {inputs[0]->writeMap<int>(),
inputs[1]->writeMap<int>(),
inputs[2]->writeMap<int>()};
for (int i = 0; i < inputs[0]->getInfo->size; ++i) {
for (int i = 0; i < dim; ++i) {
    input_pointer[0][i] = i + 1;
    input_pointer[1][i] = i + 2;
    input_pointer[2][i] = i + 3;
}
// Run inference
std::vector<VARP> outputs = module->onForward(inputs);
std::vector<MNN::Express::VARP> outputs = module->onForward(inputs);
// Get the output
auto output_ptr = outputs[0]->readMap<float>();
```

You can debug with callback functions, similar to [runSessionWithCallBack](session.html#id19). Example code:
## Multi-Instance Inference

The Module API supports creating multiple instances from the same model and dispatching them to different threads for inference (a combined sketch appears at the end of this section). The steps are as follows:

- [Startup] The main thread creates the base Module: configure Executor (optional) -> create RuntimeManager (optional) -> create Module
- [Startup] Create worker threads and create an Executor in each worker thread
- [Startup] Each worker thread binds its own Executor and clones the Module
- [Use] Each worker thread binds its own Executor and runs inference with the cloned Module: create input VARP -> run inference with Module::onForward -> use output VARP
- [Shutdown] Each worker thread binds its own Executor and destroys its Module
- [Shutdown] Each worker thread destroys its Executor, and the thread is joined
- [Shutdown] The main thread destroys the base Module

### Create the Base Module
The creation process of the first instance does not need to change.

### Create a New Executor for Each Instance
```cpp
NNForwardType type = MNN_FORWARD_CPU;
MNN::BackendConfig backend_config; // default backend config
std::shared_ptr<MNN::Express::Executor> executor(
MNN::Express::Executor::newExecutor(type, backend_config, 1));
```

**Note**: if an algorithm pipeline runs multiple models, a single Executor per pipeline instance is sufficient.

### Clone the Base Module for Each Instance

```cpp
// Bind this instance's executor so it does not cause memory conflicts with other instances
MNN::Express::ExecutorScope scope(executor);
std::unique_ptr<MNN::Express::Module> module_thread(MNN::Express::Module::clone(module.get()), MNN::Express::Module::destroy);
```

The cloned Module shares the immutable weights and constant data with the base Module, which greatly reduces the memory required by each additional instance.

### Run Inference in Each Instance
```cpp
// Before each instance runs inference, bind its executor with ExecutorScope
MNN::Express::ExecutorScope scope(executor);
std::vector<MNN::Express::VARP> inputs;
/* Build the inputs ... */
// Run inference
std::vector<MNN::Express::VARP> outputs = module_thread->onForward(inputs);
/* Use the outputs ... */
```

### Destroy
```cpp
// Before each instance destroys its Module, also bind its executor with ExecutorScope
MNN::Express::ExecutorScope scope(executor);
module_thread.reset();
```
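Putting the steps above together, a per-thread workflow might look like the following minimal sketch (not part of the PR); the `runInstance` function, the `{1, dim}` input shape, and the zero-filled input are illustrative assumptions — real shapes and types must match the model (see `getInfo`).
```cpp
#include <thread>
#include <memory>
#include <vector>
#include <MNN/expr/Executor.hpp>
#include <MNN/expr/ExecutorScope.hpp>
#include <MNN/expr/ExprCreator.hpp>
#include <MNN/expr/Module.hpp>

// Hedged sketch of one worker thread: own Executor -> ExecutorScope ->
// clone the base Module -> build inputs -> onForward -> destroy the clone.
static void runInstance(MNN::Express::Module* baseModule, int dim) {
    MNN::BackendConfig config;
    std::shared_ptr<MNN::Express::Executor> executor(
        MNN::Express::Executor::newExecutor(MNN_FORWARD_CPU, config, 1));
    MNN::Express::ExecutorScope scope(executor);

    // Clone inside the scope; weights and constants are shared with the base Module.
    std::shared_ptr<MNN::Express::Module> local(
        MNN::Express::Module::clone(baseModule), MNN::Express::Module::destroy);

    // Illustrative single input filled with zeros.
    auto x = MNN::Express::_Input({1, dim}, MNN::Express::NHWC, halide_type_of<float>());
    auto ptr = x->writeMap<float>();
    for (int i = 0; i < dim; ++i) {
        ptr[i] = 0.0f;
    }
    auto outputs = local->onForward({x});
    auto y = outputs[0]->readMap<float>();
    (void)y; // consume the outputs here

    local.reset(); // destroy the clone while this thread's executor scope is still active
}

// Usage from the main thread, after the base Module has been created:
// std::thread worker(runInstance, module.get(), 224);
// worker.join();
```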

## Debugging

The Module API also supports debugging with callback functions, similar to [runSessionWithCallBack](session.html#id19). Example code:
```cpp
MNN::TensorCallBackWithInfo beforeCallBack = [&](const std::vector<MNN::Tensor*>& ntensors, const OperatorInfo* info) {

@@ -114,7 +187,7 @@ MNN::TensorCallBackWithInfo callBack = [&](const std::vector<MNN::Tensor*>& nten
return true;
};

// set callback function
// Set the callback functions; the executor must be the one used when creating the Module. In the non-multi-instance case the global executor is sufficient:
Express::Executor::getGlobalExecutor()->setCallBack(std::move(beforeCallBack), std::move(callBack));

// forward would trigger callback
@@ -126,4 +199,4 @@ std::vector<VARP> outputs = user_module->onForward(inputs);
- `pictureRecognition_module.cpp` uses `Module` for image classification, with `ImageProcess` for preprocessing and `Expr` for postprocessing
- `pictureRecognition_batch.cpp` uses `Module` for image classification, with `ImageProcess` for preprocessing and `Expr` for postprocessing
- `multithread_imgrecog.cpp` uses `Module` for multi-threaded concurrent image classification, with `ImageProcess` for preprocessing and `Expr` for postprocessing
- `transformerDemo.cpp` uses `Module` to run Transformer model inference
- `transformerDemo.cpp` uses `Module` to run Transformer model inference
19 changes: 15 additions & 4 deletions express/Expr.cpp
@@ -177,7 +177,9 @@ EXPRP Expr::create(Variable::Info&& info, const void* ptr, VARP::InputType type,
}
expr->mInside->mContentDirty = false;
if (memtype == COPY) {
::memcpy(expr->mInside->mOutputTensors[0]->buffer().host, originPtr, dstInfo.size * dstInfo.type.bytes());
size_t total_size = dstInfo.size;
total_size *= dstInfo.type.bytes();
::memcpy(expr->mInside->mOutputTensors[0]->buffer().host, originPtr, total_size);
} else {
expr->mInside->mOutputTensors[0]->buffer().host = (uint8_t*)originPtr;
if (memtype == REF) {
@@ -227,6 +229,9 @@ EXPRP Expr::create(const OpT* op, std::vector<VARP> inputs, int outputSize) {
case DataType_DT_FLOAT:
ptr = (void*)op->main.AsBlob()->float32s.data();
break;
case DataType_DT_BFLOAT16:
ptr = (void*)op->main.AsBlob()->uint8s.data();
break;
default:
break;
}
@@ -1081,9 +1086,15 @@ void Variable::save(const std::vector<VARP>& vars, NetT* dest) {
blob->dataFormat = (MNN_DATA_FORMAT)Utils::convertFormat(info.order);
blob->dims = info.dim;
if (info.type.code == halide_type_float) {
blob->dataType = DataType_DT_FLOAT;
blob->float32s.resize(info.size);
::memcpy(blob->float32s.data(), ptr, info.size * sizeof(float));
if (info.type.bits == 16) {
blob->dataType = DataType_DT_BFLOAT16;
blob->uint8s.resize(info.size * 2);
::memcpy(blob->uint8s.data(), ptr, info.size * sizeof(int16_t));
} else {
blob->dataType = DataType_DT_FLOAT;
blob->float32s.resize(info.size);
::memcpy(blob->float32s.data(), ptr, info.size * sizeof(float));
}
} else if (info.type.code == halide_type_int && info.type.bits == 32) {
blob->dataType = DataType_DT_INT32;
blob->int32s.resize(info.size);
1 change: 1 addition & 0 deletions express/Utils.cpp
@@ -81,6 +81,7 @@ halide_type_t Utils::revertDataType(DataType dataType) {
CONVERT(DataType_DT_UINT8, halide_type_of<uint8_t>(), dataType);
CONVERT(DataType_DT_INT8, halide_type_of<int8_t>(), dataType);
CONVERT(DataType_DT_HALF, halide_type_of<float>(), dataType);
CONVERT(DataType_DT_BFLOAT16, halide_type_t(halide_type_float, 16), dataType);
return halide_type_of<float>();
}
Express::Dimensionformat Utils::revertFormat(int format) {
2 changes: 1 addition & 1 deletion include/MNN/MNNDefine.h
@@ -69,6 +69,6 @@ MNN_ERROR("Check failed: %s ==> %s\n", #success, #log); \
#define STR(x) STR_IMP(x)
#define MNN_VERSION_MAJOR 2
#define MNN_VERSION_MINOR 7
#define MNN_VERSION_PATCH 0
#define MNN_VERSION_PATCH 1
#define MNN_VERSION STR(MNN_VERSION_MAJOR) "." STR(MNN_VERSION_MINOR) "." STR(MNN_VERSION_PATCH)
#endif /* MNNDefine_h */
1 change: 1 addition & 0 deletions include/MNN/Tensor.hpp
@@ -227,6 +227,7 @@ class MNN_PUBLIC Tensor {
* @return bytes needed to store data
*/
int size() const;
size_t usize() const;

/**
* @brief calculate number of elements needed to store data taking reordering flag into account.
25 changes: 18 additions & 7 deletions package_scripts/win/build_lib_release.ps1
@@ -13,7 +13,8 @@
Param(
[Parameter(Mandatory=$true)][String]$path,
[String]$backends,
[Switch]$x86
[Switch]$x86,
[Switch]$cibuild
)

$erroractionpreference = "stop"
@@ -25,14 +26,18 @@ mkdir -p $PACKAGE_LIB_PATH

#clear and create package directory
powershell ./schema/generate.ps1
Remove-Item -Path $PACKAGE_PATH/include -Recurse -ErrorAction Ignore
cp -r include $PACKAGE_PATH
cp -r tools/cv/include/cv $PACKAGE_PATH/include
pushd $PACKAGE_LIB_PATH
mkdir -p Release\Dynamic\MT, Release\Dynamic\MD, Release\Static\MD, Release\Static\MT
if ($cibuild) {
mkdir -p Release\Dynamic\MT
} else {
Remove-Item -Path $PACKAGE_PATH/include -Recurse -ErrorAction Ignore
cp -r include $PACKAGE_PATH
cp -r tools/cv/include/cv $PACKAGE_PATH/include
mkdir -p Release\Dynamic\MT, Release\Dynamic\MD, Release\Static\MD, Release\Static\MT
}
popd

$CMAKE_ARGS = "-DMNN_SEP_BUILD=OFF -DMNN_BUILD_TRAIN=ON -DMNN_BUILD_OPENCV=ON -DMNN_IMGCODECS=ON -DMNN_OPENCL=ON -DMNN_VULKAN=ON -DMNN_AVX512=ON"
$CMAKE_ARGS = "-DMNN_SEP_BUILD=OFF -DMNN_BUILD_TRAIN=ON -DMNN_BUILD_OPENCV=ON -DMNN_IMGCODECS=ON -DMNN_OPENCL=ON -DMNN_VULKAN=ON -DMNN_AVX512=ON -DMNN_LOW_MEMORY=ON"
if ($backends -ne $null) {
Foreach ($backend in $backends.Split(",")) {
if ($backend -eq "cuda") {
@@ -78,6 +83,12 @@ Build "cmake -G Ninja $CMAKE_ARGS -DCMAKE_BUILD_TYPE=Release -DMNN_WIN_RUNTIME_M
cp MNN.lib, MNN.dll, MNN.pdb $PACKAGE_LIB_PATH\Release\Dynamic\MT
rm MNN.*

# cibuild builds only a single variant, as a build test
if ($cibuild) {
popd
return
}

##### Release/Dynamic/MD ####
log "Release/Dynamic/MD"
Remove-Item CMakeCache.txt -ErrorAction Ignore
@@ -97,4 +108,4 @@ Remove-Item CMakeCache.txt -ErrorAction Ignore
Build "cmake -G Ninja $CMAKE_ARGS -DCMAKE_BUILD_TYPE=Release -DMNN_WIN_RUNTIME_MT=OFF -DMNN_BUILD_SHARED_LIBS=OFF .."
cp MNN.lib $PACKAGE_LIB_PATH\Release\Static\MD

popd
popd
2 changes: 1 addition & 1 deletion pymnn/pip_package/build_deps.py
@@ -44,7 +44,7 @@ def build_deps():
shutil.rmtree(cmake_build_dir)
os.makedirs(cmake_build_dir)
os.chdir(cmake_build_dir)
extra_opts = '-DMNN_LOW_MEMORY=OFF'
extra_opts = '-DMNN_LOW_MEMORY=ON'
extra_opts += ' -DMNN_VULKAN=ON -DMNN_VULKAN_IMAGE=OFF'
extra_opts += ' -DMNN_OPENCL=ON'
if IS_WINDOWS:
2 changes: 1 addition & 1 deletion pymnn/src/MNN.cc
@@ -2157,7 +2157,7 @@ static PyObject* PyMNNCVMatrix_repr(PyObject *self) {
((PyMNNCVMatrix *)self)->matrix->get9(mat);
char buffer [100];
sprintf(buffer, "[[%f\t%f\t%f]\n [%f\t%f\t%f]\n [%f\t%f\t%f]]",
mat[0], mat[1], mat[2], mat[3], mat[4], mat[5], mat[5], mat[6], mat[7], mat[8]);
mat[0], mat[1], mat[2], mat[3], mat[4], mat[5], mat[6], mat[7], mat[8]);
return toPyObj(buffer);
}
// type: 0 set; 1 pre; 2 post
4 changes: 2 additions & 2 deletions source/backend/arm82/Arm82Backend.cpp
@@ -80,11 +80,11 @@ Execution* Arm82Backend::onCreate(const std::vector<Tensor*>& inputs, const std:
return exe;
}

static int _getAliginSize(const halide_buffer_t& buffer, MNN_DATA_FORMAT format) {
static size_t _getAliginSize(const halide_buffer_t& buffer, MNN_DATA_FORMAT format) {
// The default data type of input tensor for arm82 backend is FLOAT32.
// However, Arm82Backend default data type is FLOAT16, so check whether data type is FLOAT32,
// then divide size by 2
int size = sizeof(int16_t);
size_t size = sizeof(int16_t);
const int dimensions = buffer.dimensions;
for (int i = 0; i < dimensions; i++) {
int currentDimSize = buffer.dim[i].extent;
@@ -53,7 +53,7 @@ LoopH:
mov w17, #0x0f
dup v3.16b, w17
and v2.16b, v0.16b, v3.16b
mov w17, #7
mov w17, #8
dup v0.16b, w17
sub v1.16b, v1.16b, v0.16b
sub v2.16b, v2.16b, v0.16b
@@ -145,7 +145,7 @@ LoopH:
mov w17, #0x0f
dup v3.16b, w17
and v2.16b, v0.16b, v3.16b
mov w17, #7
mov w17, #8
dup v0.16b, w17
sub v1.16b, v1.16b, v0.16b
sub v2.16b, v2.16b, v0.16b
@@ -347,7 +347,7 @@ LoopHRemain:
ld1 {v21.8h}, [x20], #16 // bias
mov w17, #0x0f
dup v22.16b, w17
mov w17, #7
mov w17, #8
dup v23.16b, w17
// ld1 {v3.8h}, [x2]
ld1 {v3.8h}, [x2], #16