[MNN:Sync] Sync Internal 2.7.1
wangzhaode committed Sep 20, 2023
1 parent 32f72f4 commit 29b7fe8
Showing 128 changed files with 2,362 additions and 1,134 deletions.
2 changes: 1 addition & 1 deletion docs/contribute/backend.md
@@ -177,7 +177,7 @@ virtual void onResizeBegin();
/**
* @brief callback after resize ops.
*/
virtual void onResizeEnd();
virtual ErrorCode onResizeEnd();
/**
* @brief callback before executing ops.
*/
129 changes: 101 additions & 28 deletions docs/inference/module.md
@@ -10,34 +10,52 @@
- `VARP` serves as the input and output of a `Module`, and is also the basic data structure of the [Expr API](expr.md)

## Workflow
Create an Executor (optional) -> create a Module -> create input VARPs -> run inference with Module::forward -> consume the output VARPs -> destroy the Module -> destroy the Executor (optional)
### Creating an Executor
Configure the Executor (optional) -> create a RuntimeManager (optional) -> create a Module -> create input VARPs -> run inference with Module::forward -> consume the output VARPs -> destroy the Module
### (Optional) Configure the Executor
`Executor` provides interfaces for configuring the inference backend, thread count and other properties, as well as performance profiling, per-operator execution callbacks, and memory reclamation. A global Executor object is provided, so it can be used directly without creating or holding one yourself.
```cpp
// Create a new Executor
MNNForwardType type = MNN_FORWARD_CPU;
MNN::BackendConfig backend_config; // default backend config
std::shared_ptr<MNN::Express::Executor> executor(
MNN::Express::Executor::newExecutor(type, backend_config, 4));
MNN::Express::ExecutorScope scope(executor);
// Use the default global Executor
// Configure the default global Executor
MNN::BackendConfig backend_config; // default backend config
MNN::Express::Executor::getGlobalExecutor()->setGlobalExecutorConfig(type, backend_config, 4);
// Use CPU with 4 threads
MNN::Express::Executor::getGlobalExecutor()->setGlobalExecutorConfig(MNN_FORWARD_CPU, backend_config, 4);
```
### (Optional) Create a RuntimeManager
The Executor configuration affects the backend used by both Module inference and expression evaluation.
For example, the following snippet triggers expression evaluation; if the Executor is set to OPENCL, the computation runs on the OpenCL backend:
```cpp
MNN::Express::VARP X;
MNN::Express::VARP Y = MNN::Express::_Sign(X);
float* yPtr = Y->readMap<float>();
```

If a backend configuration should apply only to a particular Module (for example, the Module runs on the GPU while expression evaluation runs on the CPU), create a separate RuntimeManager and pass it in when creating the Module:
```cpp
MNN::ScheduleConfig sConfig;
sConfig.type = MNN_FORWARD_OPENCL;

std::shared_ptr<MNN::Express::Executor::RuntimeManager> rtmgr(MNN::Express::Executor::RuntimeManager::createRuntimeManager(sConfig), MNN::Express::Executor::RuntimeManager::destroy);
rtmgr->setCache(".cachefile");
```
### Creating a Module
A `Module` can be created by specifying the model, the input/output names, and a configuration; it can also be `clone`d from an existing `Module` object
A `Module` can be created by specifying the model, the input/output names, and a configuration
```cpp
// Load the model file and create a new Module
const std::string model_file = "/tmp/mymodule.mnn"; // model file with path
// Input names; may be empty, in which case MNN searches the model for inputs automatically. With multiple inputs the order is not guaranteed, so check it via getInfo
const std::vector<std::string> input_names{"input_1", "input_2", "input_3"};
// Output names; may be empty, in which case MNN searches the model for outputs automatically. With multiple outputs the order is not guaranteed, so check it via getInfo
const std::vector<std::string> output_names{"output_1"};
Module::Config mdconfig; // default module config
std::unique_ptr<Module> module; // module
module.reset(Module::load(input_names, output_names, model_filename.c_str(), &mdconfig));
// Create a new Module from an existing one; useful for multi-process concurrency
std::unique_ptr<Module> module_shallow_copy;
module_shallow_copy.reset(Module::clone(module.get()));
// If rtMgr is nullptr, the Module uses the Executor's backend configuration
module.reset(Module::load(input_names, output_names, model_filename.c_str(), rtMgr, &mdconfig));
```

### Getting model information
Call `getInfo` to obtain the `Module` information; for reference, see `tools/cpp/GetMNNInfo.cpp` and the [tool documentation](../tools/test.html#getmnninfo)
```cpp
@@ -57,41 +75,96 @@ struct Info {
};
const Info* getInfo() const;
```
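For illustration, a minimal sketch of querying this information right after loading is shown below; the `inputNames`/`outputNames` members are assumptions based on the excerpt above and should be checked against the full `Info` definition in `Module.hpp`:
```cpp
// Hedged sketch: assumes Module::Info exposes inputNames/outputNames;
// verify against the actual struct before relying on it.
const auto* info = module->getInfo();
if (info != nullptr) {
    for (const auto& name : info->inputNames) {
        MNN_PRINT("input:  %s\n", name.c_str());
    }
    for (const auto& name : info->outputNames) {
        MNN_PRINT("output: %s\n", name.c_str());
    }
}
```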
### Running inference
Call `onForward` to run inference.
**Note: the `VARP`s returned by `onForward` become invalid once the `Module` has been destructed**
```cpp
std::vector<Express::VARP> onForward(const std::vector<Express::VARP>& inputs);
std::vector<MNN::Express::VARP> onForward(const std::vector<MNN::Express::VARP>& inputs);
```
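As a minimal sketch of the lifetime rule above (illustrative only), copy out anything you still need before the `Module` goes away:
```cpp
std::vector<MNN::Express::VARP> outputs = module->onForward(inputs);
const float* data = outputs[0]->readMap<float>();  // valid while `module` is alive
std::vector<float> result(data, data + outputs[0]->getInfo()->size);  // take a copy
module.reset();  // after this point, `outputs` and `data` must no longer be used
```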

## Model inference with Module
Inference through Module supports control-flow operators, so Module is commonly used for speech models. Example code:
Example code:

```cpp
int dim = 224;
std::vector<VARP> inputs(3);
inputs[0] = _Input({1, dim}, NHWC, halide_type_of<int>());
inputs[1] = _Input({1, dim}, NHWC, halide_type_of<int>());
inputs[2] = _Input({1, dim}, NHWC, halide_type_of<int>());
// Use NHWC for models converted from TensorFlow and NCHW for models converted from ONNX
inputs[0] = MNN::Express::_Input({1, dim}, NHWC, halide_type_of<int>());
inputs[1] = MNN::Express::_Input({1, dim}, NHWC, halide_type_of<int>());
inputs[2] = MNN::Express::_Input({1, dim}, NHWC, halide_type_of<int>());

// Set the input data
std::vector<int*> input_pointer = {inputs[0]->writeMap<int>(),
inputs[1]->writeMap<int>(),
inputs[2]->writeMap<int>()};
for (int i = 0; i < inputs[0]->getInfo->size; ++i) {
for (int i = 0; i < dim; ++i) {
    input_pointer[0][i] = i + 1;
    input_pointer[1][i] = i + 2;
    input_pointer[2][i] = i + 3;
}
// Run inference
std::vector<VARP> outputs = module->onForward(inputs);
std::vector<MNN::Express::VARP> outputs = module->onForward(inputs);
// Get the output
auto output_ptr = outputs[0]->readMap<float>();
```
Callback functions can be used for debugging, similar to [runSessionWithCallBack](session.html#id19). Example code:
## Multi-instance inference
The Module API supports creating multiple instances of the same model and dispatching them to different threads for inference. The steps are as follows (a combined sketch follows this list):
- [Startup] The main thread creates the base Module: configure the Executor (optional) -> create a RuntimeManager (optional) -> create the Module
- [Startup] Create worker threads, and create an Executor inside each worker thread
- [Startup] Each worker thread binds its own Executor and clones the Module
- [Use] Each worker thread binds its own Executor and runs inference with the cloned Module: create input VARPs -> run inference with Module::forward -> consume the output VARPs
- [Teardown] Each worker thread binds its own Executor and destroys its Module
- [Teardown] Each worker thread destroys its Executor, then the worker thread exits
- [Teardown] The main thread destroys the base Module
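Put together, a worker thread might look like the following sketch; it is illustrative only and simply combines the per-step snippets from the sections below (the `worker`/`baseModule` names are placeholders):
```cpp
#include <memory>
#include <vector>
#include <MNN/expr/Executor.hpp>
#include <MNN/expr/ExecutorScope.hpp>
#include <MNN/expr/Module.hpp>

void worker(const MNN::Express::Module* baseModule) {
    // Per-thread Executor, bound for everything this thread does
    MNN::BackendConfig config;
    std::shared_ptr<MNN::Express::Executor> executor(
        MNN::Express::Executor::newExecutor(MNN_FORWARD_CPU, config, 1));
    MNN::Express::ExecutorScope scope(executor);

    // Clone the base Module; immutable weights and constants are shared
    std::shared_ptr<MNN::Express::Module> module(
        MNN::Express::Module::clone(baseModule), MNN::Express::Module::destroy);

    std::vector<MNN::Express::VARP> inputs;
    /* build inputs ... */
    auto outputs = module->onForward(inputs);
    /* use outputs ... */

    module.reset();  // destroy the clone while this thread's Executor is still bound
}
// Launch with e.g. std::thread t(worker, baseModule.get()); t.join();
```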
### Creating the base Module
The creation process for the first instance needs no changes
### Creating a new Executor per instance
```cpp
MNNForwardType type = MNN_FORWARD_CPU;
MNN::BackendConfig backend_config; // default backend config
std::shared_ptr<MNN::Express::Executor> executor(
MNN::Express::Executor::newExecutor(type, backend_config, 1));
```

Note: if an algorithm pipeline runs multiple models, creating one Executor per instance is sufficient.

### Cloning the base Module per instance

```cpp
// Bind this instance's executor so it does not contend with other instances over memory
MNN::Express::ExecutorScope scope(executor);
std::unique_ptr<MNN::Express::Module> module_thread(MNN::Express::Module::clone(module.get()), MNN::Express::Module::destroy);
```
The cloned Module shares the immutable weights and constant data with the base Module, which substantially reduces the memory required by each additional instance.
### Running inference per instance
```cpp
// Before each instance runs inference, bind its executor with ExecutorScope
MNN::Express::ExecutorScope scope(executor);
std::vector<VARP> inputs;
/* Build the inputs ... */
// Run inference
std::vector<MNN::Express::VARP> outputs = module_thread->onForward(inputs);
/* Use the outputs ... */
```

### Destruction
```cpp
// Before destroying each instance's Module, also bind that instance's executor with ExecutorScope
MNN::Express::ExecutorScope scope(executor);
module_thread.reset();
```
## Debugging
The Module API also supports debugging with callback functions, similar to [runSessionWithCallBack](session.html#id19). Example code:
```cpp
MNN::TensorCallBackWithInfo beforeCallBack = [&](const std::vector<MNN::Tensor*>& ntensors, const OperatorInfo* info) {
@@ -114,7 +187,7 @@ MNN::TensorCallBackWithInfo callBack = [&](const std::vector<MNN::Tensor*>& nten
return true;
};
// set callback function
// Set the callback functions; the executor must be the one used when creating the Module. In the single-instance case the global executor is sufficient:
Express::Executor::getGlobalExecutor()->setCallBack(std::move(beforeCallBack), std::move(callBack));
// forward would trigger callback
@@ -126,4 +199,4 @@ std::vector<VARP> outputs = user_module->onForward(inputs);
- `pictureRecognition_module.cpp` uses `Module` for image classification, with `ImageProcess` for pre-processing and `Expr` for post-processing
- `pictureRecognition_batch.cpp` uses `Module` for image classification, with `ImageProcess` for pre-processing and `Expr` for post-processing
- `multithread_imgrecog.cpp` uses `Module` for multi-threaded concurrent image classification, with `ImageProcess` for pre-processing and `Expr` for post-processing
- `transformerDemo.cpp` uses `Module` to run Transformer model inference
- `transformerDemo.cpp` uses `Module` to run Transformer model inference
19 changes: 15 additions & 4 deletions express/Expr.cpp
@@ -177,7 +177,9 @@ EXPRP Expr::create(Variable::Info&& info, const void* ptr, VARP::InputType type,
}
expr->mInside->mContentDirty = false;
if (memtype == COPY) {
::memcpy(expr->mInside->mOutputTensors[0]->buffer().host, originPtr, dstInfo.size * dstInfo.type.bytes());
size_t total_size = dstInfo.size;
total_size *= dstInfo.type.bytes();
::memcpy(expr->mInside->mOutputTensors[0]->buffer().host, originPtr, total_size);
} else {
expr->mInside->mOutputTensors[0]->buffer().host = (uint8_t*)originPtr;
if (memtype == REF) {
@@ -227,6 +229,9 @@ EXPRP Expr::create(const OpT* op, std::vector<VARP> inputs, int outputSize) {
case DataType_DT_FLOAT:
ptr = (void*)op->main.AsBlob()->float32s.data();
break;
case DataType_DT_BFLOAT16:
ptr = (void*)op->main.AsBlob()->uint8s.data();
break;
default:
break;
}
@@ -1081,9 +1086,15 @@ void Variable::save(const std::vector<VARP>& vars, NetT* dest) {
blob->dataFormat = (MNN_DATA_FORMAT)Utils::convertFormat(info.order);
blob->dims = info.dim;
if (info.type.code == halide_type_float) {
blob->dataType = DataType_DT_FLOAT;
blob->float32s.resize(info.size);
::memcpy(blob->float32s.data(), ptr, info.size * sizeof(float));
if (info.type.bits == 16) {
blob->dataType = DataType_DT_BFLOAT16;
blob->uint8s.resize(info.size * 2);
::memcpy(blob->uint8s.data(), ptr, info.size * sizeof(int16_t));
} else {
blob->dataType = DataType_DT_FLOAT;
blob->float32s.resize(info.size);
::memcpy(blob->float32s.data(), ptr, info.size * sizeof(float));
}
} else if (info.type.code == halide_type_int && info.type.bits == 32) {
blob->dataType = DataType_DT_INT32;
blob->int32s.resize(info.size);
1 change: 1 addition & 0 deletions express/Utils.cpp
@@ -81,6 +81,7 @@ halide_type_t Utils::revertDataType(DataType dataType) {
CONVERT(DataType_DT_UINT8, halide_type_of<uint8_t>(), dataType);
CONVERT(DataType_DT_INT8, halide_type_of<int8_t>(), dataType);
CONVERT(DataType_DT_HALF, halide_type_of<float>(), dataType);
CONVERT(DataType_DT_BFLOAT16, halide_type_t(halide_type_float, 16), dataType);
return halide_type_of<float>();
}
Express::Dimensionformat Utils::revertFormat(int format) {
2 changes: 1 addition & 1 deletion include/MNN/MNNDefine.h
@@ -69,6 +69,6 @@ MNN_ERROR("Check failed: %s ==> %s\n", #success, #log); \
#define STR(x) STR_IMP(x)
#define MNN_VERSION_MAJOR 2
#define MNN_VERSION_MINOR 7
#define MNN_VERSION_PATCH 0
#define MNN_VERSION_PATCH 1
#define MNN_VERSION STR(MNN_VERSION_MAJOR) "." STR(MNN_VERSION_MINOR) "." STR(MNN_VERSION_PATCH)
#endif /* MNNDefine_h */
1 change: 1 addition & 0 deletions include/MNN/Tensor.hpp
@@ -227,6 +227,7 @@ class MNN_PUBLIC Tensor {
* @return bytes needed to store data
*/
int size() const;
size_t usize() const;

/**
* @brief calculate number of elements needed to store data taking reordering flag into account.
25 changes: 18 additions & 7 deletions package_scripts/win/build_lib_release.ps1
@@ -13,7 +13,8 @@
Param(
[Parameter(Mandatory=$true)][String]$path,
[String]$backends,
[Switch]$x86
[Switch]$x86,
[Switch]$cibuild
)

$erroractionpreference = "stop"
@@ -25,14 +26,18 @@ mkdir -p $PACKAGE_LIB_PATH

#clear and create package directory
powershell ./schema/generate.ps1
Remove-Item -Path $PACKAGE_PATH/include -Recurse -ErrorAction Ignore
cp -r include $PACKAGE_PATH
cp -r tools/cv/include/cv $PACKAGE_PATH/include
pushd $PACKAGE_LIB_PATH
mkdir -p Release\Dynamic\MT, Release\Dynamic\MD, Release\Static\MD, Release\Static\MT
if ($cibuild) {
mkdir -p Release\Dynamic\MT
} else {
Remove-Item -Path $PACKAGE_PATH/include -Recurse -ErrorAction Ignore
cp -r include $PACKAGE_PATH
cp -r tools/cv/include/cv $PACKAGE_PATH/include
mkdir -p Release\Dynamic\MT, Release\Dynamic\MD, Release\Static\MD, Release\Static\MT
}
popd

$CMAKE_ARGS = "-DMNN_SEP_BUILD=OFF -DMNN_BUILD_TRAIN=ON -DMNN_BUILD_OPENCV=ON -DMNN_IMGCODECS=ON -DMNN_OPENCL=ON -DMNN_VULKAN=ON -DMNN_AVX512=ON"
$CMAKE_ARGS = "-DMNN_SEP_BUILD=OFF -DMNN_BUILD_TRAIN=ON -DMNN_BUILD_OPENCV=ON -DMNN_IMGCODECS=ON -DMNN_OPENCL=ON -DMNN_VULKAN=ON -DMNN_AVX512=ON -DMNN_LOW_MEMORY=ON"
if ($backends -ne $null) {
Foreach ($backend in $backends.Split(",")) {
if ($backend -eq "cuda") {
@@ -78,6 +83,12 @@ Build "cmake -G Ninja $CMAKE_ARGS -DCMAKE_BUILD_TYPE=Release -DMNN_WIN_RUNTIME_M
cp MNN.lib, MNN.dll, MNN.pdb $PACKAGE_LIB_PATH\Release\Dynamic\MT
rm MNN.*

# cibuild only builds a single configuration, as a build test
if ($cibuild) {
popd
return
}

##### Release/Dynamic/MD ####
log "Release/Dynamic/MD"
Remove-Item CMakeCache.txt -ErrorAction Ignore
@@ -97,4 +108,4 @@ Remove-Item CMakeCache.txt -ErrorAction Ignore
Build "cmake -G Ninja $CMAKE_ARGS -DCMAKE_BUILD_TYPE=Release -DMNN_WIN_RUNTIME_MT=OFF -DMNN_BUILD_SHARED_LIBS=OFF .."
cp MNN.lib $PACKAGE_LIB_PATH\Release\Static\MD

popd
popd
2 changes: 1 addition & 1 deletion pymnn/pip_package/build_deps.py
@@ -44,7 +44,7 @@ def build_deps():
shutil.rmtree(cmake_build_dir)
os.makedirs(cmake_build_dir)
os.chdir(cmake_build_dir)
extra_opts = '-DMNN_LOW_MEMORY=OFF'
extra_opts = '-DMNN_LOW_MEMORY=ON'
extra_opts += ' -DMNN_VULKAN=ON -DMNN_VULKAN_IMAGE=OFF'
extra_opts += ' -DMNN_OPENCL=ON'
if IS_WINDOWS:
4 changes: 2 additions & 2 deletions source/backend/arm82/Arm82Backend.cpp
@@ -80,11 +80,11 @@ Execution* Arm82Backend::onCreate(const std::vector<Tensor*>& inputs, const std:
return exe;
}

static int _getAliginSize(const halide_buffer_t& buffer, MNN_DATA_FORMAT format) {
static size_t _getAliginSize(const halide_buffer_t& buffer, MNN_DATA_FORMAT format) {
// The default data type of input tensor for arm82 backend is FLOAT32.
// However, Arm82Backend default data type is FLOAT16, so check whether data type is FLOAT32,
// then divide size by 2
int size = sizeof(int16_t);
size_t size = sizeof(int16_t);
const int dimensions = buffer.dimensions;
for (int i = 0; i < dimensions; i++) {
int currentDimSize = buffer.dim[i].extent;
@@ -53,7 +53,7 @@
mov w17, #0x0f
dup v3.16b, w17
and v2.16b, v0.16b, v3.16b
mov w17, #7
mov w17, #8
dup v0.16b, w17
sub v1.16b, v1.16b, v0.16b
sub v2.16b, v2.16b, v0.16b
@@ -145,7 +145,7 @@ LoopH:
mov w17, #0x0f
dup v3.16b, w17
and v2.16b, v0.16b, v3.16b
mov w17, #7
mov w17, #8
dup v0.16b, w17
sub v1.16b, v1.16b, v0.16b
sub v2.16b, v2.16b, v0.16b
@@ -347,7 +347,7 @@ LoopHRemain:
ld1 {v21.8h}, [x20], #16 // bias
mov w17, #0x0f
dup v22.16b, w17
mov w17, #7
mov w17, #8
dup v23.16b, w17
// ld1 {v3.8h}, [x2]
ld1 {v3.8h}, [x2], #16