Motivation
Reduce the host-bound overhead of MindSpore operators and optimize the dispatch mechanism.
Dispatch
- Tensor.device: use the MindSpore native API instead of the Python-added property (only supported on MindSpore >= 2.7.0); see the sketch after this list.
- Dispatcher: use a shortcut that only inspects the first Tensor input instead of comparing all inputs against each other in Python (roughly a 2x speedup on small models); see the sketch after this list.
- Multi-backend: still supports both aclnn and aclop (old primitive ops).
- API patch: controlled by a switch that can be turned on or off (see ENABLE_API_PATCH below).
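
A minimal sketch of the Tensor.device change, assuming MindSpore >= 2.7.0 exposes a native `device` attribute on `Tensor`; the fallback to the global context for older versions is a hypothetical illustration, not the actual patched property:

```python
import mindspore as ms

def get_device(tensor: ms.Tensor):
    """Prefer the native Tensor.device attribute (MindSpore >= 2.7.0);
    fall back to a Python-side lookup on older versions (assumption)."""
    if hasattr(tensor, "device"):
        # Native attribute, no Python patching needed.
        return tensor.device
    # Hypothetical fallback for MindSpore < 2.7.0: derive the device
    # from the global context instead of a patched property.
    return ms.get_context("device_target")
```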
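
A hedged sketch of the first-Tensor-only dispatch shortcut. The registry layout (`_OP_TABLE`), the `dispatch` signature, and the use of a `device` key are assumptions for illustration; the point is that the backend is chosen from the first Tensor argument alone rather than by cross-checking every input in Python:

```python
import mindspore as ms
from mindspore import ops

# Hypothetical registry: op name -> {backend key: callable}.
_OP_TABLE = {
    "add": {"default": ops.add},
    "matmul": {"default": ops.matmul},
}

def _first_tensor(args):
    """Return the first positional Tensor argument, if any."""
    for a in args:
        if isinstance(a, ms.Tensor):
            return a
    return None

def dispatch(op_name, *args, **kwargs):
    """Pick the implementation from the first Tensor input only,
    instead of comparing every input's device in Python."""
    table = _OP_TABLE[op_name]
    t = _first_tensor(args)
    key = getattr(t, "device", None) if t is not None else None
    impl = table.get(key, table["default"])
    return impl(*args, **kwargs)
```

Skipping the pairwise device comparison keeps the Python-side dispatch cost constant in the number of inputs, which is where the small-model speedup comes from.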
New environment variables
- ENABLE_DISPATCH: whether to use the Python dispatcher.
- ENABLE_PYBOOST: whether to use pyboost ops (used for OrangePi and the GE graph).
- ENABLE_API_PATCH: whether to apply the API patch to speed up models.
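
A small sketch of how these switches might be read; the variable names come from this change, but the accepted truthy values ("1"/"true"/"on") and the `_env_on` helper are assumptions:

```python
import os

def _env_on(name: str, default: str = "0") -> bool:
    """Interpret an environment switch as a boolean (assumed convention)."""
    return os.getenv(name, default).lower() in ("1", "true", "on")

ENABLE_DISPATCH = _env_on("ENABLE_DISPATCH")    # Python dispatcher on/off
ENABLE_PYBOOST = _env_on("ENABLE_PYBOOST")      # pyboost ops (OrangePi / GE graph)
ENABLE_API_PATCH = _env_on("ENABLE_API_PATCH")  # API patch for model speedup
```

For example, a run could enable the dispatcher and the API patch from the shell with `ENABLE_DISPATCH=1 ENABLE_API_PATCH=1 python train.py` (the script name is illustrative).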