
Initial Backend Engine Integration #773

Open
wants to merge 1 commit into base: master

Conversation

cavusmustafa

This PR includes the new backend engine. It separates the OpenVINO execution from the Executable into an isolated class, IEBackendEngine, creating an abstraction for integrating other device-based backend engines in the future.
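For readers skimming the diff, here is a rough sketch of the abstraction this introduces: the engine owns the CNNNetwork and target device, exposes the nGraph function, and runs inference. The names IEBackendEngine, Infer, and GetFunc come from the hunks quoted below; the signatures here are illustrative fragments, not the merged code (the real Infer also takes hoisted parameters, discussed further down).

class IETensor;  // nGraph runtime::Tensor backed by an IE blob (from this code base)

// Illustrative sketch: the engine owns the OpenVINO state that previously
// lived inside Executable.
class IEBackendEngine {
 public:
  IEBackendEngine(InferenceEngine::CNNNetwork network, std::string device)
      : m_network(std::move(network)), m_device(std::move(device)) {}
  virtual ~IEBackendEngine() = default;

  // Runs the network on the device; device-specific engines (e.g. VAD-M)
  // can override this to batch or parallelize.
  virtual void Infer(std::vector<std::shared_ptr<IETensor>>& inputs,
                     std::vector<std::string>& input_names,
                     std::vector<std::shared_ptr<IETensor>>& outputs,
                     std::vector<std::string>& output_names) = 0;

  // Returns the nGraph Function held by the CNNNetwork.
  std::shared_ptr<ngraph::Function> GetFunc() { return m_network.getFunction(); }

 protected:
  InferenceEngine::CNNNetwork m_network;
  std::string m_device;
};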

@cavusmustafa cavusmustafa marked this pull request as ready for review January 5, 2021 23:40
@cavusmustafa cavusmustafa requested a review from adk9 January 5, 2021 23:40
Comment on lines +183 to +185
ie_inputs[i] = nullptr;
ie_inputs[i] = static_pointer_cast<IETensor>(inputs[i]);
input_names[i] = input_name;
Contributor

Please make IETensor named so that we can track the name together with the tensor.
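A minimal sketch of that request (hypothetical; get_name/set_name are not part of this PR):

// Hypothetical: keep the bound parameter/result name inside IETensor so it
// travels with the tensor instead of living in a parallel input_names vector.
class IETensor : public ngraph::runtime::Tensor {
 public:
  // ... existing constructors and read()/write() overrides unchanged ...
  const std::string& get_name() const { return m_name; }
  void set_name(const std::string& name) { m_name = name; }

 private:
  std::string m_name;
};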

@@ -167,7 +165,9 @@ bool Executable::Call(const vector<shared_ptr<runtime::Tensor>>& inputs,
}

// Prepare input blobs
auto func = m_network.getFunction();
auto func = m_ie_engine->GetFunc();
std::vector<std::shared_ptr<IETensor>> ie_inputs(inputs.size());
Contributor

Not necessary. You can pass inputs directly to the backend.

Author

But in that case we have to pass nGraph tensors directly and cast them to IETensors inside the engine. Then we would need to move more functionality from the Executable onto the backend. We might need further discussion about the related changes.

Contributor

IETensor is an nGraph tensor.
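A fragment sketching that suggestion (hypothetical signature; the cast mirrors the static_pointer_cast already done in Executable::Call):

// Hypothetical: the engine accepts generic nGraph tensors and downcasts
// internally, so Executable::Call no longer needs its own ie_inputs vector.
void IEBackendEngine::Infer(
    const std::vector<std::shared_ptr<ngraph::runtime::Tensor>>& inputs,
    const std::vector<std::string>& input_names) {
  for (size_t i = 0; i < inputs.size(); ++i) {
    // Safe as long as callers only ever hand the engine IETensors.
    auto ie_input = std::static_pointer_cast<IETensor>(inputs[i]);
    // ... bind ie_input's blob to the infer request under input_names[i] ...
  }
}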

Comment on lines +231 to +232
m_ie_engine->Infer(ie_inputs, input_names, ie_outputs, output_names,
ie_hoisted_params, param_names);
Contributor

What does the backend need to know the hoisted parameters for? Couldn't they just be passed as inputs?

Author

It's possible to merge them into a single vector, but in that case we could not pass the inputs directly as suggested in the review above. Also, I'm not sure whether this would be safe for batching. It may not be an issue, but we would need to verify.

Contributor

A hoisted parameter is an input to the model. The backend doesn't need to know whether an input was hoisted or not.

Author

For the basic backend this may not matter. The VADM backend will divide the input into multiple batches, but that is not the case for the hoisted parameters. How is their size affected by the actual input batch size? If we pass them as regular inputs to the backend now, the VADM backend will disable batching because it sees multiple inputs even though the actual number of inputs is 1.
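To make the concern concrete, a hypothetical sketch of the kind of batching check the VADM engine relies on (the real heuristic is not part of this diff):

// Hypothetical: batching across VAD-M devices only makes sense when there is
// a single batched network input. Hoisted parameters forwarded as extra
// inputs would make num_inputs > 1 and silently disable batching.
bool ShouldBatch(size_t num_inputs, size_t input_batch_size) {
  return num_inputs == 1 && input_batch_size > 1;
}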

@@ -0,0 +1,74 @@
/*******************************************************************************
Contributor

Minor nit in terms of naming: I'd prefer to call this Backend. The "IE" prefix is unnecessary since that's the only type of backend we expect to interface with. The files can similarly be renamed to be:

backend.{h,cc}
backends/hddl.{h,cc}
backends/myriad.{h,cc}

Comment on lines +45 to +55
if (m_device == "MYRIAD") {
// Set MYRIAD configurations
if (IEUtils::VPUConfigEnabled()) {
config["MYRIAD_DETECT_NETWORK_BATCH"] = "NO";
}

if (IEUtils::VPUFastCompileEnabled()) {
config["MYRIAD_HW_INJECT_STAGES"] = "NO";
config["MYRIAD_COPY_OPTIMIZATION"] = "NO";
}
}
Contributor

These customizations should be set in the "myriad" backend implementation.
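A sketch of where those options could move, following the backends/myriad.{h,cc} naming suggested above (MyriadBackend, the Backend base, and SetConfig are illustrative; the config keys and IEUtils checks are the ones from this diff):

// backends/myriad.cc (illustrative)
MyriadBackend::MyriadBackend(InferenceEngine::CNNNetwork network)
    : Backend(std::move(network), "MYRIAD") {
  std::map<std::string, std::string> config;
  if (IEUtils::VPUConfigEnabled()) {
    config["MYRIAD_DETECT_NETWORK_BATCH"] = "NO";
  }
  if (IEUtils::VPUFastCompileEnabled()) {
    config["MYRIAD_HW_INJECT_STAGES"] = "NO";
    config["MYRIAD_COPY_OPTIMIZATION"] = "NO";
  }
  SetConfig(config);  // hypothetical helper on the base Backend
}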

Author

Answered within the comment below since it is related.

@@ -0,0 +1,80 @@
/*******************************************************************************
Contributor

"Basic Engine" is unnecessary if you provide a default implementation for the abstract backend? Backends that don't need a custom "Infer" or custom configuration can fall back to the default impl.

Author

Should the basic execution be in Executable or in BackendEngine? We would disagree with the first option. The second option might be possible, and it may help with the issue about MYRIAD configurations above too. But we need to think about whether it will impact anything else in the future. Although we may change this design later, I don't think it's in bad shape to be merged now.

Contributor

"Basic" execution should be in the implementation of the default Backend.

// Returns the NGraph Function from the CNNNetwork
std::shared_ptr<ngraph::Function> GetFunc();

virtual const std::vector<size_t> GetOutputShape(const int i) = 0;
Contributor

Not needed?

Author

This function is needed for the VADM backend. We can remove it in this PR, but we will need to bring it back in the next PR.

Contributor

The output shape can be determined either from the function or from the output tensor that was originally passed to the backend?

Author

The VADM backend modifies the batch size of the function, so the output size we get from the function will be wrong. We could capture the output size just after creating the function and use it later for allocations. This would add extra functionality to NGraphEncapsulateOp, and I need to test to see how it works.
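A sketch of capturing the shapes up front (get_results() and get_shape() are standard nGraph calls; where the snapshot gets stored is left open):

// Illustrative: snapshot result shapes right after the function is created,
// before a backend (e.g. VAD-M) rewrites the batch dimension.
std::vector<ngraph::Shape> original_output_shapes;
for (const auto& result : func->get_results()) {
  original_output_shapes.push_back(result->get_shape());
}
// Later allocations would use original_output_shapes[i] instead of querying
// the (possibly batch-rewritten) function.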


// Returns output batch size based on the input batch size and the device
// FIXME: This may not be needed
virtual size_t GetOutputBatchSize(size_t input_batch_size) const;
Contributor

Not needed?

Author

This can be removed.

Comment on lines +48 to +51
// Enables multi request execution if the execution engine supports it
void EnableMultiReqExecution();
// Disables multi request execution
void DisableMultiReqExecution();
Contributor

This can be a parameter to the HDDL backend constructor.

Author

It may limit our ability to enable/disable batching dynamically. Or this parameter might be part of caching, but that would require more changes to the existing bridge code.

Contributor

In which scenarios would we want to enable/disable batching dynamically for a given network executing on a given device? If we expect this interface to be limited to a specific backend, then it shouldn't be a part of the abstract interface.

Author

Then this should be a parameter to the Executable constructor since the backend is created there.
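A sketch of that wiring (the constructor parameters and the HDDLBackend/Backend names are hypothetical; only the idea of deciding the flag where the backend is created comes from this thread):

// Hypothetical: the flag is fixed when Executable creates the backend instead
// of being toggled later through Enable/DisableMultiReqExecution.
Executable::Executable(InferenceEngine::CNNNetwork network, std::string device,
                       bool multi_req_execution) {
  if (device == "HDDL") {
    m_ie_engine = std::make_shared<HDDLBackend>(network, multi_req_execution);
  } else {
    m_ie_engine = std::make_shared<Backend>(network, device);
  }
}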

Comment on lines +67 to +68
virtual void StartAsyncInference(const int req_id);
virtual void CompleteAsyncInference(const int req_id);
Contributor

How's one supposed to use these? Infer only uses one infer request.

Author

These are required for asynchronous execution. Currently we need this for VADM. For the other backends, it will be the same as Infer for now.

Contributor

Yes, I can tell that these are required for asynchronous execution from the name :)

I was asking how one'd use this interface because I don't see a way to create an asynchronous inference request? The implementation for these is broken at the moment, and I'd prefer that we implement it correctly if we're extending the interface.

Author

It does not have any impact on the execution. We kept all executions async to have a common call for all backends, in case we need it for multiple backends in the future. We can move back to the Infer call for the basic execution for now; it will not make any difference.
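For reference, the standard Inference Engine asynchronous pattern looks roughly like this; StartAsync and Wait(RESULT_READY) are the real IE calls, while the surrounding member names are illustrative:

// Illustrative: one InferRequest per request id; StartAsync launches the
// request and Wait(RESULT_READY) blocks until it finishes.
void Backend::StartAsyncInference(const int req_id) {
  while (m_infer_requests.size() <= static_cast<size_t>(req_id)) {
    m_infer_requests.push_back(m_exe_network.CreateInferRequest());
  }
  m_infer_requests[req_id].StartAsync();
}

void Backend::CompleteAsyncInference(const int req_id) {
  m_infer_requests[req_id].Wait(
      InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
}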
