[NPU] Add dynamic host pipeline to support host compile #33249
Open
XinWangIntel wants to merge 33 commits into openvinotoolkit:master from XinWangIntel:dynamic-pipeline-mlir-deps-v3
Commits (33)
All commits by XinWangIntel:

- 666c9cd Add dynamic-pipeline
- 6c3cc17 Use predicted shape instead of real shape
- 28ded18 Fix unit test
- baee752 Port dynamic stride change
- 78f32f3 Follow dynamic stride change
- facce95 Update allocate_tensor and update_graph_args
- 0b5c2d2 Use setArgumentValueSithStrides to replace setArgumentProperty
- 86589da Force strides support in metadata be true
- 2b7f2e7 Force open compilation
- 4a2e032 Update local output tensor to use predict shape
- edb4460 Change predict log from warn to info
- 16ff1aa Fix stride and shape info
- 2b50f9d Check User and Local tensor with predicted result
- fc7b824 Skip check if user tensor is allocated by plugin
- 618ad56 Code clean and use ENABLE_NPU_DEBUG_CAPS to use this feature
- a38ea45 Init execute params
- 045de4f Remove tests
- ed5de07 Remove some debug log
- c5b7c3f Only call predict shape if user set new tensor
- 8192a42 Code clean
- a98f8d8 Code refactor
- 29e2e9c Fix predict issue
- 5b7629e Fix output shape
- 8b0311b Update copyright
- 3137891 Update copyright
- 3a4e5ee Update copyright
- 742249a Update copyright
- 0227b85 Clean log and fix copyright
- f7a7ea0 Remove special flag for MSVC
- e572db5 Fix style
- 69f323c Detect new mlir runtime api
- 0ebca81 Fix test that pass shape smaller than min size
- 5ec1727 Skip check for min size
src/plugins/intel_npu/src/backend/include/zero_dynamic_infer_request.hpp (45 additions, 0 deletions)
```cpp
// Copyright (C) 2018-2026 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "intel_npu/utils/zero/zero_utils.hpp"
#include "zero_dynamic_pipeline.hpp"
#include "zero_infer_request.hpp"

namespace intel_npu {

class ZeroDynamicInferRequest final : public ZeroInferRequest {
public:
    explicit ZeroDynamicInferRequest(const std::shared_ptr<ZeroInitStructsHolder>& initStructs,
                                     const std::shared_ptr<const ICompiledModel>& compiledModel,
                                     const Config& config);

    void set_tensor(const ov::Output<const ov::Node>& port, const ov::SoPtr<ov::ITensor>& tensor) override;
    void set_tensors(const ov::Output<const ov::Node>& port,
                     const std::vector<ov::SoPtr<ov::ITensor>>& tensors) override;

    void infer_async() override;

protected:
    void construct_pipeline() override;

    /**
     * @brief Allocates a tensor on the host and stores the reference inside multiple attributes.
     * @param index The index which the allocated tensor shall use.
     * @param isInput Determines the containers in which the newly allocated tensors will be stored.
     * @param batchSize If provided, the value of the shape on the 0th axis is overridden with this value.
     * @return Pointer towards the allocated tensor.
     */
    std::shared_ptr<ZeroTensor> allocate_tensor(const size_t index,
                                                const bool isInput,
                                                const std::optional<std::size_t> batchSize = std::nullopt) const;

    IODescriptor prepare_io_descriptor_with_user_info(const IODescriptor& descriptor, bool isInput);

    bool _isTensorChanged = false;
};

}  // namespace intel_npu
```
src/plugins/intel_npu/src/backend/include/zero_dynamic_pipeline.hpp (142 additions, 0 deletions)
```cpp
// Copyright (C) 2018-2026 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "irgraph.hpp"
#include "zero_pipeline.hpp"

namespace intel_npu {

struct DynamicPipeline : public Pipeline {
    struct PipelinedCommandLists {
        mutable IRGraph::GraphArguments _binding;

        std::vector<std::unique_ptr<CommandList>> _commandLists;
        // Store command list handles to pass them to the ExecutionEngine
        std::vector<ze_command_list_handle_t> _commandListHandles;

        PipelinedCommandLists(size_t numCommandLists,
                              const std::shared_ptr<ZeroInitStructsHolder>& init_structs,
                              const uint32_t& group_ordinal) {
            _commandLists.reserve(numCommandLists);
            for (size_t i = 0; i < numCommandLists; i++) {
                _commandLists.emplace_back(std::make_unique<CommandList>(init_structs, group_ordinal));
            }

            for (size_t i = 0; i < numCommandLists; i++) {
                _commandListHandles.push_back(_commandLists[i]->handle());
            }
        }

        size_t size() const {
            return _commandListHandles.size();
        }

        ze_command_list_handle_t* data() {
            return _commandListHandles.data();
        }

        void bind(IRGraph* graph);

        std::vector<ze_command_list_handle_t>& getHandles() {
            return _commandListHandles;
        }

        IRGraph::GraphArguments& getBinding() {
            return _binding;
        }

        void appendBarrier() const {
            // TODO
        }

        void appendNpuTimestamp(uint64_t* timestamp_buff) const {
            // TODO
        }

        void updateMutableCommandList(uint32_t arg_index,
                                      const void* arg_value,
                                      const ov::Strides& strides,
                                      const ov::Shape& shapes) {
            if (arg_index < _binding._inputs.size()) {
                _binding._inputs[arg_index].setArg(arg_value);
                // Only store the valid shape dimensions
                for (int64_t i = 0; i < _binding._inputs[arg_index].dimsCount; i++) {
                    _binding._inputs[arg_index].sizes[i] = shapes[i];
                }

                if (!strides.empty()) {
                    for (int64_t i = 0; i < _binding._inputs[arg_index].dimsCount; i++) {
                        _binding._inputs[arg_index].strides[i] = strides[i];
                    }
                } else {
                    // Strides must be element-based, not byte-based; derive them from the shape
                    _binding._inputs[arg_index].updateStride();
                }
            } else {
                size_t output_index = static_cast<size_t>(arg_index) - _binding._inputs.size();
                if (output_index < _binding._outputs.size()) {
                    _binding._outputs[output_index].setArg(arg_value);

                    // Only store the valid shape dimensions
                    for (int64_t i = 0; i < _binding._outputs[output_index].dimsCount; i++) {
                        _binding._outputs[output_index].sizes[i] = shapes[i];
                    }

                    if (!strides.empty()) {
                        for (int64_t i = 0; i < _binding._outputs[output_index].dimsCount; i++) {
                            _binding._outputs[output_index].strides[i] = strides[i];
                        }
                    } else {
                        // Strides must be element-based, not byte-based; derive them from the shape
                        _binding._outputs[output_index].updateStride();
                    }
                }
            }
        }

        void appendWaitOnEvent(const std::shared_ptr<Event>& event) {
            event->AppendWaitOnEvent(**_commandLists.rbegin());
        }

        void appendReset(const std::shared_ptr<Event>& event) {
            event->AppendEventReset(**_commandLists.rbegin());
        }

        void appendSignalEvent(std::shared_ptr<Event>& event) {
            event->AppendSignalEvent(**_commandLists.rbegin());
        }
    };

public:
    DynamicPipeline(const Config& config,
                    const std::shared_ptr<ZeroInitStructsHolder>& init_structs,
                    const std::shared_ptr<IGraph>& graph,
                    const std::vector<std::vector<std::shared_ptr<ZeroTensor>>>& input_tensors,
                    const std::vector<std::shared_ptr<ZeroTensor>>& output_tensors,
                    size_t batch_size = 1);

    DynamicPipeline(const DynamicPipeline&) = delete;
    DynamicPipeline& operator=(const DynamicPipeline&) = delete;
    virtual ~DynamicPipeline() = default;

    void push() override;
    void pull() override;
    void reset() const override;
    virtual void update_graph_arguments(uint32_t index,
                                        const std::shared_ptr<ZeroTensor>& tensor,
                                        [[maybe_unused]] std::shared_ptr<ov::ITensor> userTensor = nullptr) override;
    virtual void update_graph_arguments(uint32_t index,
                                        const std::shared_ptr<ZeroTensor>& tensor,
                                        size_t batch_index,
                                        [[maybe_unused]] std::shared_ptr<ov::ITensor> userTensor = nullptr) override;

    virtual std::vector<ov::ProfilingInfo> get_profiling_info() const override;

protected:
    std::vector<std::unique_ptr<PipelinedCommandLists>> _command_lists;
};

}  // namespace intel_npu
```
Review discussion

Comment: why are these needed?

Reply: The dynamic pipeline needs to call IRGraph-specific functions, so it will depend on the compiler adapter.