Description
Describe the issue
Summary
The OpenVINO execution provider on Linux crashes with a segmentation fault because of an off-by-one error in capability.cc. The bug causes iteration past the end of the connected_clusters vector, resulting in undefined behavior when dereferencing an invalid memory location.
Environment
- Platform: Linux (built from source)
- Versions affected: 1.23.0, 1.23.1 (code unchanged between versions)
- Component: libonnxruntime_providers_openvino.so
Root Cause
In onnxruntime/core/providers/openvino/ov_versions/capability.cc (line ~182), this loop increments j before accessing connected_clusters[j]:
while (j < total_clusters && !append_node) {
  j = j + 1;  // Incremented BEFORE use
  append_node = AddTrivialClusterToNextClusterIfConnected(
      graph_viewer_, index, connected_clusters[j]);  // j can now equal total_clusters (out of bounds)
}
When j == total_clusters - 1 and the loop condition is true, j becomes total_clusters, causing connected_clusters[j] to access one element past the vector's end.
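The index sequence can be traced in isolation. The following is a minimal sketch, not the onnxruntime code: IndicesTouchedByBuggyLoop is a hypothetical helper that mirrors the loop's control flow but records each index instead of dereferencing it. It assumes the worst case, in which AddTrivialClusterToNextClusterIfConnected returns false for every cluster, and it assumes std::size_t for j and total_clusters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the loop in capability.cc: same control flow,
// but it records each index instead of reading connected_clusters[j].
// Assumption: the trivial node never connects, so append_node stays false.
std::vector<std::size_t> IndicesTouchedByBuggyLoop(std::size_t j,
                                                   std::size_t total_clusters) {
  // Mirrors the enclosing `else if (j < total_clusters - 1)` guard.
  assert(j + 1 < total_clusters);

  std::vector<std::size_t> touched;
  bool append_node = false;
  while (j < total_clusters && !append_node) {
    j = j + 1;             // incremented before use, as in the original
    touched.push_back(j);  // the original reads connected_clusters[j] here
  }
  return touched;
}
```

Calling IndicesTouchedByBuggyLoop(0, 4) records the indices 1, 2, 3, 4. The last valid index for four clusters is 3, so the final iteration corresponds to the out-of-bounds read.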
Evidence
While attempting to use the OpenVINO execution provider on Linux, I encountered a segfault. The original occurrence and the details below pertain to 1.23.0, but I confirmed that the proposed fix is effective with 1.23.1, and the relevant code is unchanged between these two versions. We build from source and run on both Windows and Linux.
Stack trace:
libonnxruntime_providers_openvino.so!onnxruntime::openvino_ep::AddTrivialClusterToNextClusterIfConnected(const onnxruntime::GraphViewer & graph_viewer, const onnxruntime::NodeIndex curr_node_index, const std::vector<unsigned long, std::allocator<unsigned long> > & search_cluster) (/…/EP/src/ONNXRuntime/onnxruntime/core/providers/openvino/ov_versions/utils.cc:159)
libonnxruntime_providers_openvino.so!onnxruntime::openvino_ep::GetCapability::Execute(onnxruntime::openvino_ep::GetCapability * const this) (/…/EP/src/ONNXRuntime/onnxruntime/core/providers/openvino/ov_versions/capability.cc:184)
libonnxruntime_providers_openvino.so!onnxruntime::openvino_ep::OpenVINOExecutionProvider::GetCapability(const onnxruntime::openvino_ep::OpenVINOExecutionProvider * const this, const onnxruntime::GraphViewer & graph_viewer) (/…/EP/src/ONNXRuntime/onnxruntime/core/providers/openvino/openvino_execution_provider.cc:85)
libonnxruntime.so.1!operator()(const struct {...} * const __closure, const onnxruntime::IExecutionProvider & ep, const onnxruntime::GraphViewer & graph_viewer, const onnxruntime::IExecutionProvider::IKernelLookup & kernel_lookup, onnxruntime::IResourceAccountant * resource_accountant, const onnxruntime::GraphOptimizerRegistry & graph_optimizer_registry) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:159)
libonnxruntime.so.1!onnxruntime::GetCapabilityForEP(const onnxruntime::(anonymous namespace)::GetCapabilityForEPParams & params, const onnxruntime::logging::Logger & logger) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:197)
libonnxruntime.so.1!onnxruntime::PartitionOnnxFormatModelImpl(onnxruntime::Graph & graph, onnxruntime::FuncManager & func_mgr, onnxruntime::KernelRegistryManager & kernel_registry_mgr, onnxruntime::KernelRegistry & fused_kernel_registry, onnxruntime::IExecutionProvider & current_ep, onnxruntime::GraphPartitioner::Mode mode, int & fused_node_unique_id, const onnxruntime::layout_transformation::TransformLayoutFunction & transform_layout_fn, const onnxruntime::layout_transformation::DebugGraphFn & debug_graph_fn, const onnxruntime::CheckLoadCancellationFn & check_load_cancellation_fn, const onnxruntime::logging::Logger & logger, onnxruntime::IResourceAccountant * resource_accountant, const onnxruntime::GraphOptimizerRegistry & graph_optimizer_registry, bool disable_model_compile) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:476)
libonnxruntime.so.1!onnxruntime::PartitionOnnxFormatModel(const onnxruntime::(anonymous namespace)::PartitionParams & partition_params, onnxruntime::GraphPartitioner::Mode mode, const onnxruntime::ExecutionProviders & execution_providers, onnxruntime::KernelRegistryManager & kernel_registry_manager, const std::optional<onnxruntime::InlinedHashMap<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<onnxruntime::IResourceAccountant, std::default_delete<onnxruntime::IResourceAccountant> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unique_ptr<onnxruntime::IResourceAccountant, std::default_delete<onnxruntime::IResourceAccountant> > > > > > & acc_map, const onnxruntime::GraphOptimizerRegistry & graph_optimizer_registry, const onnxruntime::logging::Logger & logger, bool disable_model_compile) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:1032)
libonnxruntime.so.1!onnxruntime::GraphPartitioner::Partition(const onnxruntime::GraphPartitioner * const this, onnxruntime::Graph & graph, onnxruntime::FuncManager & func_mgr, const onnxruntime::layout_transformation::TransformLayoutFunction & transform_layout_function, const onnxruntime::ConfigOptions & config_options, const onnxruntime::logging::Logger & logger, onnxruntime::GraphPartitioner::Mode mode, const onnxruntime::epctx::ModelGenOptions & ep_context_gen_options, const onnxruntime::layout_transformation::DebugGraphFn & debug_graph_fn) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:1316)
libonnxruntime.so.1!onnxruntime::InferenceSession::TransformGraph(onnxruntime::InferenceSession * const this, onnxruntime::Graph & graph, bool saving_model_in_ort_format) (/…/EP/src/ONNXRuntime/onnxruntime/core/session/inference_session.cc:1448)
libonnxruntime.so.1!onnxruntime::InferenceSession::Initialize(onnxruntime::InferenceSession * const this) (/…/EP/src/ONNXRuntime/onnxruntime/core/session/inference_session.cc:2286)
libonnxruntime.so.1!InitializeSession(const OrtSessionOptions * options, onnxruntime::InferenceSession & sess, OrtPrepackedWeightsContainer * prepacked_weights_container) (/…/EP/src/ONNXRuntime/onnxruntime/core/session/utils.cc:364)
libonnxruntime.so.1!OrtApis::CreateSessionFromArray(const OrtEnv * env, const void * model_data, size_t model_data_length, const OrtSessionOptions * options, OrtSession ** out) (/…/EP/src/ONNXRuntime/onnxruntime/core/session/onnxruntime_c_api.cc:745)
lib******ONNX.so!Ort::Session::Session(Ort::Session * const this, const Ort::Env & env, const void * model_data, size_t model_data_length, const Ort::SessionOptions & options) (/…/EP/include/onnxruntime/onnxruntime_cxx_inline.h:1810)
…
The bad access seems to be a null dereference:
-exec p/x $_siginfo._sifields._sigfault.si_addr
$1 = 0x0
The search_cluster being iterated over appears to contain garbage.
-exec p search_cluster[0]
Cannot access memory at address 0xacb5eb01c2097be1
-exec p &search_cluster[0]
$3 = (unsigned long *) 0xacb5eb01c2097be1
The vector object itself seems to be at a reasonable-looking location.
-exec p &search_cluster
$4 = (const std::vector<unsigned long, std::allocator<unsigned long> > *) 0x7f8eedcbcbd0
That address doesn’t seem to be in range of the current thread’s stack.
-exec info reg rsp
rsp 0x7f9193ff1610 0x7f9193ff1610
But it is within a valid, mapped read-write address region for the process.
-exec info proc mappings
process 3711
Mapped address spaces:
Start Addr End Addr Size Offset Perms File
…
0x00007f8ee8000000 0x00007f8eebf01000 0x3f01000 0x0 rw-p
0x00007f8eebf01000 0x00007f8eec000000 0xff000 0x0 ---p
0x00007f8eec000000 0x00007f8eeff01000 0x3f01000 0x0 rw-p
0x00007f8eeff01000 0x00007f8ef0000000 0xff000 0x0 ---p
0x00007f8ef0000000 0x00007f8ef3f01000 0x3f01000 0x0 rw-p
0x00007f8ef3f01000 0x00007f8ef4000000 0xff000 0x0 ---p
0x00007f8ef4000000 0x00007f8ef7f01000 0x3f01000 0x0 rw-p
…
One level up the stack, where does search_cluster come from? It’s connected_clusters[j]. In this loop, total_clusters is set by connected_clusters.size().
-exec p total_clusters
$7 = 4
However, j is incremented after the test and before the call to AddTrivialClusterToNextClusterIfConnected.
while (j < total_clusters && !append_node) {
  j = j + 1;
  append_node = AddTrivialClusterToNextClusterIfConnected(graph_viewer_, index, connected_clusters[j]);
}
At the time of the segfault, j is one past the end of connected_clusters, so the vector has been overrun and we’re officially in undefined behavior.
-exec p j
$8 = 4
Proposed Fix
Adjust the loop condition to prevent j from exceeding valid indices, and use .at() for bounds-checked access. I’m successfully using the minimalist patch below on top of 1.23.1, but I was going for surgical rather than style points.
diff --git a/onnxruntime/core/providers/openvino/ov_versions/capability.cc b/onnxruntime/core/providers/openvino/ov_versions/capability.cc
index 1893700ca..b735d5d45 100644
--- a/onnxruntime/core/providers/openvino/ov_versions/capability.cc
+++ b/onnxruntime/core/providers/openvino/ov_versions/capability.cc
@@ -179,12 +179,12 @@ std::vector<std::unique_ptr<ComputeCapability>> GetCapability::Execute() {
omit_subgraph = false;
} else if (j < total_clusters - 1) {
bool append_node = false;
- while (j < total_clusters && !append_node) {
+ while (j + 1 < total_clusters && !append_node) {
j = j + 1;
- append_node = AddTrivialClusterToNextClusterIfConnected(graph_viewer_, index, connected_clusters[j]);
+ append_node = AddTrivialClusterToNextClusterIfConnected(graph_viewer_, index, connected_clusters.at(j));
}
if (append_node) {
- connected_clusters[j].emplace_back(index);
+ connected_clusters.at(j).emplace_back(index);
}
omit_subgraph = true;
}
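For readers who want to check the patched control flow outside of onnxruntime, here is a standalone sketch of the same loop. It is an illustration under assumptions, not the provider's code: AppendToNextConnectedCluster and the Cluster alias are hypothetical names, and the is_connected callback stands in for AddTrivialClusterToNextClusterIfConnected(graph_viewer_, index, cluster).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Cluster = std::vector<std::size_t>;  // hypothetical alias for illustration

// Hypothetical standalone version of the patched loop. Returns true if
// `index` was appended to a cluster located after position j.
bool AppendToNextConnectedCluster(
    std::vector<Cluster>& connected_clusters, std::size_t j, std::size_t index,
    const std::function<bool(const Cluster&)>& is_connected) {
  const std::size_t total_clusters = connected_clusters.size();
  assert(j < total_clusters);

  bool append_node = false;
  // Testing j + 1 keeps the incremented index below total_clusters.
  while (j + 1 < total_clusters && !append_node) {
    j = j + 1;
    // .at() throws std::out_of_range rather than reading past the end.
    append_node = is_connected(connected_clusters.at(j));
  }
  if (append_node) {
    connected_clusters.at(j).emplace_back(index);
  }
  return append_node;
}
```

With four clusters, a starting j of 0, and a callback that always returns false, the callback runs for indices 1, 2, and 3 and the loop then exits without touching index 4. The .at() calls are a second line of defense: if the condition were ever wrong again, the failure would be an exception rather than a silent out-of-bounds read.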
To reproduce
It happened on multiple consecutive runs with a specific custom ONNX model; I never saw it succeed unpatched with that model.
Urgency
We're up and running with the patch I included, so low urgency for me, but it does seem to involve undefined behavior and out-of-bounds access.
Platform
Linux
OS Version
Fedora 42
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.23.1
ONNX Runtime API
C++
Architecture
X64
Execution Provider
OpenVINO
Execution Provider Library Version
OpenVINO 2025.3.0