Out-of-bounds Access in OpenVINO Capability Checker Causes Segfault #26284

@toms-g2

Describe the issue

Summary

The OpenVINO execution provider on Linux crashes with a segmentation fault because of an off-by-one error in capability.cc. The bug causes iteration past the end of the connected_clusters vector, resulting in undefined behavior when dereferencing an invalid memory location.

Environment

  • Platform: Linux (built from source)
  • Versions affected: 1.23.0, 1.23.1 (code unchanged between versions)
  • Component: libonnxruntime_providers_openvino.so

Root Cause

In onnxruntime/core/providers/openvino/ov_versions/capability.cc (line ~182), this loop increments j before accessing connected_clusters[j]:

while (j < total_clusters && !append_node) {
    j = j + 1;  // Incremented BEFORE use
    append_node = AddTrivialClusterToNextClusterIfConnected(
        graph_viewer_, index, connected_clusters[j]);  // j can now equal total_clusters (out of bounds)
}

When j == total_clusters - 1 and append_node is still false, the loop condition holds, j becomes total_clusters, and connected_clusters[j] reads one element past the vector's end.
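The index progression can be checked in isolation. The sketch below is a hypothetical stand-in for the loop, not the code from capability.cc: IndicesVisited and its use_fixed_condition parameter are names made up for illustration, the index type is assumed to be std::size_t, and the helper call is assumed to return false every time (the worst case). It records each index the loop would use instead of dereferencing it, so it never triggers undefined behavior itself.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the loop: records every index that would be passed to
// connected_clusters[j], assuming the connectivity helper always returns false.
std::vector<std::size_t> IndicesVisited(std::size_t j, std::size_t total_clusters,
                                        bool use_fixed_condition) {
  std::vector<std::size_t> visited;
  // Same as the outer `j < total_clusters - 1` guard when total_clusters > 0.
  if (j + 1 < total_clusters) {
    bool append_node = false;
    while ((use_fixed_condition ? j + 1 < total_clusters : j < total_clusters) &&
           !append_node) {
      j = j + 1;
      visited.push_back(j);  // where the real code reads connected_clusters[j]
      append_node = false;   // worst case: no cluster is ever connected
    }
  }
  return visited;
}
```

Starting from j = 0 with four clusters, the original condition visits indices 1, 2, 3, 4, and index 4 is past the end. With the condition j + 1 < total_clusters, the loop stops after index 3.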

Evidence

While attempting to use the OpenVINO execution provider on Linux, I encountered a segfault. The original occurrence and the details below pertain to 1.23.0, but I have confirmed that the proposed fix is effective on 1.23.1, and the relevant code is unchanged between the two versions. We build from source and run on both Windows and Linux.

Stack trace:

libonnxruntime_providers_openvino.so!onnxruntime::openvino_ep::AddTrivialClusterToNextClusterIfConnected(const onnxruntime::GraphViewer & graph_viewer, const onnxruntime::NodeIndex curr_node_index, const std::vector<unsigned long, std::allocator<unsigned long> > & search_cluster) (/…/EP/src/ONNXRuntime/onnxruntime/core/providers/openvino/ov_versions/utils.cc:159)
libonnxruntime_providers_openvino.so!onnxruntime::openvino_ep::GetCapability::Execute(onnxruntime::openvino_ep::GetCapability * const this) (/…/EP/src/ONNXRuntime/onnxruntime/core/providers/openvino/ov_versions/capability.cc:184)
libonnxruntime_providers_openvino.so!onnxruntime::openvino_ep::OpenVINOExecutionProvider::GetCapability(const onnxruntime::openvino_ep::OpenVINOExecutionProvider * const this, const onnxruntime::GraphViewer & graph_viewer) (/…/EP/src/ONNXRuntime/onnxruntime/core/providers/openvino/openvino_execution_provider.cc:85)
libonnxruntime.so.1!operator()(const struct {...} * const __closure, const onnxruntime::IExecutionProvider & ep, const onnxruntime::GraphViewer & graph_viewer, const onnxruntime::IExecutionProvider::IKernelLookup & kernel_lookup, onnxruntime::IResourceAccountant * resource_accountant, const onnxruntime::GraphOptimizerRegistry & graph_optimizer_registry) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:159)
libonnxruntime.so.1!onnxruntime::GetCapabilityForEP(const onnxruntime::(anonymous namespace)::GetCapabilityForEPParams & params, const onnxruntime::logging::Logger & logger) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:197)
libonnxruntime.so.1!onnxruntime::PartitionOnnxFormatModelImpl(onnxruntime::Graph & graph, onnxruntime::FuncManager & func_mgr, onnxruntime::KernelRegistryManager & kernel_registry_mgr, onnxruntime::KernelRegistry & fused_kernel_registry, onnxruntime::IExecutionProvider & current_ep, onnxruntime::GraphPartitioner::Mode mode, int & fused_node_unique_id, const onnxruntime::layout_transformation::TransformLayoutFunction & transform_layout_fn, const onnxruntime::layout_transformation::DebugGraphFn & debug_graph_fn, const onnxruntime::CheckLoadCancellationFn & check_load_cancellation_fn, const onnxruntime::logging::Logger & logger, onnxruntime::IResourceAccountant * resource_accountant, const onnxruntime::GraphOptimizerRegistry & graph_optimizer_registry, bool disable_model_compile) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:476)
libonnxruntime.so.1!onnxruntime::PartitionOnnxFormatModel(const onnxruntime::(anonymous namespace)::PartitionParams & partition_params, onnxruntime::GraphPartitioner::Mode mode, const onnxruntime::ExecutionProviders & execution_providers, onnxruntime::KernelRegistryManager & kernel_registry_manager, const std::optional<onnxruntime::InlinedHashMap<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<onnxruntime::IResourceAccountant, std::default_delete<onnxruntime::IResourceAccountant> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::unique_ptr<onnxruntime::IResourceAccountant, std::default_delete<onnxruntime::IResourceAccountant> > > > > > & acc_map, const onnxruntime::GraphOptimizerRegistry & graph_optimizer_registry, const onnxruntime::logging::Logger & logger, bool disable_model_compile) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:1032)
libonnxruntime.so.1!onnxruntime::GraphPartitioner::Partition(const onnxruntime::GraphPartitioner * const this, onnxruntime::Graph & graph, onnxruntime::FuncManager & func_mgr, const onnxruntime::layout_transformation::TransformLayoutFunction & transform_layout_function, const onnxruntime::ConfigOptions & config_options, const onnxruntime::logging::Logger & logger, onnxruntime::GraphPartitioner::Mode mode, const onnxruntime::epctx::ModelGenOptions & ep_context_gen_options, const onnxruntime::layout_transformation::DebugGraphFn & debug_graph_fn) (/…/EP/src/ONNXRuntime/onnxruntime/core/framework/graph_partitioner.cc:1316)
libonnxruntime.so.1!onnxruntime::InferenceSession::TransformGraph(onnxruntime::InferenceSession * const this, onnxruntime::Graph & graph, bool saving_model_in_ort_format) (/…/EP/src/ONNXRuntime/onnxruntime/core/session/inference_session.cc:1448)
libonnxruntime.so.1!onnxruntime::InferenceSession::Initialize(onnxruntime::InferenceSession * const this) (/…/EP/src/ONNXRuntime/onnxruntime/core/session/inference_session.cc:2286)
libonnxruntime.so.1!InitializeSession(const OrtSessionOptions * options, onnxruntime::InferenceSession & sess, OrtPrepackedWeightsContainer * prepacked_weights_container) (/…/EP/src/ONNXRuntime/onnxruntime/core/session/utils.cc:364)
libonnxruntime.so.1!OrtApis::CreateSessionFromArray(const OrtEnv * env, const void * model_data, size_t model_data_length, const OrtSessionOptions * options, OrtSession ** out) (/…/EP/src/ONNXRuntime/onnxruntime/core/session/onnxruntime_c_api.cc:745)
lib******ONNX.so!Ort::Session::Session(Ort::Session * const this, const Ort::Env & env, const void * model_data, size_t model_data_length, const Ort::SessionOptions & options) (/…/EP/include/onnxruntime/onnxruntime_cxx_inline.h:1810)
…

The fault address reported with the signal is 0x0, so the bad access looks like a null dereference:

-exec p/x $_siginfo._sifields._sigfault.si_addr
$1 = 0x0

The search_cluster being iterated over appears to contain garbage.

-exec p search_cluster[0]
Cannot access memory at address 0xacb5eb01c2097be1
-exec p &search_cluster[0]
$3 = (unsigned long *) 0xacb5eb01c2097be1
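One possible reading of the 0x0 fault address, offered as an assumption and not verified in the debugger: on x86-64, an access through a non-canonical pointer raises a general-protection fault and not a page fault, and Linux typically delivers that as SIGSEGV with si_addr set to 0. The sketch below checks whether an address is canonical, assuming 48-bit virtual addressing (4-level paging). IsCanonical48 is a hypothetical helper name.

```cpp
#include <cstdint>

// Returns true if `addr` is canonical under 48-bit virtual addressing,
// i.e. bits 63..47 are all copies of bit 47.
// Assumption: 4-level paging; 5-level paging uses a 57-bit rule instead.
constexpr bool IsCanonical48(std::uint64_t addr) {
  const std::uint64_t upper = addr >> 47;  // the top 17 bits
  return upper == 0 || upper == 0x1FFFF;
}
```

Under that assumption, 0xacb5eb01c2097be1 (the data pointer read from the garbage vector) is non-canonical, while 0x7f8eedcbcbd0 (the address of the vector object) is canonical. That would be consistent with a garbage pointer dereference being reported with a zero fault address.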

The vector object itself is at a reasonable-looking address.

-exec p &search_cluster
$4 = (const std::vector<unsigned long, std::allocator<unsigned long> > *) 0x7f8eedcbcbd0

That address is not within range of the current thread’s stack.

-exec info reg rsp
rsp            0x7f9193ff1610      0x7f9193ff1610

But it is within a valid, mapped read-write address region for the process.

-exec info proc mappings
process 3711
Mapped address spaces:

Start Addr         End Addr           Size               Offset             Perms File 
…
0x00007f8ee8000000 0x00007f8eebf01000 0x3f01000          0x0                rw-p   
0x00007f8eebf01000 0x00007f8eec000000 0xff000            0x0                ---p   
0x00007f8eec000000 0x00007f8eeff01000 0x3f01000          0x0                rw-p   
0x00007f8eeff01000 0x00007f8ef0000000 0xff000            0x0                ---p   
0x00007f8ef0000000 0x00007f8ef3f01000 0x3f01000          0x0                rw-p   
0x00007f8ef3f01000 0x00007f8ef4000000 0xff000            0x0                ---p   
0x00007f8ef4000000 0x00007f8ef7f01000 0x3f01000          0x0                rw-p   
…

One level up the stack, where does search_cluster come from? It’s connected_clusters[j]. In this loop, total_clusters is set by connected_clusters.size().

-exec p total_clusters
$7 = 4

However, j is incremented after the test and before the call to AddTrivialClusterToNextClusterIfConnected.

          while (j < total_clusters && !append_node) {
            j = j + 1;
            append_node = AddTrivialClusterToNextClusterIfConnected(graph_viewer_, index, connected_clusters[j]);
          }

At the time of the segfault, j is one past the end of connected_clusters, so the vector has been overrun and we’re officially in undefined behavior.

-exec p j
$8 = 4

Proposed Fix

Adjust the loop condition to prevent j from exceeding valid indices, and use .at() for bounds-checked access. I’m successfully using the minimalist patch below on top of 1.23.1, but I was going for surgical rather than style points.

diff --git a/onnxruntime/core/providers/openvino/ov_versions/capability.cc b/onnxruntime/core/providers/openvino/ov_versions/capability.cc
index 1893700ca..b735d5d45 100644
--- a/onnxruntime/core/providers/openvino/ov_versions/capability.cc
+++ b/onnxruntime/core/providers/openvino/ov_versions/capability.cc
@@ -179,12 +179,12 @@ std::vector<std::unique_ptr<ComputeCapability>> GetCapability::Execute() {
           omit_subgraph = false;
         } else if (j < total_clusters - 1) {
           bool append_node = false;
-          while (j < total_clusters && !append_node) {
+          while (j + 1 < total_clusters && !append_node) {
             j = j + 1;
-            append_node = AddTrivialClusterToNextClusterIfConnected(graph_viewer_, index, connected_clusters[j]);
+            append_node = AddTrivialClusterToNextClusterIfConnected(graph_viewer_, index, connected_clusters.at(j));
           }
           if (append_node) {
-            connected_clusters[j].emplace_back(index);
+            connected_clusters.at(j).emplace_back(index);
           }
           omit_subgraph = true;
         }
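The difference between operator[] and .at() can be shown with a small sketch. It assumes the build has C++ exceptions enabled; AtThrows is a hypothetical helper name, and the element type std::vector<std::size_t> is an assumption modelled on the unsigned long vectors in the stack trace.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if connected_clusters.at(j) throws std::out_of_range.
// With operator[], the same out-of-range index would be undefined behavior.
bool AtThrows(const std::vector<std::vector<std::size_t>>& connected_clusters,
              std::size_t j) {
  try {
    (void)connected_clusters.at(j);  // bounds-checked access
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```

For a vector of four clusters, index 3 is accepted and index 4 throws. With the corrected loop condition the throw should never be reached, so .at() acts as a safety net and not as the fix itself.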

To reproduce

It happened multiple times in a row with a specific custom ONNX model; I never saw session creation succeed unpatched with that model.

Urgency

We're up and running with the patch I included, so low urgency for me, but it does seem to involve undefined behavior and out-of-bounds access.

Platform

Linux

OS Version

Fedora 42

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.23.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

OpenVINO

Execution Provider Library Version

OpenVINO 2025.3.0

Labels

ep:OpenVINO (issues related to OpenVINO execution provider)