
Bug: orchagent crashes at addRoutePost while running test_vrf.py #22642

Open
manoharan-nexthop opened this issue May 19, 2025 · 0 comments · May be fixed by sonic-net/sonic-swss#3652
Assignees
Labels
Bug 🐛 nexthop-ai Triaged this issue has been triaged

Is it platform specific

generic

Importance or Severity

Critical

Description of the bug

The VRF test test_vrf.py fails because orchagent crashes with the backtrace below:

(gdb) bt
#0  0x00007feb65466eec in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007feb65417fb2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007feb65402472 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007feb6575a919 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007feb65765e1a in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007feb65765e85 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007feb657660d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007feb6575d240 in std::__throw_out_of_range(char const*) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00005557f228d966 in std::map<unsigned long, std::map<swss::IpPrefix, RouteNhg, std::less<swss::IpPrefix>, std::allocator<std::pair<swss::IpPrefix const, RouteNhg> > >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, std::map<swss::IpPrefix, RouteNhg, std::less<swss::IpPrefix>, std::allocator<std::pair<swss::IpPrefix const, RouteNhg> > > > > >::at (this=<optimized out>, __k=<optimized out>) at /usr/include/c++/12/bits/stl_map.h:551
#9  0x00005557f2284bfb in RouteOrch::addRoutePost (this=this@entry=0x5557f2f75250, ctx=..., nextHops=...) at ./orchagent/routeorch.cpp:2145
#10 0x00005557f228b0c2 in RouteOrch::doTask (this=0x5557f2f75250, consumer=...) at ./orchagent/routeorch.cpp:1021
#11 0x00005557f22482e2 in Orch::doTask (this=0x5557f2f75250) at ./orchagent/orch.cpp:553
#12 0x00005557f22390aa in OrchDaemon::start (this=this@entry=0x5557f2ed2ec0) at ./orchagent/orchdaemon.cpp:895
#13 0x00005557f21a5642 in main (argc=<optimized out>, argv=<optimized out>) at ./orchagent/main.cpp:818
(gdb) frame 9
#9  0x00005557f2284bfb in RouteOrch::addRoutePost (this=this@entry=0x5557f2f75250, ctx=..., nextHops=...) at ./orchagent/routeorch.cpp:2145
2145	./orchagent/routeorch.cpp: No such file or directory.
(gdb) p m_syncdRoutes
$1 = std::map with 1 element = {[844424930132003] = std::map with 2 elements = {[{m_ip = {m_ip = {family = 2 '\002', ip_addr = {ipv4_addr = 0, ipv6_addr = '\000' <repeats 15 times>}}}, m_mask = 0}] = {nhg_key = {m_nexthops = std::set with 0 elements,
        m_overlay_nexthops = false, m_srv6_nexthops = false}, nhg_index = ""}, [{m_ip = {m_ip = {family = 10 '\n', ip_addr = {ipv4_addr = 0, ipv6_addr = '\000' <repeats 15 times>}}}, m_mask = 0}] = {nhg_key = {m_nexthops = std::set with 0 elements,
        m_overlay_nexthops = false, m_srv6_nexthops = false}, nhg_index = ""}}}
(gdb) p vrf_id
$2 = (const sai_object_id_t &) @0x5557f3078b20: 844424930133543
(gdb) p ctx
$3 = (const RouteBulkContext &) @0x5557f3078a40: {object_statuses = std::deque with 1 element = {0}, tmp_next_hop = {m_nexthops = std::set with 0 elements, m_overlay_nexthops = false, m_srv6_nexthops = false}, nhg = {m_nexthops = std::set with 1 element = {[0] = {
        ip_address = {m_ip = {family = 2 '\002', ip_addr = {ipv4_addr = 0, ipv6_addr = "\000\000\000\000\200W\215e\353\177\000\000\020W\215e"}}}, alias = "Vlan1000", vni = 0, mac_address = {m_mac = "\000\000\000\000\000"}, label_stack = {
          m_labelstack = std::vector of length 0, capacity 0, m_outseg_type = SAI_OUTSEG_TYPE_SWAP}, weight = 0, srv6_segment = "", srv6_source = ""}}, m_overlay_nexthops = false, m_srv6_nexthops = false}, nhg_index = "", vrf_id = 844424930133543, ip_prefix = {m_ip = {
      m_ip = {family = 2 '\002', ip_addr = {ipv4_addr = 43200, ipv6_addr = "\300\250\000\000 \377\377\377\377\377\377\377\024\000\000"}}}, m_mask = 21}, excp_intfs_flag = false, using_temp_nhg = false, key = "Vrf1:192.168.0.0/21", protocol = "kernel", is_set = true}
(gdb)

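The route addition is attempted before the VRF has been added, without waiting for the VRF to be updated. In the dump above, vrf_id 844424930133543 is not a key in m_syncdRoutes (its only key is 844424930132003), so the std::map::at lookup at routeorch.cpp:2145 throws std::out_of_range. The exception is not caught, so std::terminate aborts orchagent.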
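The failing lookup can be reproduced in isolation. The following is a minimal sketch, not the orchagent code: `VrfId`, `RouteTable`, `SyncdRoutes`, `atThrows`, `findVrfTable` and `demoLookup` are hypothetical stand-ins, and the real map is keyed by `sai_object_id_t` with `swss::IpPrefix` and `RouteNhg` entries. It shows that `std::map::at` throws on a missing key, while `find` lets the caller detect the missing VRF without an exception.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>

// Simplified stand-ins for the real types (assumptions for illustration only).
using VrfId = std::uint64_t;
using RouteTable = std::map<std::string, std::string>;  // prefix -> next hop
using SyncdRoutes = std::map<VrfId, RouteTable>;

// Returns true if looking up `vrf` with at() throws std::out_of_range.
bool atThrows(const SyncdRoutes& routes, VrfId vrf) {
    try {
        (void)routes.at(vrf);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Non-throwing lookup: returns nullptr when the VRF has no entry yet.
const RouteTable* findVrfTable(const SyncdRoutes& routes, VrfId vrf) {
    auto it = routes.find(vrf);
    return it == routes.end() ? nullptr : &it->second;
}

// Mirrors the state in the gdb session: one known VRF with two default
// routes, and a route add that arrives for a different, unknown VRF.
void demoLookup() {
    SyncdRoutes routes;
    routes[844424930132003ULL] = {{"0.0.0.0/0", ""}, {"::/0", ""}};

    const VrfId unknownVrf = 844424930133543ULL;
    assert(atThrows(routes, unknownVrf));                 // at() throws
    assert(findVrfTable(routes, unknownVrf) == nullptr);  // find() does not
}
```

If nothing catches the exception, as in frames 5 to 7 of the backtrace, the C++ runtime calls `std::terminate`, which by default aborts the process.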

Steps to Reproduce

Run the test:

python3 -m pytest vrf/test_vrf.py --inventory ../ansible/lab --host-pattern dut --dpu-pattern None --testbed tb1 --testbed_file ../ansible/testbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --junit-xml=/tmp/test-logs/tr.xml --log-file=/tmp/test-logs/test.log --topology t0 --skip_sanity --collect_techsupport=False --disable-pytest-warnings --neighbor_type=sonic

The test fails and the swss container on the DUT goes down. The orchagent core file is in /var/core.

Actual Behavior and Expected Behavior

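Actual: orchagent aborts on an uncaught std::out_of_range thrown from RouteOrch::addRoutePost.

Expected: RouteOrch::addRoutePost should program the route only after the VRF has been created and RouteOrch::addRoute has created an entry for the VRF's OID in m_syncdRoutes. Until that happens, the route add should be deferred.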
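One way to picture the deferral is a pending queue that is retried until the VRF shows up. This is only a sketch of the idea under simplified assumptions, not the fix proposed in sonic-net/sonic-swss#3652: `PendingRoute`, `drainPending` and `demoDeferral` are hypothetical names, and the map types are stand-ins for the real `m_syncdRoutes`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <string>
#include <utility>

// Simplified stand-ins for the real types (assumptions for illustration only).
using VrfId = std::uint64_t;
using RouteTable = std::map<std::string, std::string>;  // prefix -> next hop
using SyncdRoutes = std::map<VrfId, RouteTable>;

// Hypothetical record of a route add that has not been programmed yet.
struct PendingRoute {
    VrfId vrf;
    std::string prefix;
    std::string nextHop;
};

// Programs every pending route whose VRF already has an entry in `routes`
// and keeps the rest queued for a later retry. Returns the number programmed.
std::size_t drainPending(SyncdRoutes& routes, std::deque<PendingRoute>& pending) {
    std::size_t programmed = 0;
    std::deque<PendingRoute> deferred;
    for (auto& route : pending) {
        auto it = routes.find(route.vrf);
        if (it == routes.end()) {
            deferred.push_back(std::move(route));  // VRF unknown: defer, do not throw
            continue;
        }
        it->second[route.prefix] = route.nextHop;
        ++programmed;
    }
    pending.swap(deferred);
    return programmed;
}

// Replays the scenario from the issue: the Vrf1 route arrives first, the VRF
// entry appears afterwards, and the route is programmed on the retry.
void demoDeferral() {
    SyncdRoutes routes;
    routes[844424930132003ULL] = {};
    std::deque<PendingRoute> pending{{844424930133543ULL, "192.168.0.0/21", "Vlan1000"}};

    assert(drainPending(routes, pending) == 0);  // deferred, no crash
    assert(pending.size() == 1);

    routes[844424930133543ULL] = {};             // VRF entry now exists
    assert(drainPending(routes, pending) == 1);  // retried and programmed
    assert(pending.empty());
}
```

The point of the design is that a missing VRF is treated as "not ready yet" rather than as an error, so the order in which the VRF and route events arrive no longer matters.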

Relevant log output

Output of show version, show techsupport

Attach files (if any)

No response
