Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7faf267026c0 (LWP 16))]
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1 0x00007faf290dbe9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 0x00007faf2908cfb2 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007faf29077472 in __GI_abort () at ./stdlib/abort.c:79
#4 0x00007faf29a09a4d in __rte_panic () from /usr/local/lib/x86_64-linux-gnu/librte_eal.so.24
#5 0x0000564aee2df5d3 in dp_ref_inc (ref=0x1249d94b08) at ../include/dp_refcount.h:36
#6 0x0000564aee2dfb0d in dp_process_ipv4_snat (snat_data=0x1249f49180, port=0x564afe4b4d40, cntrack=0x1249d94a40, df=0x11ea640a80, m=0x11ea640a00) at ../src/nodes/snat_node.c:74
#7 get_next_index (node=0x125a37f540, m=0x11ea640a00) at ../src/nodes/snat_node.c:175
#8 0x0000564aee2e0455 in dp_foreach_graph_packet (get_next_index=0x564aee2df60c <get_next_index>, speculated_node=1, nb_objs=1, objs=0x124833db80, node=0x125a37f540, graph=0x125a367700) at ../include/nodes/common_node.h:45
#9 snat_node_process (graph=0x125a367700, node=0x125a37f540, objs=0x124833db80, nb_objs=1) at ../src/nodes/snat_node.c:248
#10 0x0000564aee39b0d1 in __rte_node_process (node=0x125a37f540, graph=0x125a367700) at /usr/local/include/rte_graph_worker_common.h:186
#11 rte_graph_walk_rtc (graph=0x125a367700) at /usr/local/include/rte_graph_model_rtc.h:42
#12 0x0000564aee39b41d in rte_graph_walk (graph=0x125a367700) at /usr/local/include/rte_graph_worker.h:38
#13 0x0000564aee39b88a in graph_main_loop (arg=0x0) at ../src/dpdk_layer.c:117
#14 0x00007faf29a1e1b6 in eal_thread_loop () from /usr/local/lib/x86_64-linux-gnu/librte_eal.so.24
#15 0x00007faf29a2fe09 in eal_worker_thread_loop () from /usr/local/lib/x86_64-linux-gnu/librte_eal.so.24
#16 0x00007faf290da144 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#17 0x00007faf2915a7dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
This has happened multiple times in OSC. Looking at the code, it is caused by an improper order of operations at snat_node.c:74:
- dp_delete_flow() causes dp_ref_dec(), which can take the counter to zero
- the counter being zero causes the resources to be freed
- a new flow is created to replace the (already deleted) one
- dp_ref_inc() is then called on a freed-up reference
I have created a temporary fix for OSC that simply changes the order to:
- dp_ref_inc()
- dp_delete_flow()
- only then create and replace the flow
- if this creation fails, dp_ref_dec() is needed to revert the previous increase
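The two orderings described above can be sketched with a toy reference counter. This is only an illustration, not the dp-service code: every `toy_` name and both `replace_flow_*` helpers are hypothetical, the counter starts at 1 to model the unexpected case where the flow holds the last reference, and `toy_ref_inc()` returns `false` at the point where the real `dp_ref_inc()` ends up in `__rte_panic()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the real refcount (hypothetical, for illustration only). */
struct toy_ref {
    int count;
    bool freed; /* set when the count reaches zero; models freeing the resources */
};

static void toy_ref_init(struct toy_ref *ref)
{
    assert(ref != NULL);
    ref->count = 1; /* assumption: the flow holds the only reference */
    ref->freed = false;
}

/* Returns false where the real dp_ref_inc() would panic. */
static bool toy_ref_inc(struct toy_ref *ref)
{
    if (ref->freed || ref->count <= 0)
        return false;
    ref->count++;
    return true;
}

static void toy_ref_dec(struct toy_ref *ref)
{
    if (ref->count > 0 && --ref->count == 0)
        ref->freed = true;
}

/* Order described for the crash: delete the old flow, then take a reference. */
static bool replace_flow_delete_first(struct toy_ref *ref)
{
    toy_ref_dec(ref);        /* models dp_delete_flow() dropping its reference */
    return toy_ref_inc(ref); /* the replacement flow tries to take a reference */
}

/* Order of the temporary fix: take the reference before deleting. */
static bool replace_flow_inc_first(struct toy_ref *ref, bool create_ok)
{
    if (!toy_ref_inc(ref))
        return false;
    toy_ref_dec(ref);        /* models dp_delete_flow() */
    if (!create_ok) {
        toy_ref_dec(ref);    /* revert the earlier increase on failure */
        return false;
    }
    return true;
}
```

With a single initial reference, `replace_flow_delete_first()` frees the object before the increment and fails, while `replace_flow_inc_first()` keeps the count above zero across the delete. If creation fails, the revert drops the count back and the object is released only when no other reference remains.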
I stand by this order of operations, but I am also aware that this situation should never happen, as there should always be at least 2 references for a flow. Still, from a local code-review standpoint, the operations should simply be done in this order to avoid confusion.
The next question is why the situation has arisen at all, because I am only curing the symptom and not the cause. This investigation is still ongoing in OSC.
I have not yet created a PR because I think there may be better solutions, and some discussion is surely needed before making any big changes.