Description
In a multi-process setup, the `ipfix probe` program occasionally crashes on startup with a Lua error raised in the pflua optimizer. Here is an excerpt of such an event:
```
../lib/pflua/src/pf/optimize.lua:0: attempt to compare string with number
Stack Traceback
===============
(1) Lua metamethod '__lt' at file 'core/main.lua:172'
Local variables:
(*temporary) = string: "../lib/pflua/src/pf/optimize.lua:0: attempt to compare string with number"
(2) Lua upvalue 'd' at file '../lib/pflua/src/pf/optimize.lua:0'
Local variables:
(*temporary) = string: "[]"
(*temporary) = string: "[]"
(*temporary) = nil
(*temporary) = nil
(*temporary) = nil
(*temporary) = string: "attempt to compare string with number"
(3) Lua upvalue '' at file '../lib/pflua/src/pf/optimize.lua:93'
Local variables:
(*temporary) = table: 0x7fedcb989ff8 {1:[], 2:23, 3:1}
(*temporary) = table: 0x7fedcaae5118 {1:(}
(*temporary) = number: 1
(*temporary) = number: 3
(*temporary) = number: 1
(*temporary) = number: 1
(*temporary) = number: 2
(4) Lua upvalue '' at file '../lib/pflua/src/pf/optimize.lua:78'
Local variables:
(*temporary) = table: 0x7fedcb989ff8 {1:[], 2:23, 3:1}
(*temporary) = nil
...
```
The problem disappears when the `memoize()` wrapper is removed at line 89 of `snabb/lib/pflua/src/pf/optimize.lua` (commit de5aaa3).
The function `cfkey()` takes a Lua table (`expr`) as input, and that table is used as the lookup key in the `cfkey_cache` table. The memoization is valid only if two assumptions about `expr` hold:
- it is immutable
- it never goes out of scope between calls to `cfkey()`
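To make the identity-based lookup concrete, here is a minimal, hypothetical sketch of a memoizer that uses its table argument directly as the cache key. It is not pflua's actual `memoize()` or `cfkey()`; the names `memoize`, `serialize` and the sample expression are illustrative assumptions. It shows why the first assumption matters: once a table has been cached, mutating it does not change the cached result.

```lua
-- Hypothetical sketch, not pflua's code: memoize a one-argument function,
-- using the argument itself as the cache key. Table keys are compared by
-- identity (which object it is), never by contents.
local function memoize(f)
  local cache = {}
  return function(arg)
    local result = cache[arg]
    if result == nil then
      result = f(arg)
      cache[arg] = result
    end
    return result
  end
end

-- Hypothetical key function: serialize a nested expression table to a string.
local function serialize(expr)
  if type(expr) ~= "table" then
    return tostring(expr)
  end
  local parts = {}
  for i = 1, #expr do
    parts[i] = serialize(expr[i])
  end
  return "(" .. table.concat(parts, " ") .. ")"
end

local key = memoize(serialize)

local expr = { "<", "len", 100 }
local r1 = key(expr)                  -- computed: "(< len 100)"

-- Violating the immutability assumption: mutate the table after caching it.
expr[3] = 200
local r2 = key(expr)                  -- stale cache hit: still "(< len 100)"

-- A different table with equal contents is a different key, so it is computed.
local r3 = key({ "<", "len", 200 })   -- "(< len 200)"

print(r1, r2, r3)
```

Note that a plain, strong-keyed Lua table like `cache` keeps every key table reachable, so this sketch cannot reproduce the garbage-collection scenario; it only illustrates identity-based lookup and the immutability requirement.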
The first assumption must hold because only the address of the `expr` table, not its contents, is used as the key into the cache. The second assumption guarantees that the table is not garbage-collected and its memory reused for a different table between calls. At least the second assumption appears to be violated. To confirm this, modify line 808 of `snabb/lib/pflua/src/pf/optimize.lua` (commit de5aaa3) by inserting an explicit GC run in `optimize_inner()`:
```lua
function optimize_inner(expr)
   expr = simplify(expr, true)
   expr = simplify(cfold(expr, {}), true)
   -- Added for debugging: force a full garbage-collection cycle between passes
   collectgarbage()
   expr = simplify(infer_ranges(expr), true)
   expr = simplify(lhoist(expr), true)
   clear_cache()
   return expr
end
```
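The problem can then be triggered by the following script:

```lua
local pf = require("pf")

local filter = "(ip or ip6) and tcp and (dst port 80 or dst port 443 or dst port 8443)"

-- Compile the same filter 300 times, keeping every result reachable in foo;
-- each compilation runs the optimizer (and hence the forced GC cycle above).
local foo = {}
for n = 1, 300 do
   foo[#foo + 1] = pf.compile_filter(filter)
end
```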
On my system, the error occurs in roughly 1 out of 5 runs of this script.
The issue should really be opened at https://github.com/Igalia/pflua, but that repository appears to be abandoned.