Description
Hi,
It appears that operations are not uniformly distributed to nodes by the generator.
The first specified node, e.g. "n1", gets so many more operations that I thought it was worth quantifying and confirming that this is not desired behavior.
Using a test that is a minimal no-op:
{:db        db/noop
 :nemesis   nemesis/noop
 :generator (repeat {:type  :invoke
                     :f     :node
                     :value nil})}
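Roughly, these pieces are merged into a test map in the usual way. A sketch, with the function name and the rate/time-limit wrapping assumed rather than copied from jepsen-skeleton:

(ns jepsen-skeleton.core
  (:require [jepsen.db :as db]
            [jepsen.generator :as gen]
            [jepsen.nemesis :as nemesis]
            [jepsen.tests :as tests]))

;; Sketch only: how the no-op pieces might be merged into a runnable test map.
;; The :rate and :time-limit opts are assumed CLI options.
(defn ops-by-node-test
  [opts]
  (merge tests/noop-test
         opts
         {:name      "ops-by-node"
          :db        db/noop
          :nemesis   nemesis/noop
          ;; client and checker as described below
          :generator (->> (repeat {:type :invoke, :f :node, :value nil})
                          (gen/stagger (/ (:rate opts)))
                          (gen/time-limit (:time-limit opts)))}))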
The client just augments the op map with {:type :ok :value node}, and may sleep --op-latency ms to simulate latency:
(invoke!
  [{:keys [node op-latency] :as _this} _test op]
  ; simulate any desired latency?
  (when (< 0 op-latency)
    (u/sleep op-latency))
  ; op is ok with a value of node
  (assoc op
         :type  :ok
         :value node))
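For context, the invoke! above lives in a client record along these lines. A sketch; the record name and the open!/setup!/teardown!/close! bodies are assumptions, not the skeleton's exact code:

(ns jepsen-skeleton.client
  (:require [jepsen.client :as client]
            [jepsen.util :as u]))

;; Hypothetical client record: :node is filled in by open!, :op-latency
;; comes from the CLI.
(defrecord NoopClient [node op-latency]
  client/Client
  (open! [this _test node]
    (assoc this :node node))
  (setup! [_this _test])
  (invoke! [{:keys [node op-latency] :as _this} _test op]
    (when (< 0 op-latency)
      (u/sleep op-latency))
    (assoc op :type :ok, :value node))
  (teardown! [_this _test])
  (close! [_this _test]))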
The checker just calculates total ops by node:
(->> history
     h/client-ops
     (h/remove (fn [{:keys [type] :as _op}] (= type :invoke)))
     (reduce (fn [summary {:keys [value] :as _op}]
               (update summary value (fn [old] (+ 1 (or old 0)))))
             (sorted-map)))
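That reduce is wrapped in a jepsen.checker/Checker so the counts land in the results map. A sketch, with count-ops-by-node as an assumed name for the reduce above:

(ns jepsen-skeleton.checker
  (:require [jepsen.checker :as checker]
            [jepsen.history :as h]))

;; Hypothetical helper: the reduce shown above, wrapped in a defn.
(defn count-ops-by-node
  [history]
  (->> history
       h/client-ops
       (h/remove (fn [{:keys [type]}] (= type :invoke)))
       (reduce (fn [summary {:keys [value]}]
                 (update summary value (fnil inc 0)))
               (sorted-map))))

;; Sketch of a checker that produces the :ops-by-node maps shown below.
(defn ops-by-node-checker
  []
  (reify checker/Checker
    (check [_this test history _opts]
      {:valid?      true
       :nodes       (:nodes test)
       :ops-by-node (count-ops-by-node history)})))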
Running this as --workload ops-by-node yields results showing "n1" receiving almost 2x as many operations as "n5":
{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1919,
               "n2" 1009,
               "n3" 978,
               "n4" 977,
               "n5" 977}}
Manually specifying nodes in reverse order reverses the skew, with "n5" receiving almost 2x as many operations as "n1":
--nodes n5,n4,n3,n2,n1
{:valid? true,
 :nodes ["n5" "n4" "n3" "n2" "n1"],
 :ops-by-node {"n1" 960,
               "n2" 960,
               "n3" 963,
               "n4" 983,
               "n5" 1895}}
Let's try adding a new workload, --workload odd-nodes-only, in which only odd-numbered nodes receive operations, using a generator of (gen/on-threads #{0 2 4} ...) (the fn is a set of the odd-numbered nodes' corresponding threads); see the sketch below.
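A sketch of that generator, reusing the op map from above (threads are 0-indexed, so n1, n3, and n5 correspond to threads 0, 2, and 4):

;; Sketch: restrict the no-op generator to the threads that map to n1, n3, n5.
(gen/on-threads #{0 2 4}
                (repeat {:type :invoke, :f :node, :value nil}))

Running it yields: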
{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1936,
               "n3" 1952,
               "n5" 1949}}
And another new workload, --workload on-threads-any, in which all nodes receive operations, using a generator of (gen/on-threads any? ...) (the fn any? always returns true); see the sketch below.
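The same sketch, with a predicate that admits every thread:

;; Sketch: same op map, but any? admits every thread.
(gen/on-threads any?
                (repeat {:type :invoke, :f :node, :value nil}))

Running it yields: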
{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1893,
               "n2" 993,
               "n3" 964,
               "n4" 962,
               "n5" 962}}
Let's introduce some client latency with --op-latency ms, the simulated amount of time an op should take, in milliseconds:
--op-latency 15
{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1299,
               "n2" 1240,
               "n3" 1141,
               "n4" 1102,
               "n5" 1040}}
Now try a latency large enough that, while one op is in flight, every other node has a chance to receive an op, e.g. 1000 / rate * #-nodes ms (with 5 nodes and a rate of 100 ops/s, that works out to 50 ms):
--op-latency 50
{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1029,
               "n2" 1021,
               "n3" 1019,
               "n4" 1008,
               "n5" 991}}
Looking at:
; When we consume a thread, we bump the next thread index. This means we
; rotate evenly through threads instead of giving a single thread all the
; ops.
and the test:
; We want to distribute requests evenly across threads to prevent
; starvation.
would seem to confirm that this is not the desired behavior?
A repository, jepsen-skeleton, has been created to demonstrate this issue.