generator does not uniformly distribute operations to nodes #611

@nurturenature

Description

Hi,

It appears that operations are not uniformly distributed to nodes by the generator.
The first specified node, e.g. "n1", gets so many more operations that I thought it was worth quantifying and confirming that this is not desired behavior.

Using a test that is a minimal no-op:

{:db       db/noop
:nemesis   nemesis/noop
:generator (repeat {:type  :invoke
                    :f     :node
                    :value nil})}

with a client that just augments the op map with {:type :ok :value node} and may sleep --op-latency ms to simulate latency:

(invoke!
  [{:keys [node op-latency] :as _this} _test op]
  ; simulate any desired latency
  (when (< 0 op-latency)
    (u/sleep op-latency))

  ; op is ok with a value of node
  (assoc op
         :type  :ok
         :value node))

and a checker that just calculates total ops by node:

(->> history
     h/client-ops
     (h/remove (fn [{:keys [type] :as _op}] (= type :invoke)))
     (reduce (fn [summary {:keys [value] :as _op}]
               (update summary value (fn [old] (+ 1 (or old 0)))))
             (sorted-map)))

as --workload ops-by-node yields results showing "n1" receiving almost 2x the operations of "n5":

{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1919,
               "n2" 1009,
               "n3" 978,
               "n4" 977,
               "n5" 977}}

Manually specifying the nodes in reverse order reverses the skew, with "n5" receiving almost 2x the operations of "n1":

--nodes n5,n4,n3,n2,n1

{:valid? true,
 :nodes ["n5" "n4" "n3" "n2" "n1"],
 :ops-by-node {"n1" 960,
               "n2" 960,
               "n3" 963,
               "n4" 983,
               "n5" 1895}}

Let's try adding a new workload, --workload odd-nodes-only, in which only odd-numbered nodes receive operations, by wrapping the generator in (gen/on-threads #{0 2 4}) (the fn is the set of threads corresponding to the odd-numbered nodes):

{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1936,
               "n3" 1952,
               "n5" 1949}}

And another new workload, --workload on-threads-any, in which all nodes receive operations, by wrapping the generator in (gen/on-threads any?) (any? always returns true):

{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1893,
               "n2" 993,
               "n3" 964,
               "n4" 962,
               "n5" 962}}

Let's introduce some client latency with --op-latency, the simulated amount of time, in ms, an op should take:

--op-latency 15

{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1299,
               "n2" 1240,
               "n3" 1141,
               "n4" 1102,
               "n5" 1040}}

Now try a latency that keeps a node busy for exactly as long as it takes every other node to get a chance at an op,
e.g. 1000 / rate * #-nodes ms:

--op-latency 50

{:valid? true,
 :nodes ["n1" "n2" "n3" "n4" "n5"],
 :ops-by-node {"n1" 1029,
               "n2" 1021,
               "n3" 1019,
               "n4" 1008,
               "n5" 991}}

Looking at:

; When we consume a thread, we bump the next thread index. This means we
; rotate evenly through threads instead of giving a single thread all the
; ops.

and the test:

; We want to distribute requests evenly across threads to prevent
; starvation.

would seem to confirm that this is not the desired behavior?
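As a toy model (not Jepsen's actual scheduler) of why the thread-selection policy matters: rotating the next-thread index, as the quoted comment intends, distributes ops evenly, while always handing the op to the lowest-numbered free thread sends everything to thread 0 when all threads are idle:

```python
from collections import Counter

THREADS = 5
OPS = 1000

def round_robin(n_ops, n_threads):
    """Bump the next-thread index after each op, as the quoted comment intends."""
    counts, nxt = Counter(), 0
    for _ in range(n_ops):
        counts[nxt] += 1
        nxt = (nxt + 1) % n_threads
    return counts

def lowest_free(n_ops, n_threads):
    """Always pick the lowest-numbered free thread. In this zero-latency toy
    every thread is always free, so thread 0 receives every op."""
    counts = Counter()
    for _ in range(n_ops):
        free = list(range(n_threads))  # all threads idle in this toy model
        counts[free[0]] += 1
    return counts

print(round_robin(OPS, THREADS))  # every thread gets 200 ops
print(lowest_free(OPS, THREADS))  # thread 0 gets all 1000 ops
```

The observed ~2x skew toward the first node falls between these two extremes, suggesting the rotation is only partially effective.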

A repository, jepsen-skeleton, has been created to demonstrate this issue.
