RLN Fees distribution to operators with root voting #101

Open
alrevuelta opened this issue Sep 9, 2024 · 31 comments

Comments

@alrevuelta
Contributor

alrevuelta commented Sep 9, 2024

TLDR: Proposal on how to distribute RLN membership fees to the node operators that submit the correct Merkle root
of the messages sent during a superepoch.

Problem statement

Waku has chosen Rate Limiting Nullifiers to rate-limit users, ensuring they make fair usage of the network resources.
In order to do so, it requires users to pay for the so-called RLN membership, which entitles the holder to send a given
amount of messages per unit of time while it is active.

However, at the time of writing, the collected fees from RLN memberships don't have a designated purpose.
In order to ensure the sustainability of the network and reward node operators for their work, one option would be
to distribute them among node operators.

We define node operators as entities that participate in the network, running nodes and relaying traffic. They are
the ones that make the network work. A node operator is also a registered entity known by the waku protocol,
see more information below.

This issue proposes how to distribute the rewards generated by RLN memberships to node operators. It does so by
rewarding the node operators that vote on the correct Merkle root for each superepoch. The root agreed on by a 2/3
supermajority is considered the correct one, and the nodes that voted for it get a reward.
This reward is a fraction of the fees that were generated from membership registration during that superepoch.
Since gossipsub scoring punishes lazy peers and incentivizes peers to behave properly, we assume that knowing the valid
Merkle root implies having participated in the relay protocol, which is the behaviour we want to incentivize.

Proposal

This proposal introduces the following concepts:

    1. Node registration: Require node operators to register with a sybil-resistant mechanism.
    2. Superepoch: Define a superepoch that matches the RLN membership renewal cycle.
    3. Messages Merkle Tree: A Merkle tree built from all messages within a superepoch.
    4. Commit-reveal: At the end of the superepoch each operator can submit their local root of the Merkle tree containing all messages,
      using a commit-reveal mechanism to prevent lazy nodes from copying honest ones.
    5. Claim: The Merkle root voted by the majority becomes the accepted root for that superepoch, and each matching node is given 1/n
      of the rewards from the subscription fees of that superepoch, where n is the number of nodes that voted with the majority.

Below, each point is elaborated.

1. Node registration

Since anyone would be able to submit a vote on the Merkle root, we need some kind of pseudo-identity to sybil-protect the voting.
Otherwise the same entity, knowing a valid Merkle root (defined as the root of the Merkle tree constructed from all messages within a superepoch),
could vote multiple times, getting more rewards than it should.

A suggested mechanism is to require a STAKE_AMOUNT from node operators, which provides economic sybil-protection.
When an operator stakes this amount, it becomes eligible to participate in the voting and, hence, eligible for rewards.
When the operator no longer wants to participate, the stake is returned.
The stake can be slashed if the operator doesn't vote in SUPEREPOCHS_PENALTY consecutive superepochs.
It is yet to be defined whether there would be other slashing conditions.
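A minimal sketch of how such a registry could look (illustrative only, not the actual contract; the class shape, the STAKE_AMOUNT value, and the bookkeeping are assumptions):

```python
STAKE_AMOUNT = 1_000          # tokens required to register (assumed value)
SUPEREPOCHS_PENALTY = 3       # consecutive missed superepochs before slashing (assumed value)

class OperatorRegistry:
    def __init__(self):
        self.stakes = {}          # operator address -> staked amount
        self.missed_votes = {}    # operator address -> consecutive missed superepochs

    def register(self, operator: str, amount: int) -> None:
        if amount < STAKE_AMOUNT:
            raise ValueError("insufficient stake")
        self.stakes[operator] = amount
        self.missed_votes[operator] = 0

    def record_vote(self, operator: str, voted: bool) -> None:
        # A vote resets the miss counter; otherwise the miss is counted.
        if voted:
            self.missed_votes[operator] = 0
        else:
            self.missed_votes[operator] += 1

    def slashable(self, operator: str) -> bool:
        # Anyone can slash an operator that skipped too many consecutive superepochs.
        return self.missed_votes.get(operator, 0) >= SUPEREPOCHS_PENALTY

    def withdraw(self, operator: str) -> int:
        # The stake is returned when the operator leaves voluntarily.
        self.missed_votes.pop(operator, None)
        return self.stakes.pop(operator)
```

The same lifecycle would apply onchain: stake to become eligible, vote each superepoch, get the stake back on exit or lose it to slashing.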

2. Superepoch

A superepoch is the time window used for voting. During each superepoch nodes construct a Merkle tree with all seen messages
and once finished, the Merkle root is calculated.

The superepoch matches the "billing period", see comment.
In other words, if 10 users paid an RLN membership for a given superepoch, then all the generated fees are distributed
among the node operators that participated in that superepoch.

3. Messages Merkle Tree

For each superepoch node operators construct a Merkle tree with all the messages they have:

  • Seen in the network. With the help of gossipsub scoring, we assume that a node able to see messages in the network is also relaying them.
    Otherwise its score would be penalized.
  • Got from store and store sync. We acknowledge that nodes may have some downtime, so it is accepted that some of these messages will come from store and not relay.
    A protection shall be added to discard "lazy nodes", i.e. nodes that don't participate in relay but manage to get a valid Merkle root.
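As a rough illustration of how such a per-superepoch root could be computed (a sketch only; SHA-256, sorted leaves for order independence, and duplicate-last padding are assumptions, since the actual construction is still to be specified):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(message_hashes: list[bytes]) -> bytes:
    # Sort the leaves so that every node that saw the same message set
    # computes the same root, regardless of arrival order.
    level = sorted(message_hashes)
    if not level:
        return h(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])   # duplicate the last node on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Two honest operators that collected the same messages, in any order, would then vote for the same root.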

4. Commit-reveal

Since all onchain activity is public, we propose a commit-reveal mechanism to prevent lazy nodes
from sniffing the blockchain and copying the Merkle root of honest nodes.

Once the superepoch finishes, a window of COMMIT_WINDOW_SLOTS opens, during which all registered nodes can
submit their vote on which root they consider valid for that superepoch. Each node operator commits to a secret, in this case the root.
The commitment is hash(root + salt). The commitment itself is public, but since recovering the preimage
of the hash is infeasible, an attacker cannot learn the root and therefore cannot reveal it in the next phase.

Once COMMIT_WINDOW_SLOTS have elapsed, no more votes are accepted for that superepoch and a REVEAL_WINDOW_SLOTS window opens.
During this window node operators reveal their secret. If hashing the revealed root and salt reproduces the earlier commitment
hash(root + salt), the vote is accepted.

Finally, once REVEAL_WINDOW_SLOTS have elapsed, eligible node operators can claim their share of the reward. See next section.
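The commit and reveal steps can be sketched as follows (illustrative; SHA-256 and a 32-byte random salt are assumptions, not part of the proposal):

```python
import hashlib
import os

def make_commitment(root: bytes, salt: bytes) -> bytes:
    # Commit phase: only hash(root + salt) is published; the root stays secret.
    return hashlib.sha256(root + salt).digest()

def verify_reveal(commitment: bytes, root: bytes, salt: bytes) -> bool:
    # Reveal phase: the vote counts only if the revealed (root, salt)
    # reproduces the earlier commitment.
    return hashlib.sha256(root + salt).digest() == commitment

# During COMMIT_WINDOW_SLOTS the operator commits...
salt = os.urandom(32)
root = b"\x01" * 32                       # the operator's local Merkle root (placeholder)
commitment = make_commitment(root, salt)

# ...and during REVEAL_WINDOW_SLOTS it reveals.
assert verify_reveal(commitment, root, salt)
assert not verify_reveal(commitment, b"\x02" * 32, salt)  # a copied/wrong root is rejected
```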

5. Claim

After REVEAL_WINDOW_SLOTS the commit-reveal phase is done and votes are counted. An example:

  • node0..4 (5) voted on root root1.
  • node5..24 (20) voted on root root2.
  • node25 (1) voted on root root3.
  • node26..30 (4) did not vote.

As explained before, node operators are registered in the contract beforehand, so the total number of votes is known.
The previous votes can be written as [num_votes, root_n]: [5, root1], [20, root2], [1, root3], [4, NaN].
With the votes, a simple 2/3 (~66%) supermajority rule is used to decide which root to accept.

The weights would be:

  • root1 5/30 ≈ 17%
  • root2 20/30 ≈ 67%
  • root3 1/30 ≈ 3%

This means that root2 is accepted onchain as the Merkle root of all the messages seen during the superepoch.
Nodes that voted for that root are eligible for rewards.
The amount to be distributed is calculated as the number of memberships multiplied by the price paid per membership for that superepoch.
If 100 memberships were active and each one paid 10 tokens/superepoch, then the amount to split is: 100 * 10 = 1000 tokens.

Membership fees would be split among the nodes that voted for the agreed root (root2), meaning 1000/20 = 50 tokens for each of node5..24.
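The counting and split above can be sketched as follows (illustrative; taking the supermajority over all registered operators, including non-voters, is an assumption of this sketch):

```python
from collections import Counter

def tally(votes: dict[str, str], total_operators: int, reward_pool: int):
    # votes maps operator -> revealed root; non-voters are simply absent.
    if not votes:
        return None, {}
    counts = Counter(votes.values())
    root, n = counts.most_common(1)[0]
    # Require a 2/3 supermajority of ALL registered operators.
    if 3 * n < 2 * total_operators:
        return None, {}
    share = reward_pool // n
    return root, {op: share for op, r in votes.items() if r == root}

# The example from the text: 30 registered operators, 1000 tokens of fees.
votes = {f"node{i}": "root1" for i in range(5)}
votes |= {f"node{i}": "root2" for i in range(5, 25)}
votes["node25"] = "root3"   # node26..30 did not vote
root, rewards = tally(votes, total_operators=30, reward_pool=1000)
# root2 reaches exactly 20/30 = 2/3, so each of the 20 winners gets 1000 // 20 = 50 tokens
```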

Note that a node operator that does not vote for SUPEREPOCHS_PENALTY consecutive superepochs can be slashed by anyone, as this information is accessible
onchain. The whistleblower gets WHISTLEBLOWER_REWARD, which is <= STAKE_AMOUNT, and the node is removed from the
active node operators.

In order to make it gas efficient, each node operator's rewards can be claimed at any time.
For each superepoch the claimable amount keeps accumulating, so an operator can claim the rewards of multiple superepochs at once.
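A minimal sketch of this accumulate-then-claim pattern (illustrative; the names are made up):

```python
class RewardAccounting:
    # Rewards accrue per superepoch; an operator claims the whole
    # accumulated balance in a single transaction to save gas.
    def __init__(self):
        self.accrued = {}   # operator address -> claimable balance

    def credit(self, operator: str, amount: int) -> None:
        # Called once per superepoch for each eligible voter.
        self.accrued[operator] = self.accrued.get(operator, 0) + amount

    def claim(self, operator: str) -> int:
        # Pays out everything accrued so far and resets the balance.
        return self.accrued.pop(operator, 0)
```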

Consensus

An interesting side effect of this proposal is that it introduces an onchain consensus of the messages sent for each superepoch.
This would allow light clients (eg filter) to verify that a given message was indeed sent through the network, since the onchain root
can be used.

Open problems

This proposal has some open problems:

  • Handle lazy nodes, defined as entities not participating in relay but somehow getting a valid Merkle root.
    They should not be eligible for rewards.
  • Handle nodes registering multiple stakes. An entity running just one node could stake STAKE_AMOUNT multiple times under different operator identities.
    This would allow it to vote multiple times while doing the work of only one node.
@SionoiS

SionoiS commented Sep 9, 2024

Just to state the obvious, this assumes that we collect fees otherwise it can't work long term.

we need some kind of pseudo-identity to sybil-protect the voting

Do you think it would be economically viable to register 2/3 of ALL possible memberships and control the vote to get all the fees?

A protection shall be added to discard "lazy nodes", meaning that they don't participate in relay but manage to get a valid Merkle root.

Waku Sync could be used to get the correct Merkle root without using Relay at all. I'm not sure we can prevent that. 🤔

a simple 2/3 consensus of a supermajority of 66% is used to decide which root to accept.

What happens when there is no majority? Do fees just accumulate?

Note that a node operator that does not vote during SUPEREPOCHS_PENALTY can be slashed by anyone

To me it feels unnecessary, is the lack of reward not penalty enough? It would allow superepoch to not have votes and for rewards to accumulate. It would be good to have a mechanism that can allow skipping superepoch when the fees are not high enough but maybe it makes the system too gameable?

Maybe instead of voting, let people gamble on the correct Merkle root? Stake/Vote -> Bet, Reward -> odds of winning.
No idea how to determine the truth though... 🤔 🤷

@alrevuelta
Contributor Author

Do you think it would be economically viable to register 2/3 of ALL possible memberships and control the vote to get all the fees?

This is one of the issues that "1. Node registration" should address. In Ethereum this was solved by requiring a minimum amount of operators to stake first. Without this minimum amount of operators MIN_GENESIS_ACTIVE_VALIDATOR_COUNT, the protocol could not launch. Since the participation was open to anyone, and the stake was considerable, the diversity was quite good, but at some point an entity controlled around 30%.

Waku Sync could be used to get the correct Merkle root without using Relay at all. I'm not sure we can prevent that. 🤔

Perhaps with a rate limit. It's okay to get some messages via Waku Sync (eg if you are offline for some time), but it shouldn't be possible to rely purely on Waku Sync. Another option (largely vague) is to mix service incentivization in here, meaning store sync would only work if you pay a small fee (see service incentivization).

What happen when no majority? Do fees just accumulate?

No majority can happen in two scenarios: i) nodes strongly disagreeing in the root or ii) a bunch of nodes not voting. For ii) we can heavily penalize that since that hinders consensus. For i) tbh to be defined.

To me it feels unnecessary, is the lack of reward not penalty enough? It would allow superepoch to not have votes and for rewards to accumulate.

I would say penalizing here mitigates the ii) from the previous point.

@fryorcraken
Contributor

Looks good to me. Very exciting.

Yes, there are some unknown and risks. Let's do the work so we can confirm viability (or not) of such a protocol.

One potential issue is ephemeral messages. They should probably be discarded from the Merkle tree, as it is not possible to use store/store sync to double-check that none were missed.

Few thoughts regarding gaming this by relying on store sync:

  • If the superepoch is greater than the retention period of most stores, then it would not be possible for someone to step in and backtrack enough to get the Merkle root.

  • Hence, one would need to use store sync on a regular basis to sync all messages (instead of using RLN Relay).
    I wonder if purely by having some sane rate limits on store, this can be prevented.

  • Also, I believe store currently does not store RLN proofs. So there may be a potential issue here: when doing store sync, how can you be certain that the messages you are given are legitimate?

  • Maybe getting the RLN proof with store queries could be a more expensive service, app users should not need that, but those participating in this protocol would want to be sure the message should be included.

@chaitanyaprem

chaitanyaprem commented Sep 10, 2024

I like the overall approach, as it closely resembles a pattern used in a previous project that I worked on.
I am wondering if we could leverage and integrate with that project (or reuse modules/research already done), which might help us as they are solving/have solved similar problems.

Few observations.

In other words, if 10 users paid an RLN membership for a given superepoch, then all the generated fees are distributed
among the node operators that participated in that superepoch.

Wondering if pro-rating of rewards would be required, as node operators can join the network at any point in time.

For each superepoch node operators construct a Merkle tree with all the messages they have:

Since message ordering is not guaranteed in the relay network, how would this Merkle tree look, given that the root would keep changing based on insertion order? Instead of a Merkle tree, could we use the sync state data structure used by RBSR (as it is resilient to ordering)?

As explained before, node operators are registered in the contract beforehand, so the total number of votes is known.

This essentially means the total votes considered for counting and the majority threshold would change per superepoch, right?

@SionoiS

SionoiS commented Sep 10, 2024

  • Hence, one would need to use store sync on a regular basis to sync all messages (instead of using RLN Relay).
    I wonder if purely by having some sane rate limits on store, this can be prevented.

We could not use the hash of the actual message for sync and instead use a different hash that is not used by the consensus mechanism.

@alrevuelta
Contributor Author

@fryorcraken

One potential issue is ephemeral messages. They should probably be discarded from the Merkle tree as it is not possible to use store/store sync to double check none were missed

Good catch. Yep, they should be discarded from the Merkle tree.

@chaitanyaprem

Powerloom

Will check, interesting.

Wondering if pro-rating of rewards would be requried as node operators can join the network at any point in time.

Haven't thought much about it, but most likely yes. I'm afraid, though, that these calculations may consume too much gas. Ideally billing periods should align with the superepoch, but if someone registers the last day before the superepoch finishes, what happens? Does the user have to pay for the full period? No. Does the user pay for just 1 day, and then the next superepoch(s) will match the billing period? Maybe yes.

Since message ordering is not guaranteed in the relay network, how would this Merkle tree look, given that the root would keep changing based on insertion order? Instead of a Merkle tree, could we use the sync state data structure used by RBSR (as it is resilient to ordering)?

I was thinking about ordering the leaves according to some criteria, e.g. by timestamp and then by hash (lexicographically). In that case the order would be deterministic and so would the root. You may receive an out-of-order message, which you can insert wherever its place is. Does the data structure you are referring to contain some kind of representation of all the content (e.g. like a Merkle root)?
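This ordering idea can be sketched as follows (illustrative; sorting by (timestamp, hash) is just the criterion suggested above):

```python
import bisect

# Leaves are kept sorted by (timestamp, hash). An out-of-order arrival is
# inserted at its deterministic position, so every honest node ends the
# superepoch with the same leaf sequence, and hence the same Merkle root.
leaves: list[tuple[int, bytes]] = []

def insert_message(timestamp: int, msg_hash: bytes) -> None:
    bisect.insort(leaves, (timestamp, msg_hash))

# Messages arriving out of order still yield the same final sequence.
insert_message(20, b"b")
insert_message(10, b"a")
insert_message(10, b"c")
```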

this essentially means total votes to be considered for counting and majority would change per superepoch right.

Yep. New operators can be added (+), slashed (-), or leave (-). Slightly related: Ethereum has a queue mechanism so that even if you have lots of capital to deploy (to spawn new operators), you can't do it instantly; you have to wait in an entry (and exit) queue. It's a protection.

@SionoiS

SionoiS commented Sep 10, 2024

I was thinking about ordering the leaves according to some criteria, e.g. by timestamp and then by hash (lexicographically). In that case the order would be deterministic and so would the root. You may receive an out-of-order message, which you can insert wherever its place is.

That's basically what Prolly trees are.

@chaitanyaprem

I have been giving more thought to fee distribution to operators and the approach described above. Dropping some more thoughts here; not sure if we need a solution to start with, or whether some of these should be considered at a later stage, but writing them down anyway.

Waku's aim is to build a p2p network where users and operators can operate in adversarial conditions (unlike other p2p networks like ETH, which are currently designed for more ideal conditions, i.e. nodes always online, not censorship resistant, etc.). This means conditions are not ideal for either users or node operators (i.e. the network may always be available, but individual nodes may not be, for various reasons, so a node may not be able to function at 100% all the time).

By designing a reward system for operators who validate and relay all the messages that pass through the network, we are expecting all operators to be available and relaying messages 100% of the time.
If we take this approach, will we end up hurting operators who run nodes from home, where stability is not as guaranteed as for nodes running with a cloud service provider? Would we be pushing operators away from decentralization, towards running nodes on cloud providers only?

Should we consider an approach where operators are rewarded based on the percentage of messages they were able to relay? I am not saying we should reward someone who relays only 1% of messages, but rather than expecting 100% relaying, should we look at an approach where incentives are given in another manner? Again, this goes against my point above, because with this model we may still be pushing operators towards more centralized infrastructure.

Alternatively, how about a mechanism which randomly verifies that an operator is validating and relaying messages? Maybe similar to the mechanism the Filecoin network uses for Proof-of-Spacetime, in which random checks for parts of the data expected to be stored are done at frequent intervals, proving the data is still being stored rather than checking all stored data all the time.
Can we think of a mechanism along similar lines that verifies a node is validating and relaying messages through some sort of partial verification, rather than verifying that the node has relayed/validated every message in the network?

@jm-clius
Contributor

Thanks for the issue write-up!

I think your "open problems" summarise what our next research focus should be. In particular, a potential no-go would be if we can't find an elegant solution for:

Handle lazy nodes, defined as entities not participating in relay but somehow getting a valid Merkle root.

A node wouldn't even need to use Sync to be able to track the Merkle root without actually contributing - you could simply run as a relay "sink" (with a few in connections and no out) without actually relaying messages or providing other services. I like the direction that @chaitanyaprem suggests - random checks to verify that a node is actually contributing to the network. It may help here to consider Sync as an integral part of the routing fabric of the network. In other words, we may expect operators participating in the fee rewards to run the trifecta of RLN + Relay + Sync - at the very least they would have to respond to Sync requests (as it may be difficult to "prove" that a node is actually relaying and not just storing messages).

Should we consider an approach where operators are rewarded based on percentage of messages they were able to relay?

Presumably, if we consider Sync an essential element here, we could expect of nodes to know about all messages. Perhaps a random check to show that expected messages are stored combined with a Merkle proof?

@alrevuelta
Contributor Author

@chaitanyaprem

unlike other p2p networks like ETH, which are currently designed for more ideal conditions, i.e. nodes always online, not censorship resistant, etc.

Note that Ethereum is designed for very adversarial environments, and nodes are not expected to always be online. The incentives make them stay online, but the network can handle e.g. 50% of nodes being offline. A non-finality period would start, and penalties would kick out nodes that are not attesting.

By designing a reward system for operators who validate and relay all the messages that pass through the network, we are expecting all operators to be available 100% of the time relaying messages.

Good point. I would say it's not 100% that's required, since you can be up 90% of the time and get what you missed via store sync (e.g. after a restart for an update). But I get the point.

Should we consider an approach where operators are rewarded based on percentage of messages they were able to relay?

I'm open to this, but not sure how it can be done technically. How can you prove that you have relayed "some" messages? I think HOPR has something related to this: proof of relay.

Alternatively how about a mechanism which randomly verifies if an operator is validating and relaying messages? Maybe similar to a mechanism which filecoin network uses for Proof-of-SpaceTime

The random checks idea is interesting, will explore.

@jm-clius

A node wouldn't even need to use Sync to be able to track the Merkle root without actually contributing - you could simply run as a relay "sink" (with a few in connections and no out) without actually relaying messages or providing other services

Aren't we protected with gossipsub scoring from this?

I like the direction that @chaitanyaprem suggests - random checks to verify that a node is actually contributing to the network

Sounds interesting. Any idea on which challenge to run randomly? Imho the hard thing here is that if the challenge implies "being right on something", that's tricky, since we don't have consensus and hence it's not possible to verify correctness.

@chaitanyaprem

chaitanyaprem commented Sep 12, 2024

Note that eth is designed for very adversarial environments and nodes are not expected to be online. The incentives make them stay online, but the network can handle eg 50% nodes offline. Some non-finality period will start and penalties will kick out nodes that are not attesting.

Right, but doesn't that mean validators are expected to always be online, i.e. 100%?

I'm open for this, but not sure how it can be technically done. How can you prove that you have relayed "some" messages? I think hopr has something related to this, proof of relay.

Proof of relay was an interesting read, thanks!

Instead, since gossipsub already has a very detailed scoring mechanism that penalizes nodes for non-effective participation while online, I was wondering if we can somehow leverage that. Will think more on this problem, though; this is just an initial thought.
What if each node somehow indicates, based on gossipsub score, that a peer was available and relaying messages effectively? And if such indications are received from some n number of nodes m times per week/superepoch (or some other duration we can think of), can we consider that the node was able to do its job of relaying (maybe not 100%, but to some degree; some sort of thresholds on scores could be used in determining this)?
This also acts as an indirect random check on the node.

Maybe this somehow can also be added to the node's reputation apart from service protocol specific reputation? wdyt @s-tikhomirov

@s-tikhomirov
Contributor

s-tikhomirov commented Sep 12, 2024

I think hopr has something related to this, proof of relay.

From what I understand, HOPR is similar to Lightning in that it uses onion routing and hashlocks to make payment conditional on delivery of data. AFAIU, this scheme is suitable for a point-to-point communication, where first a path is established, and then the message is onion-encrypted using public keys of nodes along the path. Our use case, in contrast, has a broadcast communication pattern, therefore I'm not sure proof-of-relay is well-suited here. I'd be happy to be proven wrong of course.

I've also been preparing a more detailed reply to this discussion, will post it as a separate message.

@jm-clius
Contributor

if the challenge implies "being right on something" that's tricky, since we don't have consensus

No design in my head, but if we can reach some consensus on the message hash root for a superepoch (using your proposal), I guess a node could potentially be challenged to prove that they're storing random messages in this superepoch with a Merkle proof to the (consensus) root.

Aren't we protected with gossipsub scoring from this?

True. I misstated a problem we had with Relay "leeches" earlier - nodes that are not publicly reachable and only make a few outgoing Relay connections (the number of which they control). They observe gossipsub rules on these connections, but essentially provide nothing to the network since they only consume connections. In other words, they don't enrich overall connectivity by being discoverable and allowing incoming connections. Hence my entertaining the idea of combining this with at least some verification that the nodes provide a public service that contributes to the routing infrastructure (i.e. Sync in this case).

This ties into another difficulty: the overall aim of a resilient relay infrastructure is to disperse the infrastructure as much as possible and increase the number of node operators. Everyone participating in a reward sharing scheme, however, is in essence incentivised to keep the topology as constrained as possible (fewer nodes = more reward). It is possible, as explained above, to be a perfectly good gossipsub node, but artificially limit the connectivity in the network to keep new operators from joining.

@alrevuelta
Contributor Author

No design in my head, but if we can reach some consensus on the message hash root for a superepoch (using your proposal), I guess a node could potentially be challenged to prove that they're storing random messages in this superepoch with a Merkle proof to the (consensus) root.

I see. This addendum can fix the "lazy operator" issue, and moreover it prevents people from relaying without storing the messages. But this overlaps the relay and store protocols. I mean, this proposal aimed to incentivize relay, not store. So not sure if we are mixing things? Or, well, maybe relay and store can go together?

As a side note, another problem is how to act upon the challenge. NodeA challenges NodeB. Is the challenge result stored onchain? (Probably not, since that would consume too many resources.) Or does this information affect gossipsub scoring?

@mart1n-xyz

This ties into another difficulty: the overall aim of a resilient relay infrastructure is to disperse the infrastructure as much as possible and increase the number of node operators. Everyone participating in a reward sharing scheme, however, is in essence incentivised to keep the topology as constrained as possible (fewer nodes = more reward). It is possible, as explained above, to be a perfectly good gossipsub node, but artificially limit the connectivity in the network to keep new operators from joining.

I think there is more nuance here. As a node, I prefer to have a larger node set as that increases the size of the pie of rewards. However, I want these nodes to be poorly connected, not to be able to vote for the "correct" merkle root and claim rewards.

Some ideas and comments:

  • I assume the proposal gives equal voting power to all nodes. One could also play with this in a way where new nodes initially get lower voting power that grows towards some limit over time, to promote network resilience. This does not solve the "joins later" issue. However, coupling it e.g. with a lower slashing rate that also increases over time (say 75% -> 100%) could limit the financial risk for a new node of getting its initial superepoch proof wrong.
  • One idea of solving the "joins later" issue that comes to my mind is to actually have two merkle roots to vote on. The first being the one described above. The second being constructed as follows:
    • Messages are split into blocks of desirable granularity within a superepoch
    • For each block, a block merkle root is constructed as described above
    • These roots are inputs to yet another merkle tree whose proof is part of the tuple that is voted on.
    • In this scenario, new nodes are allowed not to vote and submit only a single (or multiple) block merkle root that would be verified against the second merkle root and they would be rewarded for the appropriate portion. There's a bunch of caveats here but that's the gist of it.
  • I think we should also bring identity into this discussion. By voting and claiming onchain, a node de facto becomes associated with an address (plus registration can involve registering a public key), and I am wondering if this can be leveraged somehow for the "proof of relay" alternative or random checks.
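One way to sketch the two-root idea from the second bullet above (a rough illustration, not a spec; the SHA-256 hashing, sorted-leaf construction, and block granularity are all assumptions):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def root_of(leaves: list[bytes]) -> bytes:
    # Sorted-leaf Merkle construction, as assumed for the full superepoch tree.
    level = sorted(leaves) if leaves else [h(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def superepoch_roots(blocks: list[list[bytes]]) -> tuple[list[bytes], bytes]:
    # First level: one root per block of messages within the superepoch.
    # Second level: a tree over the block roots; its root is the second
    # value operators would vote on under this variant.
    block_roots = [root_of(block) for block in blocks]
    return block_roots, root_of(block_roots)
```

A node that joined late could then prove a single block root against the second-level root and claim a partial reward for that block.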

@alrevuelta
Contributor Author

@mart1n-xyz Thanks for the comments!

As a node, I prefer to have a larger node set as that increases the size of the pie of rewards

Afaik that's not the case. More node operators in the network implies fewer rewards for each one. Rewards come from users registering RLN memberships, who are a different actor than node operators.

Re. Your idea of "sub-merkle-trees".

It can actually be thought of as the same tree, just going n levels down from the root. I mean, if you want to divide the superepoch in 2, you take the 2 children of the root. With superepoch/4 you take the 4 nodes one level further down, etc. So it's the same Merkle tree split into subtrees.

But I'm not sure the complexity is justified. What's the intention behind it? To allow late joiners to participate? That could be addressed with a smaller superepoch size, so that if you miss part of it, it's not a lot (e.g. a few minutes). And store sync also allows late joiners to catch up.

I may have to reconsider the superepoch size, because that's how frequently nodes reach consensus on the root. Right now the epoch is 10 minutes, so a superepoch will certainly be greater than that, and that may be too long.

I think we should also bring identity into this discussion

Indeed. This is very important for "1. Node registration" and maybe can be reused for other things.

@jm-clius
Contributor

But this overlaps relay and store protocols.

Indeed. Well, I would say it overlaps Relay and Sync, which seems to reflect the model in which routing can in fact take place via either relay or sync. To me it seems Sync may be a required component in any case for nodes to have a reasonable chance to reach consensus on the Merkle root. That said, I would lean towards a Relay-only solution, but I'm not sure how solvable the lazy node and anti-connectivity problems are if we don't consider Relay and Sync as an integrated routing protocol. Or, at least, this is a possible solution that may give us more angles from which to approach the problem.

@mart1n-xyz

@alrevuelta Thanks for explaining. Makes sense, merkle trees are not my forte :)

What's the intention behind? To allow late joiners to participate?

To allow nodes launching within a superepoch to get partial rewards if they are not able to derive the Merkle root for the complete set of messages but can prove a subset.

@s-tikhomirov
Contributor

s-tikhomirov commented Sep 13, 2024

I started writing down my thoughts on this issue, and while the result may sound a bit theoretical, I hope it will be helpful.

Fundamentals

In the big picture, what are we aiming to achieve?

We want the network to function effectively. We need to incentivize nodes to act in the best interest of the network. At the same time, we want rewards to be distributed fairly.

Metrics

It's important to clearly define the terms we're using. I find it helpful to think in terms of metrics.

Ideal Global Metric

Imagine we could measure anything in the Waku network. What global metric would we aim for? What do we mean by "functioning effectively"?

In simple terms, a functioning network means messages are propagated quickly and reliably. This implies several independent metrics:

  • Minimizing the percentage of lost messages;
  • Maximizing the percentage of nodes that receive a message within T seconds of its first propagation;
  • Minimizing the time it takes for N percent of nodes to receive a message.

For example, consider using "infection" terminology: nodes that receive a new message are "infected." There could be a trade-off between:

  1. Infecting 50% of nodes in 1 second, and another 40% over 9 seconds (total of 10 seconds to infect 90% of nodes);
  2. Infecting 50% of nodes in 3 seconds, and another 40% over 5 seconds (total of 9 seconds to infect 90% of nodes).

Which option is preferable?

Question: What global metric should we optimize?

Local Behavior

The network consists of independent nodes whose behavior we want to incentivize. We need to determine what local behaviors contribute to optimizing our global metric. Informally, for a node to maximize message propagation, it should:

  • Receive new messages quickly from peers;
  • Forward messages to a wide set of its own peers.

This behavior needs a formal definition, as informal rules can be exploited. For example, if we define ideal behavior as "forward messages to as many peers as possible," forwarding to already "infected" peers may not help overall propagation.

Question: What local behavior optimizes the global metric?

Local Metric

We should define a local metric as a proxy for the desired local behavior. This metric must be measurable and ideally provable. The best metric would ensure that the only way for a node to score well is by honestly performing the desired behavior.

Question: What local metric best reflects the ideal local behavior?

Measuring the Local Metric

Finally, we need to figure out how to measure the chosen local metric. If nodes are motivated to increase their score, self-reporting could be unreliable. However, some metrics are only known to the nodes involved (e.g., how many messages they actually relayed).

Question: How do we reliably measure the local metric in practice?

Root Voting Proposal in the Context of This Framework

Based on the framework above, the proposal under discussion seems to be as follows:

  1. Ideal Global Metric: Relay messages quickly and reliably (though this needs a more precise definition).
  2. Ideal Local Behavior: Participate in the Waku Relay protocol as per its specifications.
  3. Local Proxy Metric: Knowledge of the correct root hash.
  4. Measuring the Local Proxy Metric: Nodes vote on the correct root hash using a commit-reveal scheme implemented via a smart contract.
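Point 4 above can be illustrated with a minimal hash-based commit-reveal sketch (an assumption for illustration; the actual contract interface is not specified in this thread):

```python
import hashlib
import secrets

def commit(root: bytes, salt: bytes) -> bytes:
    # the commitment hides the vote until the reveal phase,
    # so nodes cannot simply copy each other's votes
    return hashlib.sha256(root + salt).digest()

def verify_reveal(commitment: bytes, root: bytes, salt: bytes) -> bool:
    # reveal phase: the contract recomputes the hash from (root, salt)
    return commit(root, salt) == commitment

# commit phase: each operator posts H(root || salt) on-chain
my_root = b"\x11" * 32            # Merkle root this node computed locally
my_salt = secrets.token_bytes(32)
my_commitment = commit(my_root, my_salt)

# a matching reveal passes, a different root does not
assert verify_reveal(my_commitment, my_root, my_salt)
assert not verify_reveal(my_commitment, b"\x22" * 32, my_salt)
```

The salt prevents brute-forcing commitments when the set of plausible roots is small.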

Key assumptions:

  • A single correct root hash exists;
  • The correct root hash can be determined through voting;
  • Knowing the root hash implies knowledge of all underlying messages;
  • Knowing all messages implies that all messages were relayed;
  • Relaying all messages ensures optimal network performance.

We need to examine each assumption to determine how often they hold true. Assumptions rarely hold in every case, so the question is whether they hold often enough for our purposes.

Other Considerations

A few additional points worth exploring:

Incentives for (De)centralization

How important is it to incentivize decentralization? Maximizing performance metrics might favor large, centralized nodes in data centers over smaller, home-run nodes. Do we want to encourage not only performance but also decentralization?

Time Signal and Superepoch Boundaries

The proposal assumes that each message belongs to a single superepoch. Without full consensus, issues could arise if a message is issued exactly at the boundary between epochs, leading to disagreement on which epoch it belongs to and, consequently, on the correct root hash.

Disagreement on the Root Hash

What happens if no candidate root hash secures a supermajority? What should be the protocol in such cases?

Thoughts on Consensus

This section may be somewhat abstract, but I think it's important to mention.

I’m still unclear on the relationship between Waku and consensus. On the one hand, Waku is part of the blockchain ecosystem, where consensus is crucial for ensuring correctness across transactions. In financial systems, users want to ensure no double-spending occurs, requiring a global awareness of all transactions. This leads to the creation of consensus algorithms like PoW or PoS, with strong guarantees.

In contrast, consensus may not be as critical in a messaging network like Waku, where users generally care only about messages sent to them. While synchronization protocols are important, they may not need the same level of finality guarantees as blockchains.

However, by introducing financial incentives, we invite potential exploitation. In blockchains, consensus algorithms prevent cheating around rewards to a large extent (although even then things like selfish mining do exist), but Waku lacks such a robust consensus mechanism. Specifically, the root-hash-based proposal assumes a globally agreed-upon correct root hash without a full consensus mechanism to enforce it. My concern is that attackers may find ways to game the system by exploiting differences between Waku and consensus-based protocols.

Conclusion

This all boils down to two key points:

  1. I would suggest to define more precisely which metric we're trying to optimize and which local behavior to incentivize;
  2. I think it's worth considering to which extent we can make consensus assumptions without a fully-fledged consensus mechanism.

@alrevuelta
Contributor Author

@s-tikhomirov Thanks for the comments.

To clarify, imho the waku protocol has different behaviors to incentivize, but not all of them are achieved via this root voting suggestion. I see two layers of incentives:

  • low-level-non-economic-incentives: These are rewarded via gossipsub scoring to the peers as some kind of non-economical reputation. It measures the time the peer has been in the mesh, whether the peer was the first one to deliver a message, etc. I would say this addresses some of your questions. These are not tied to on-chain crypto-incentives; that's what this proposal tries to add.
  • high-level-onchain-economic-incentives: This root voting proposal sits at this level. It maps the low-level incentives to something that can be reflected on-chain, and used to reward node operators. The idea (and assumption) is that if you do ok in the low-level incentives, you will have the root and will be rewarded.

Infecting 50% of nodes in 1 second, and another 40% over 9 seconds (total of 10 seconds to infect 90% of nodes);
Infecting 50% of nodes in 3 seconds, and another 40% over 5 seconds (total of 9 seconds to infect 90% of nodes).
Which option is preferable?
Question: What global metric should we optimize?

I would say this is too low level for the "high-level crypto incentives". This kind of belongs to the gossipsub domain and is already, if not exactly, dealt with by gossipsub scoring.

Key assumptions:

1. A single correct root hash exists;
2. The correct root hash can be determined through voting;
3. Knowing the root hash implies knowledge of all underlying messages;
4. Knowing all messages implies that all messages were relayed;
5. Relaying all messages ensures optimal network performance.

1 yes. 2 yes, assuming 2/3 are honest. 3 not really, but I'm thinking about a challenge proposed in other comments (e.g. against a lazy node using store sync to build the tree while discarding the messages). 4 maybe not, but there are other ways to ensure messages are relayed. 5 yes, but there is more to it than that.

Maximizing performance metrics might favor large, centralized nodes in data centers over smaller, home-run nodes

It depends on where you set the threshold. If I reward anyone that solves 2+2 within 5 seconds, I'm not favoring anyone, since that's a commodity. Anyone can solve it no matter their resources. The result from a supercomputer is worth the same as the one from a Raspberry Pi.

The proposal assumes that each message belongs to a single superepoch. Without full consensus, issues could arise if a message is issued exactly at the boundary between epochs, leading to disagreement on which epoch it belongs to and, consequently, on the correct root hash.

Edge cases should be dealt with deterministically in the implementation, but I would say that's not a problem. RLN binds messages to an epoch, and superepochs are just n epochs together.

What happens if no candidate root hash secures a supermajority? What should be the protocol in such cases?

It's one of the open problems. Will update it once I have more. The challenge here is solving it simply. An option could be to allow a second round, where nodes are expected to use store sync under the hood to try to reach consensus. Not simple though.

In contrast, consensus may not be as critical in a messaging network like Waku, where users generally care only about messages sent to them. While synchronization protocols are important, they may not need the same level of finality guarantees as blockchains.

I see consensus as important because of two things:

  • Gives some guarantees that a message is known by multiple nodes and was (with high probability) relayed through the network. This opens interesting use cases like proving that you sent a message, maybe for light clients (light push).
  • Not sure if we can have incentives without consensus. Happy to be proven otherwise.

I would suggest to define more precisely which metric we're trying to optimize and which local behavior to incentivize;

I see this root proposal as more generic than that. The low-level metrics you are referring to can be (and some already are) placed in gossipsub scoring. This proposal just tries to map that scoring to something onchain that we can use to reward operators. I see the root voting as some kind of high-level-summary.

I think it's worth considering to which extent we can make consensus assumptions without a fully-fledged consensus mechanism.

No plan to have a fully-fledged consensus. For that, we have smart contracts with an underlying blockchain infrastructure providing it. imho we need some kind of consensus for messages as a prerequisite for incentives, one that ideally can be simple.

@s-tikhomirov
Contributor

s-tikhomirov commented Sep 18, 2024

Thanks for the reply @alrevuelta !

The idea (and assumption) is that if you do ok in the low-level incentives, you will have the root and will be rewarded.

Sure, but IMO the interesting question is how to ensure that the inverse statement holds. In other words: one should not be able to claim rewards without doing OK at the lower level (i.e., without actually relaying messages).

assuming 2/3 are honest

By the way, where does the 2/3 number come from? I know it is a common security assumption of BFT and PoS protocols, but as long as we have neither of those, do we have to use 2/3 too? Perhaps we could set the limit to, say, 90%, and that would discourage some splitting attacks? (Again, subject to further investigation if we decide to go along this route.)

I see consensus as important

Not disagreeing with you here, but in my view the key question is "can we afford consensus", not "would it be nice to have it". Of course it would be nice, but to implement consensus properly, blockchain protocol designers are spending lots of R&D effort + considerable resource commitment is required from protocol participants (hashing in PoW, locked-up capital in PoS). My feeling is that such things would be an overkill for a message relay network. We can, however, look for a clever way to piggy-back on existing consensus like we do with the RLN contract already.

This proposal just tries to map that scoring to something onchain that we can use to reward operators.

I now realize that this part wasn't clear to me. Do we assume, among other things, that gossipsub score reflects the node's network participation, and that attackers won't fake their scores (perhaps using multiple colluding nodes that rank each other highly without relaying the messages)?

No plan to have a fully-fledged consensus. For that, we have smart contracts with an underlying blockchain infrastructure providing that. imho we need some kind of consensus for messages

By fully-fledged consensus I mean consensus for messages with strong security guarantees (what else could the consensus be for?). I agree though that we should leverage existing blockchain infrastructure as much as possible.

@alrevuelta
Contributor Author

Sure, but IMO the interesting question is how to ensure that the inverse statement holds. In other words: one should not be able to claim rewards without doing OK at the lower level (i.e., without actually relaying messages).

Totally agree. This is the line of work behind "lazy operator". Largely undefined by now.

By the way, where does the 2/3 number come from?

Open to discussing it. Took it from PoS like systems but ofc tunable.

key question is "can we afford consensus", not "would it be nice to have it"

I would say this question will be answered as the outcome of these discussions. I'm pushing towards having it, since I think its important, but we may have to accept that it is overkill for waku.

We can, however, look for a clever way to piggy-back on existing consensus like we do with the RLN contract already.

Open to ideas here. Can you be more specific? Not sure I see any consensus in RLN. I mean, everyone that paid X and called register gets it, which is quite a universal truth (thanks to the underlying consensus of the blockchain we use).

I now realize that this part wasn't clear to me. Do we assume, among other things, that gossipsub score reflects the node's network participation, and that attackers won't fake their scores (perhaps using multiple colluding nodes that rank each other highly without relaying the messages)?

Note that gossipsub reputation is local to each node, based on what it sees from interacting with other nodes. Meaning that node_a can have different reputation values in different nodes. Not sure how an attacker can fake its score, because the score is something a node sets based on what it sees (performance).

@s-tikhomirov
Contributor

We can, however, look for a clever way to piggy-back on existing consensus like we do with the RLN contract already.
Open to ideas here. Can you be more specific?

No concrete ideas yet, just a general thought that blockchain consensus i.e. smart contract provides us with a source of truth (just like the RLN contract is the source of truth about who has the right to relay messages).

Note that gossipsub reputation is local to each node

Sure, I understand that. But I'm not sure I understand how gossipsub scores are used in this proposal. Do we only allow nodes with score higher than X (as reported by... whom?) to vote on the root hash?

@alrevuelta
Contributor Author

Sure, I understand that. But I'm not sure I understand how gossipsub scores are used in this proposal. Do we only allow nodes with score higher than X (as reported by... whom?) to vote on the root hash?

I mean, if your node forwards invalid messages (according to our gossipsub validation), or e.g. you don't send messages to any peers (you just receive them), this will lower your score to a point where no nodes would want to connect to you. If this happens: no peers = no messages = no valid Merkle root.

Do we only allow nodes with score higher than X (as reported by... whom?) to vote on the root hash?

Anyone can vote on the root hash as long as you have posted some collateral. The right to vote is not connected to gossipsub scoring.

@alrevuelta
Contributor Author

Thanks everyone for the feedback.

As part of the conversations, the following challenges were spotted:

  • What happens with "lazy nodes" (nodes that know the root but don't participate in relay)?
  • What happens if there is no supermajority during a superepoch?
  • Isn't the consensus window (aka superepoch) too long?

Proposing a solution for each.

Dealing with lazy nodes

Problem:

  • Lazy nodes knowing the root but not participating in gossipsub (just fetching hashes via store sync)
  • Lazy nodes knowing the messages but not participating in gossipsub (just fetching messages via store sync)

Solution:

  • Modify store-sync to only serve messages with a given delay STORE_SYNC_DELAY_SECONDS. Only serve messages up to current_timestamp - STORE_SYNC_DELAY_SECONDS.
  • Add a "challenge" between peers and include the result in the gossipsub scoring. A peer-to-peer challenge where nodes check the knowledge of given random messages in the [current_timestamp - STORE_SYNC_DELAY_SECONDS, current_timestamp] window.

The whole proposal for distributing RLN fees works under the assumption that knowing the root implies participating in relay, which is the behaviour we want to incentivize.
However, we can have lazy nodes that don't run relay at all, but manage to get a valid Merkle root via store sync at the expense of others.

There are two variants of lazy nodes:

  • The laziest, which are the ones that just get message hashes to construct the Merkle tree.
  • The ones that fetch and store message content + hashes but using store sync, hence not participating in relay.

To solve this problem we introduce a limit on the messages that store-sync serves. It shall serve messages only up to current_timestamp - STORE_SYNC_DELAY_SECONDS, meaning that
only relay nodes will know the messages during the STORE_SYNC_DELAY_SECONDS window.
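This serving cutoff could be sketched as follows (the delay value and the message representation are illustrative assumptions, not part of the proposal):

```python
import time

STORE_SYNC_DELAY_SECONDS = 60  # illustrative value, not specified in the proposal

def syncable(messages, now=None):
    """Only serve messages older than the delay window via store sync."""
    now = time.time() if now is None else now
    cutoff = now - STORE_SYNC_DELAY_SECONDS
    return [m for m in messages if m["timestamp"] <= cutoff]

# at now=1060 with a 60s delay, only messages with timestamp <= 1000 are served
msgs = [{"hash": "a", "timestamp": 1000}, {"hash": "b", "timestamp": 1055}]
assert [m["hash"] for m in syncable(msgs, now=1060)] == ["a"]
```

Messages inside the window remain reachable only via relay, which is what makes the challenge below meaningful.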

On the other hand, we introduce a challenge. Its result is included in the gossipsub scoring. More particularly, in the P5 score aka "Application-Specific" score.

The P5 score increases when the challenge is answered correctly and decreases when it is not. It's designed so that a few occasional invalid responses don't affect
the score, and their effect decays over time to the point where it no longer matters. But when a node fails multiple challenges in a row, its score is lowered to the point where a disconnection is triggered.

This challenge is done periodically by every peer with its connected peers, verifying that they know the content of a message given its hash within this STORE_SYNC_DELAY_SECONDS window. Since only nodes running relay will know this, a peer failing to reply to the challenge will be descored.
Note that a PROPAGATION_TIME_SECONDS window shall be added as a safety margin, in order to ensure that the messages have propagated through the network.
Running the challenge before a message has propagated would lead to erroneous results.

The challenge:

  • Every given time interval a peer presents a challenge to a subset of its connected peers.
  • The peer selects a random message hash from its local database for the current epoch, with a random timestamp between current_timestamp - PROPAGATION_TIME_SECONDS - STORE_SYNC_DELAY_SECONDS and current_timestamp - PROPAGATION_TIME_SECONDS.
  • The peer requests from the remote peer(s) the message content for that hash.
  • If the message content matches the local copy, increase the P5 score. Lower it otherwise.
  • As an implementation detail, a peer can opt in to be more aggressive with the challenge if it suspects that the other peer is not storing the messages. The intention is to descore and trigger the disconnection as soon as possible.
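The challenge flow above could be sketched as follows (the constants, the scoring step, and the message representation are all illustrative assumptions, not values from the proposal):

```python
import random

PROPAGATION_TIME_SECONDS = 10    # illustrative safety margin
STORE_SYNC_DELAY_SECONDS = 60    # illustrative store-sync delay
SCORE_STEP = 1.0                 # illustrative P5 score increment

def pick_challenge(local_db, now):
    """Pick a random message the remote can only know via relay:
    newer than the store-sync cutoff, older than the propagation margin."""
    lo = now - PROPAGATION_TIME_SECONDS - STORE_SYNC_DELAY_SECONDS
    hi = now - PROPAGATION_TIME_SECONDS
    candidates = [m for m in local_db if lo <= m["timestamp"] <= hi]
    return random.choice(candidates) if candidates else None

def apply_challenge_result(p5_score, answered_correctly):
    # move the application-specific (P5) score up or down
    return p5_score + SCORE_STEP if answered_correctly else p5_score - SCORE_STEP

db = [{"hash": "h1", "timestamp": 940, "content": b"m1"},
      {"hash": "h2", "timestamp": 995, "content": b"m2"}]
c = pick_challenge(db, now=1000)
assert c["hash"] == "h1"   # h2 (t=995) is still inside the propagation margin
# the remote returns content for the hash; compare it with the local copy
assert apply_challenge_result(0.0, c["content"] == b"m1") == 1.0
```

A real implementation would also apply gossipsub's score decay so that old challenge results fade, as the proposal describes.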

Since it's in the best interest of every node to keep the network clean of lazy peers, all nodes will report each other and lazy peers will end up without peers to connect to.
The score decrease shall be tuned so that it isn't feasible for a lazy node to keep hopping from peer to peer as it gets disconnected.

The cost of this solution is:

  • Increased bandwidth: The challenge introduces extra communication, taking negligible bandwidth on the challenger side (a message hash) and moderate bandwidth on the challenged side (the message content). Since all nodes are both challengers and challenged, there is a moderate increase in bandwidth.
  • Limited store-sync: With the STORE_SYNC_DELAY_SECONDS window, store-sync will lag a bit behind relay. Note that this does not apply to filter.
  • Coupled relay/store: This proposal couples relay and store, since for the challenge to work, a node should offer both relay and store services.

Lack of supermajority

Problem:

  • No consensus is achieved during a voting round.

Solution:

  • Have multiple rounds (in theory infinite) where a new vote for the same epoch is performed until consensus on a root is reached.

The initial proposal didn't take into account the possibility of nodes not reaching consensus during a voting round.
To address this, we introduce unbounded voting rounds, where nodes keep voting until a supermajority is reached.
The following properties should guarantee to some extent that the Merkle root will eventually converge, if not in one round, then in a finite number of rounds:

  • Nodes can verify that a waku message is valid (on-chain root and RLN proof).
  • Nodes can sync with each other via store sync.
  • Once the epoch is due no new messages are accepted for that epoch.
  • Peers form a mesh with a low probability of network partitions (discv5 + gossipsub).

With an example.

  • Let node_0 .. node_(n-1) be n node operators registered and participating in consensus with a valid vote.
  • Let CONSENSUS_THRESHOLD be the minimum required % of votes to accept a root as valid. Let's say 2/3.
  • Let roots be an array with the root that each node voted [root_0 ... root_(n-1)]

While no single root gets more than CONSENSUS_THRESHOLD of the votes, enter a new round of voting until consensus is reached.
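The tallying loop condition can be sketched as follows (illustrative; the threshold is treated here as "at least 2/3", and the vote values are made-up examples):

```python
from collections import Counter
from fractions import Fraction

CONSENSUS_THRESHOLD = Fraction(2, 3)

def winning_root(roots):
    """Return the root with at least 2/3 of the votes, or None if no supermajority."""
    if not roots:
        return None
    root, count = Counter(roots).most_common(1)[0]
    return root if Fraction(count, len(roots)) >= CONSENSUS_THRESHOLD else None

# round 1: a partition leaves the votes split, no supermajority -> new round
assert winning_root(["r1", "r1", "r2", "r3", "r2", "r3"]) is None
# round 2: after store sync most nodes converge on r1 (5/6 >= 2/3)
assert winning_root(["r1", "r1", "r1", "r1", "r2", "r1"]) == "r1"
```

Using exact fractions avoids floating-point edge cases exactly at the threshold.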

Note that as an implementation decision, when no consensus is reached in the first round, nodes can be more proactive
to ensure that consensus is achieved in the second. Some strategies:

  • Trigger a more aggressive peer discovery.
  • Temporarily allow more connected peers.
  • Trigger a more aggressive store sync.

This should reduce the probability of a network partition and of the node missing messages.
Nodes should always be well connected and using store sync regardless, but the lack of consensus during a round could
be used as a trigger to be more aggressive during a short time window.

Some reasons why the network may not reach consensus:

  • Network partition: This could happen at different levels. Note that in some cases, network partitions may be desirable. For example, a group of misbehaving peers shall be kept as isolated as possible.
    • Discovery-level: Preventing peers from seeing other peers.
    • Gossipsub-level: Preventing peers from connecting to other peers.
    • Storesync-level: Preventing peers from syncing with other peers. This can only happen in peers that were offline during the epoch and need to sync some messages.
  • Attack: Similar effect to a network partition but caused by an adversary.
  • Bug: Causing a network partition or other artifacts.

Consensus window

Problem:

  • The initial consensus window was suggested to be one superepoch, matching the billing period. However, this is too long.

Solution:

  • Discard the concept of superepoch and reach consensus every epoch. The fees distributed for each epoch are proportional to the ones gathered during the billing period.

The initial proposal wanted to introduce the concept of a superepoch matching the billing period, with 30 days in mind.
However, reaching consensus and distributing fees only every 30 days might be too infrequent.

In order to reach consensus more often, the superepoch idea is discarded. Consensus is now reached every epoch.
If a billing period contains n epochs, then the fees distributed for each epoch would be the period's total divided by n.
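As a quick illustration of the split, assuming a 10-minute epoch and a 30-day billing period (the fee total is a made-up number):

```python
EPOCH_SECONDS = 600                       # 10-minute epoch
BILLING_PERIOD_SECONDS = 30 * 24 * 3600   # 30-day billing period

n = BILLING_PERIOD_SECONDS // EPOCH_SECONDS   # epochs per billing period
fees_collected = 4320.0                       # illustrative total fees for the period

per_epoch = fees_collected / n   # equal share distributed at each epoch's consensus

assert n == 4320        # 4320 ten-minute epochs in 30 days
assert per_epoch == 1.0
```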

Other Open Questions

  • Single entity stakes for 100 operators but only runs 1 node: Unsure if this is a problem per se. The Ethereum blockchain has a similar situation, where 1000000 validators exist (equivalent to our operators) but only 20k beacon nodes exist (our nodes). This can actually be okay, because if a node has lots of stake in it, the incentive for that node to work well is higher.
  • Relation with sharding: To be defined whether consensus will be per shard. Most likely yes, since shards are independent and we can't expect nodes to run all shards.

@jm-clius
Contributor

jm-clius commented Oct 2, 2024

Thank you for this thorough analysis and systematically addressing concerns! I have two parts to my response below. One part is to ask some clarifications on your proposed solutions to the open problems, the second is to summarise some further open questions that I think we should consider. I'll try to be as structured as possible. 😅

First, a couple of questions on your proposed solutions:

just fetching messages via store sync

Is this necessarily a bad thing from network behaviour POV? If a node uses only store sync it would still help distribute messages and could in theory participate in services such as filter/store. From the node's POV this would seem a bit silly as RLN Relay would be a cheaper way of tracking the root than relying on store sync.

Modify store-sync to only serve messages with a given delay

Store Sync is designed to improve "real-time" message routing reliability over a time window longer than what gossipsub caches can provide. Since real-time delivery has some jitter, we do already introduce a short delay before messages form part of the sync range. For this proposed mechanism to work, though, I assume we would have to delay with at least one epoch period so that the RLN Relay nodes will be the only ones able to commit the correct root for the finished epoch? My concern would be that this reduces the real-time value of Store Sync or am I missing something with my assumptions?

reach consensus every epoch

IIUC this means that every participating node would have to commit a root for every epoch? How expensive would this be compared to the resulting reward share, even on a cheap L2? For example, even if we max out the message rate in the RLN contract as it stands now (unlikely), the entire contract value would be $8000 over 6 months (currently < $2 per node per month in TWN). This translates to a very tiny amount available for sharing per day and minuscule amounts per epoch. We may hope that the value of memberships and total message rates will increase significantly in future, but it may be a while before RLN generates enough income for reward sharing to be profitable. This is at least partly because RLN was not designed as a mechanism to generate income, but as a rate-limiting mechanism we need to be as cheap as possible to allow max app/client participation (which would make Service Incentivisation more viable as a road to sustainability).
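For a rough sense of scale, a back-of-envelope using the numbers in this comment ($8000 over 6 months; the 10-minute epoch is assumed from earlier in the thread):

```python
# Illustrative back-of-envelope, not a projection:
TOTAL_USD = 8000        # entire contract value over the period, per the comment
MONTHS = 6
EPOCH_MINUTES = 10      # epoch length assumed from earlier in the thread

epochs = MONTHS * 30 * 24 * 60 // EPOCH_MINUTES   # epochs in 6 months
per_epoch_usd = TOTAL_USD / epochs                # total pot per epoch, shared by ALL voters

assert epochs == 25920
assert round(per_epoch_usd, 2) == 0.31
```

Roughly $0.31 per epoch for the whole network, before splitting among voters and before each node's per-epoch transaction cost, which illustrates the profitability concern.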

I'll add my more general questions in a next comment.

@jm-clius
Contributor

jm-clius commented Oct 2, 2024

I structure this under three meta questions that are less about technical feasibility than about what our long-term vision of Waku should be. Certainly open for discussion here, as Waku's definition and role as infrastructure in the p2p world is up to all of us. These questions are mostly just summaries of what has been discussed before, but I thought it important to gather my thoughts a bit more systematically. I apologise for the length. 😄 Happy to discuss in person or clarify any points.


TL;DR

  1. Is on-chain consensus really needed and compatible with Waku?
  2. Do we really need to incentivise node participation directly? I.e. is the old tit-for-tat and service incentivisation models not enough to ensure network and project sustainability?
  3. Are we incentivising the right thing? I.e. we want to incentivise nodes not only to participate but to enable maximum participation of other nodes as well. We want to incentivise "selflessness" rather than mere participation.

1. Do we really want strong consensus in Waku?

I guess this question covers two points: whether we need on-chain consensus and whether we can afford it. Thus far we've considered the low-latency and ephemeral quality of messaging in Waku as one of our strongest draws - Waku is useful for the type of decentralised signalling use cases that are kept off-chain precisely because it tends to be incompatible with the restrictions of on-chain consensus. Consensus is expensive, at least in terms of complexity and overhead, and the long-term benefits of consensus to Waku as infrastructure not entirely clear to me.

Some possible benefits that consensus may bring (other than RLN fee distribution) and a brief thought about each:

  • Provable message distribution
    Consensus could arguably bring provable "reliable" distribution to Waku. The cost here though is quite high and, again, I'm not sure it adds much value for end users. For example, this wouldn't translate into end-to-end reliability for apps - Waku is just one transport component in the end-to-end app protocol stack and each app would still need to implement reliability protocols as they do now. Consensus will introduce costs for end users and service nodes alike (whether via rising RLN prices, added latency/complexity, less accessible network, etc.). I can see a role for the local, off-chain consensus-like model that Waku Sync introduces which would provide a similar level of reliability with a much simpler model.

  • Provable service provision
    This is actually just another aspect of the previous point. For paid services related to message delivery (Store, Filter, Lightpush) a message distribution consensus mechanism could help the service or client to prove/disprove that a paid service has been provided. I can see this being especially compatible with Lightpush (which pays for distribution); for filter/store I imagine more work would be necessary to prove that queries have been fulfilled by the service node. We would also need to find something beyond message consensus if Waku were to be a general service marketplace for services unrelated to message delivery. In either case, I think some form of reputation (although admittedly also complex) is more likely to be compatible with the often fuzzy requirements we have for "good" services (a subjective combination of reliability, availability, latency and a host of service-specific quality measurements). I know this is an intricate separate discussion, so I'll abbreviate my comments on this point.

2. Do we really need RLN fee distribution?

I think after these analyses we may be in a better position to ask and answer the meta questions again:

  • Do we really need to incentivise RLN Relay node participation?
    We plan to incentivise node participation via Service incentivisation - nodes participate in RLN Relay because it allows them to be good service nodes for paid services. Other nodes participate because they benefit from the network (e.g. they're encapsulating the node in an app). Perhaps this model is enough to encourage participation and grow the network?
  • Do we really need to do something with the RLN fees?
    If we never end up using RLN fees either to incentivise certain network behaviours (such as RLN Relay participation) or fund the project maintenance, would it be fundamentally incompatible with a path to sustainability?

3. What are we really incentivising?

This model assumes that we want to only incentivise node participation in RLN Relay. Perhaps we also want to incentivise selfless participation in RLN Relay - that is, we should encourage nodes to become richly connected for the network to become as diverse and dispersed as possible. It seems to me that nodes will be incentivised to keep the topology as centralised and small as possible. They could limit their connections to absolute minimum to still track the root, refuse new nodes, etc. - there seems to be no mechanism that would prevent this IIUC? GossipSub's tit-for-tat mechanism does a great job of encouraging nodes to be useful while benefitting off the network themselves. A reward distribution would incentivise nodes to try and break this mechanism.

@fryorcraken

fryorcraken commented Oct 3, 2024

Similar to @jm-clius comments:

  1. I am concerned about the potential value of STORE_SYNC_DELAY_SECONDS. Will it be possible to set a value high enough to have the intended effect for this protocol, yet low enough for reliability protocols to function with acceptable latency?
  2. I agree with the initial thought that a superepoch is too large. However, an epoch seems to be too small. We may need to define a middle ground here, appropriate to the on-chain cadence and fees.

@alrevuelta

@jm-clius Thanks! Answers to the first comment: Clarifications to the proposal.

Is this necessarily a bad thing from network behaviour POV? If a node uses only store sync it would still help distribute messages and could in theory participate in services such as filter/store

I thought about that while writing the proposal. Good point; maybe it's okay that store-sync nodes can vote for the root. It would also simplify the solution, because allowing these nodes simplifies the threat model. There would also be no need for the STORE_SYNC_DELAY_SECONDS window.

My concern would be that this reduces the real-time value of Store Sync or am I missing something with my assumptions?

It indeed reduces the real-time value of store-sync. But, per the previous point, maybe we don't need to add this artificial delay if we are okay with pure store-sync nodes voting for the root. So there would be no real-time limitations then.

IIUC this means that every participating node would have to commit a root for every epoch?

Yes. I don't have a gas estimate yet. It won't be crazy high, but the commit-reveal-claim involves at least three (I would say rather cheap) transactions. Regarding the numbers, I would approach it from a different angle. Instead of starting with e.g. $8000 and splitting the cost top-down into what each operator will get, I would do it bottom-up: start with the expected gas cost, expected infra cost, and expected transactions per month, add a margin, and arrive at the membership cost needed to make it sustainable. Will it be prohibitive? I don't have the answer, but indeed I need to provide an estimate of the gas cost for this.
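To illustrate the bottom-up approach, here is a minimal sketch of such an estimate. Every figure below (gas per transaction, gas price, ETH price, superepoch cadence, infra cost, margin, memberships per month) is an illustrative assumption, not a number from the proposal:

```python
# Hypothetical bottom-up estimate of the membership fee needed to make
# running an operator sustainable. All constants are assumptions.

GAS_PER_TX = 100_000           # assumed gas per commit/reveal/claim tx
GAS_PRICE_GWEI = 10            # assumed gas price
ETH_PRICE_USD = 2_500          # assumed ETH price
TXS_PER_SUPEREPOCH = 3         # commit + reveal + claim
SUPEREPOCHS_PER_MONTH = 30     # assumed one superepoch per day
INFRA_COST_USD_MONTH = 50      # assumed node infra cost
MARGIN = 0.20                  # operator margin on top of raw costs
MEMBERSHIPS_PER_MONTH = 1_000  # assumed registrations funding one operator

# Monthly on-chain cost: gas * price, converted gwei -> ETH -> USD.
gas_cost_usd = (GAS_PER_TX * GAS_PRICE_GWEI * 1e-9 * ETH_PRICE_USD
                * TXS_PER_SUPEREPOCH * SUPEREPOCHS_PER_MONTH)

# Total monthly cost per operator, with margin applied.
monthly_cost = (gas_cost_usd + INFRA_COST_USD_MONTH) * (1 + MARGIN)

# Membership fee needed so that monthly registrations cover one operator.
fee_per_membership = monthly_cost / MEMBERSHIPS_PER_MONTH

print(f"monthly cost per operator: ${monthly_cost:.2f}")
print(f"required fee per membership: ${fee_per_membership:.4f}")
```

With these (made-up) inputs, the fee comes out well under a dollar per membership, but the sensitivity to gas price and superepoch cadence is exactly why the estimate matters.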

@alrevuelta

alrevuelta commented Oct 3, 2024

@jm-clius Answers to the comment: Technical feasibility.

Do we really want strong consensus in Waku?

You have done a good summary of what consensus can bring: i) RLN fee distribution, aka incentives for nodes, and ii) the ability to prove a wide variety of things (distribution and provision).

I would say we do need it, because of what it unlocks. However, the cost of it may outweigh the benefits and make it non-viable.

Do we really need RLN fee distribution?

I see this question very coupled with the "do we need consensus?".

  • Can we have consensus without incentives?: I would say no, since without some carrot/stick there is no incentive to reach consensus.
  • Can we have incentives without consensus?: Maybe we can, but without any guarantees. In my opinion, non-provable reputation-based incentives are not enough, since a Sybil can keep defecting forever.

And re:

nodes participate in RLN Relay because it allows them to be good service nodes for paid services

Agree on this, but how do we know what's "good" without consensus? And how do we have consensus without some incentives? So IMHO consensus unlocks incentives and vice versa. And this then unlocks service incentivization, because you can prove everything and know what's "good".

For now, RLN fee distribution is the best candidate, since it's the only entry point of $ into the system.

What are we really incentivising?

Not sure I agree with this statement.

It seems to me that nodes will be incentivised to keep the topology as centralised and small as possible

Any node (registered operator or not) can introduce a message into the network. And you don't know who is that. If you keep yourself connected to a few nodes, you increase the probability of missing a message, and hence not getting the valid root that will get you rewarded.

I would say what we are incentivizing here is that you i) follow the root and ii) know what's underneath the root (leaves + content) via the challenge. By doing so, we ensure that nodes are up to date, get rewarded for it, and reach consensus, unlocking the ability to prove anything.
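To make "following the root" concrete, here is a minimal sketch of a binary Merkle root over message hashes (a simplification: the real tree construction, padding, and hash function are defined by the protocol, not by this snippet). The point it illustrates is that a node missing even one message computes a different root and therefore cannot vote with the majority:

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest; the actual protocol hash may differ."""
    return hashlib.sha256(data).digest()

def merkle_root(messages: list[bytes]) -> bytes:
    """Binary Merkle root over message hashes.

    Simplified scheme: odd levels duplicate their last node; real
    implementations may pad or order leaves differently.
    """
    assert messages, "need at least one message"
    level = [h(m) for m in messages]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

msgs = [b"msg-1", b"msg-2", b"msg-3"]
root = merkle_root(msgs)

# A node that missed msg-2 derives a different root and loses the vote.
assert merkle_root([b"msg-1", b"msg-3"]) != root
```

The challenge then probes whether a voter actually holds the leaves (and their content) behind the root it committed to, not just the 32-byte root itself.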

Whether Waku can afford this extra complexity and cost, still not sure myself.

@s-tikhomirov

A relatively minor idea regarding this:

Increased bandwidth: The challenge introduces an extra communication taking negligible bandwidth on the challenger side (message hash) and moderate bandwidth on the challenged side (message content).

We can implement the challenge-response protocol in such a way that doesn't involve transferring the whole message, for example like this (m is the message, H = h(m) is the message hash):

  1. Challenger to Responder: H, salt
  2. Responder to Challenger: h(m || salt)

The second step only requires transferring one hash instead of one message.
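The two steps above can be sketched as follows. This is a minimal illustration, assuming SHA-256 and simple byte concatenation for `m || salt`; the actual hash and encoding would be protocol-defined:

```python
import hashlib
import os

def h(data: bytes) -> bytes:
    """SHA-256 digest; stands in for the protocol's hash function."""
    return hashlib.sha256(data).digest()

# Challenger side: holds the message m, derives H = h(m), picks a
# fresh random salt so responses cannot be precomputed or replayed.
m = b"example waku message"
H = h(m)
salt = os.urandom(16)
# --> sends (H, salt) to the responder

# Responder side: looks up the message behind H in its local store and
# answers with h(m || salt) -- one hash instead of the whole message.
response = h(m + salt)
# --> sends response back

# Challenger side: verifies against its own copy of m. A responder that
# only knows H (but not m) cannot produce h(m || salt).
assert response == h(m + salt)
```

Since the salt is fresh per challenge, caching past answers or learning H alone is not enough; the responder must actually hold the preimage m.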
