
bug: node won't start with RLN in on-chain dynamic mode #2662

Closed
romanzac opened this issue May 3, 2024 · 17 comments
Labels: bug (Something isn't working), effort/days (Estimated to be completed in a few days, less than a week)

Comments

romanzac (Contributor) commented May 3, 2024

Problem

During "test_publish_with_valid_payloads_dynamic_at_slow_rate" execution, first container running Node1 won't start even within the timeout of 10 minutes. Static mode is working fine. Please have a look.

I've built docker image from release: wakunode2-v0.27.0

Impact

High occurrence, medium severity. RLN in dynamic mode not functional.

To reproduce

  1. Please checkout waku-org/waku-interop-tests@73d0951
  2. cd waku-interop-tests
  3. python -m venv .venv
  4. source .venv/bin/activate
  5. pip install -r requirements.txt
  6. pre-commit install
  7. pytest tests/relay/test_rln.py -k 'test_publish_with_valid_payloads_dynamic_at_slow_rate'
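While the test is stuck waiting, it can help to watch the Node1 container directly. A minimal sketch is below; the name=node1 filter is only a guess at how the interop tests label containers.

```python
# Minimal sketch: tail the logs of the stuck node's container while the test waits.
# The "node1" name filter is a guess at how the interop tests label containers.
import subprocess

containers = subprocess.run(
    ["docker", "ps", "--filter", "name=node1", "--format", "{{.Names}}"],
    capture_output=True, text=True, check=True,
).stdout.split()

for name in containers:
    print(f"--- last 20 log lines of {name} ---")
    subprocess.run(["docker", "logs", "--tail", "20", name], check=False)
```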

Expected behavior

RLN Relay in on-chain mode working.

Screenshots/logs

node1_2024-05-03_12-45-38__5a5531e5-c304-462b-a7f3-f58ba92d0a0b__harbor.status.im_wakuorg_nwaku:latest.log
node1_2024-05-03_12-45-38__c9737de4-6df4-4f4c-b308-323968164308__harbor.status.im_wakuorg_nwaku:latest.log
node1_2024-05-03_12-46-01__96df9f86-46da-4444-8b3a-7b6d0e7e060a__harbor.status.im_wakuorg_nwaku:latest.log
node2_2024-05-03_12-45-38__c9737de4-6df4-4f4c-b308-323968164308__harbor.status.im_wakuorg_nwaku:latest.log
test_run.log

romanzac added the bug label May 3, 2024
rymnc (Contributor) commented May 3, 2024

the second log you shared seems to show the node has finished syncing - does the /health endpoint still not return true?
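(For reference, polling the endpoint from the host might look like the minimal sketch below; localhost:8645 is an assumption based on nwaku's default REST port, and the exact /health response body depends on the nwaku version.)

```python
# Minimal sketch, assuming the node's REST API is reachable on localhost:8645
# (nwaku's default REST port). The exact /health body differs between versions,
# so we only print whatever the node returns.
import requests

resp = requests.get("http://localhost:8645/health", timeout=5)
print(resp.status_code, resp.text)
```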

rymnc (Contributor) commented May 3, 2024

the fourth log shows the node has not finished syncing

rymnc (Contributor) commented May 3, 2024

On my own node (screenshot omitted), the health endpoint seems to return "not ready" and "healthy" in a flaky manner. cc: @NagyZoltanPeter

rymnc (Contributor) commented May 3, 2024

I see in your logs that we use debug/v1/info in the test - which results in

Response status code: 200. Response content: b'{"listenAddresses":["/ip4/172.18.205.233/tcp/6886/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip","/ip4/172.18.205.233/tcp/6887/ws/p2p/16Uiu2HAmGNtM2rQ8abySFNhqPDFY4cmfAEpfo9Z9fD3NekoFR2ip"],"enrUri":"enr:-LO4QGGlww8liwBmcHFHdLcXwt-Uq0c6iU6cdDJ6pWlh2avnWILMdWa9P_iCS0kiWhLuECjRTMvxoykPXyP5sKjcx88BgmlkgnY0gmlwhKwSzemKbXVsdGlhZGRyc4wACgSsEs3pBhrn3QOCcnOFAAABAACJc2VjcDI1NmsxoQM3Tqpf5eFn4Jztm4gB0Y0JVSJyxyZsW8QR-QU5DZb-PYN0Y3CCGuaDdWRwghrohXdha3UyAQ"}'
INFO     src.node.waku_node:waku_node.py:193 REST service is ready !!

does node1 not respond after this is available?
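(Side note: debug/v1/info only shows that the REST server is up, while /health is the endpoint expected to reflect RLN sync state, so the two can disagree while the node is still syncing. A rough polling sketch, with localhost:8645 as an assumed address:)

```python
# Rough sketch: compare debug/v1/info (REST server up) with /health (RLN synced).
# The base URL is an assumption; the interop tests map container ports dynamically.
import time
import requests

BASE = "http://localhost:8645"
deadline = time.time() + 600  # 10-minute budget, matching the test timeout

while time.time() < deadline:
    try:
        info = requests.get(f"{BASE}/debug/v1/info", timeout=5)
        health = requests.get(f"{BASE}/health", timeout=5)
        print("info:", info.status_code, "| health:", health.status_code, health.text[:80])
        if health.status_code == 200:
            break
    except requests.RequestException as exc:
        print("REST not reachable yet:", exc)
    time.sleep(10)
```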

rymnc (Contributor) commented May 3, 2024

testing with #2664

NagyZoltanPeter (Contributor):

> On my own node (screenshot omitted), the health endpoint seems to return "not ready" and "healthy" in a flaky manner. cc: @NagyZoltanPeter

It's a known issue, already solved here: #2612
It will be part of the next release!

rymnc (Contributor) commented May 3, 2024

Awesome! I might be missing something, but does the PR address the node healthy/not ready flakiness?

rymnc (Contributor) commented May 3, 2024

I suggest using this image: quay.io/wakuorg/nwaku-pr:2664-rln-v1

NagyZoltanPeter (Contributor):

> Awesome! I might be missing something, but does the PR address the node healthy/not ready flakiness?

Indeed! It separates the node's general readiness (initialization) from the status of rln_relay. There is an array of protocol statuses (to be improved in the future) in which you can now see the rln_relay status (it changes between ready and syncing according to the actual state).
Like this:

{
    "nodeHealth": "Ready",
    "protocolsHealth": [
        {
            "Rln Relay": "Ready"
        }
    ]
}
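For the interop tests, waiting on this richer response could look roughly like the sketch below (assuming the JSON shape above; the base URL is an assumption, since the tests map container ports dynamically).

```python
# Rough sketch: poll /health until the "Rln Relay" protocol reports "Ready",
# based on the response shape above (an assumption about the post-#2612 build).
import time
import requests

def wait_for_rln_ready(base_url="http://localhost:8645", timeout_s=600):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            health = requests.get(f"{base_url}/health", timeout=5).json()
            protocols = {}
            for entry in health.get("protocolsHealth", []):
                protocols.update(entry)
            if health.get("nodeHealth") == "Ready" and protocols.get("Rln Relay") == "Ready":
                return True
        except (requests.RequestException, ValueError):
            pass  # REST not up yet, or body not JSON; keep polling
        time.sleep(5)
    return False
```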

rymnc (Contributor) commented May 3, 2024

> Indeed! It separates the node's general readiness (initialization) from the status of rln_relay. [...]

wow, nice 🔥

romanzac (Contributor, Author) commented May 3, 2024

Good progress on the health info indeed! I'll wait for the next release to enable on-chain tests, and I will try to play with the /health endpoint on Monday. Thanks for now.

NagyZoltanPeter (Contributor):

@rymnc: Hi, about the original issue: from the logs I don't see the actual problem. The first log tells me that the node started well; it took almost 7 minutes to sync the on-chain RLN blocks. The second node (or log) ends while still syncing, but only 3 minutes after startup.
Maybe the start sequence needs to be checked: it seems to me that the nodes started with a delay, while the 10-minute timeout applies from the first node's start.
Can you please check this scenario? You may need to extend the timeout to give the second node enough time to get in sync.
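Roughly what I mean (a sketch with hypothetical helper names; node.start() and node.is_ready() stand in for whatever the interop tests actually call):

```python
# Rough sketch: count each node's sync budget from that node's own container start,
# instead of sharing one 10-minute timeout across the whole startup sequence.
# node.start() and node.is_ready() are hypothetical stand-ins for the harness helpers.
import time

PER_NODE_TIMEOUT_S = 600  # e.g. up to 10 minutes of on-chain RLN sync per node

def start_and_wait(node):
    node.start()                 # hypothetical: starts the container
    started_at = time.time()     # timeout measured from *this* node's start
    while time.time() - started_at < PER_NODE_TIMEOUT_S:
        if node.is_ready():      # hypothetical: checks the /health endpoint
            return
        time.sleep(10)
    raise TimeoutError(f"node did not finish RLN sync within {PER_NODE_TIMEOUT_S}s")

# for node in (node1, node2):
#     start_and_wait(node)
```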

romanzac (Contributor, Author) commented May 8, 2024

gabrielmer added the effort/days label Jul 24, 2024
romanzac (Contributor, Author):

Hi @rymnc, I've retested on-chain mode and node startup with RLNv2.

I've tried two scenarios:

  1. No RLN credential files are present when the test starts - the node with RLN starts in 52 seconds on average, which is a solid improvement.

Logs:
test.log
node1_2024-07-29_17-52-46__4707c900-c283-4ee8-ba48-b42c351efbb4__harbor.status.im_wakuorg_nwaku:latest.log
node1_2024-07-29_17-52-46__d4c701df-cfb7-4281-9c67-0e297b744520__harbor.status.im_wakuorg_nwaku:latest.log
node1_2024-07-29_17-53-01__2fb144ad-1a02-4348-a6c2-0a11f3e76788__harbor.status.im_wakuorg_nwaku:latest.log
node2_2024-07-29_17-52-46__4707c900-c283-4ee8-ba48-b42c351efbb4__harbor.status.im_wakuorg_nwaku:latest.log

  2. Existing RLN credential files are present and new credential data is appended to the end of keystore.json - the node cannot start.
ERR 2024-07-29 09:57:17.614+00:00 Mounting protocols failed                  tid=1 file=node_factory.nim:456 error="failed to mount waku RLN relay protocol: failed to mount WakuRlnRelay: could not initialize the group manager: failed to get the keystore credentials: Error while searching credentials in keystore: credential not found: Credential not found in keystore"
ERR 2024-07-29 09:57:17.614+00:00 Failed setting up node                     topics="wakunode main" tid=1 file=waku.nim:144 error="Mounting protocols failed: failed to mount waku RLN relay protocol: failed to mount WakuRlnRelay: could not initialize the group manager: failed to get the keystore credentials: Error while searching credentials in keystore: credential not found: Credential not found in keystore"
ERR 2024-07-29 09:57:17.614+00:00 Waku initialization failed                 topics="wakunode main" tid=1 file=wakunode2.nim:68 error="Failed setting up node: Mounting protocols failed: failed to mount waku RLN relay protocol: failed to mount WakuRlnRelay: could not initialize the group manager: failed to get the keystore credentials: Error while searching credentials in keystore: credential not found: Credential not found in keystore"

Logs:
test.log
node1_2024-07-29_17-56-26__8b250a23-d437-4b17-8a74-ca245ae62527__harbor.status.im_wakuorg_nwaku:latest.log
node1_2024-07-29_17-56-26__c8445c39-15b7-40e7-9412-a670da0b4ed6__harbor.status.im_wakuorg_nwaku:latest.log
node1_2024-07-29_17-56-27__4e5e36bb-d051-4bb6-bbe6-26a69d306ce5__harbor.status.im_wakuorg_nwaku:latest.log
node2_2024-07-29_17-56-26__8b250a23-d437-4b17-8a74-ca245ae62527__harbor.status.im_wakuorg_nwaku:latest.log

Any hints?

rymnc (Contributor) commented Jul 29, 2024

Hmm, thanks for re-running, @romanzac!

May I quickly ask if scenario 1 is working?

As for scenario 2, can you send the keystore with which you have this problem, please?

romanzac (Contributor, Author):

> Hmm, thanks for re-running, @romanzac!
>
> May I quickly ask if scenario 1 is working?
>
> As for scenario 2, can you send the keystore with which you have this problem, please?

Yes, scenario 1 works. Keystore sent over.

romanzac (Contributor, Author) commented Jul 29, 2024

It looks like scenario 2 is somewhat unusual for standard operation. So I will change my tests to either reuse the existing registration or create unique keystores, to avoid parsing the log for the assigned index value (e.g. membershipIndex=4180).
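Roughly what I have in mind for the unique-keystore option (a sketch; the --rln-relay-cred-path flag name is my assumption about how the credentials directory gets passed to the node):

```python
# Rough sketch: give every test run its own credentials directory so a fresh
# registration is never appended to a keystore.json left over from an earlier run.
# The --rln-relay-cred-path flag name is an assumption; check the nwaku config docs.
import uuid
from pathlib import Path

def fresh_keystore_dir(base="./rln_creds"):
    run_dir = Path(base) / uuid.uuid4().hex   # unique per test run
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir

extra_node_args = [
    f"--rln-relay-cred-path={fresh_keystore_dir()}",
    # credential password etc. passed the same way the tests already do
]
```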
