Skip to content

Slow performance syncing with xdposchain v2.2.4 #162

Open
@nikki-quant

Description

@nikki-quant

Dear XinFin team,

We run a XDC Mainnet node in house, and around 3 weeks ago we provisioned a new v2.2.4 node on an AWS EC2 r5a.xlarge instance. We had used the same instance type previously and found the performance good enough for our purposes, syncing 1.5 million blocks in 15 minutes.

On this occasion our new node is syncing very slowly - an average of 300 blocks in a minute. It has ~17 peers, and does not seem CPU, disk or memory constrained according to top, vmstat or Cloudwatch metrics.

In terms of application logs, the only errors I see are related to the stats endpoint, which seems to be incorrect:

INFO [10-17|10:04:53] Masternodes are ready for the next epoch
INFO [10-17|10:04:53] Imported new chain segment               blocks=43  txs=448  mgas=843.869 elapsed=8.370s    mgasps=100.819 number=75496950 hash=e9d34a…6d4560 cache=4.47mB
WARN [10-17|10:04:53] Block stats report failed                err="write tcp 172.18.0.2:52730->45.82.64.150:3000: use of closed network connection"
WARN [10-17|10:04:53] Post-block transaction stats report failed err="write tcp 172.18.0.2:52730->45.82.64.150:3000: use of closed network connection"
WARN [10-17|10:05:01] Failed to retrieve stats server message  err="read tcp 172.18.0.2:51618->45.82.64.150:3000: read: connection reset by peer"
INFO [10-17|10:05:01] Imported new chain segment               blocks=45  txs=580  mgas=866.529 elapsed=8.123s    mgasps=106.674 number=75496995 hash=e9d2e3…ba225c cache=4.78mB
INFO [10-17|10:05:03] Persisted trie from memory database      nodes=1563 size=529.44kB time=19.076817ms gcnodes=6852  gcsize=2.64mB gctime=22.122977ms livenodes=9256  livesize=3.19mB
INFO [10-17|10:05:03] Persisted trie from memory database      nodes=0    size=0.00B    time=4.64µs      gcnodes=0     gcsize=0.00B  gctime=77.661µs    livenodes=1     livesize=0.00B
INFO [10-17|10:05:03] Persisted trie from memory database      nodes=0    size=0.00B    time=1.641µs     gcnodes=0     gcsize=0.00B  gctime=59.762µs    livenodes=1     livesize=0.00B
WARN [10-17|10:05:09] Full stats report failed                 err="write tcp 172.18.0.2:51618->45.82.64.150:3000: use of closed network connection"
INFO [10-17|10:05:09] Imported new chain segment               blocks=39  txs=622  mgas=786.045 elapsed=8.029s    mgasps=97.895  number=75497034 hash=498c05…10ad90 cache=4.59mB
INFO [10-17|10:05:18] Imported new chain segment               blocks=40  txs=533  mgas=805.272 elapsed=8.143s    mgasps=98.883  number=75497074 hash=281afa…0c7c5f cache=5.01mB
WARN [10-17|10:05:24] Failed to retrieve stats server message  err="read tcp 172.18.0.2:56718->45.82.64.150:3000: read: connection reset by peer"
WARN [10-17|10:05:25] Full stats report failed                 err="write tcp 172.18.0.2:56718->45.82.64.150:3000: use of closed network connection"
INFO [10-17|10:05:26] Imported new chain segment               blocks=44  txs=549  mgas=844.683 elapsed=8.110s    mgasps=104.148 number=75497118 hash=f0429a…60e27b cache=4.99mB
INFO [10-17|10:05:28] Persisted trie from memory database      nodes=1728 size=572.36kB time=12.981777ms gcnodes=7555  gcsize=2.91mB gctime=24.929708ms livenodes=9151  livesize=3.14mB
INFO [10-17|10:05:28] Persisted trie from memory database      nodes=0    size=0.00B    time=4.71µs      gcnodes=0     gcsize=0.00B  gctime=74.78µs     livenodes=1     livesize=0.00B
INFO [10-17|10:05:28] Persisted trie from memory database      nodes=0    size=0.00B    time=2.16µs      gcnodes=0     gcsize=0.00B  gctime=53.612µs    livenodes=1     livesize=0.00B
INFO [10-17|10:05:34] Imported new chain segment               blocks=43  txs=554  mgas=845.706 elapsed=8.009s    mgasps=105.582 number=75497161 hash=ec557f…9d8b89 cache=4.71mB
INFO [10-17|10:05:42] Imported new chain segment               blocks=42  txs=549  mgas=824.981 elapsed=8.094s    mgasps=101.919 number=75497203 hash=389103…84bd62 cache=4.74mB

I can reach the stats server with netcat or curl, I'm unsure why the application is getting TCP errors:

ubuntu@node:~$ host stats.xinfin.network
stats.xinfin.network has address 45.82.64.150
ubuntu@node:~$ nc stats.xinfin.network -zv 3000
Connection to stats.xinfin.network 3000 port [tcp/*] succeeded!

Does the node name passed as $INSTANCE_NAME need to be unique?

Are there any configuration options we can change in the node settings, or recommendations that you would make for system configuration to increase performance?

Is there any documentation on troubleshooting this kind of issue out there we should take a look at?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions