Description
Dear XinFin team,
We run an XDC Mainnet node in-house, and around 3 weeks ago we provisioned a new v2.2.4 node on an AWS EC2 r5a.xlarge instance. We had used the same instance type previously and found the performance good enough for our purposes, syncing 1.5 million blocks in 15 minutes.
On this occasion our new node is syncing very slowly, at an average of about 300 blocks per minute. It has ~17 peers and does not seem to be CPU, disk or memory constrained according to top, vmstat or CloudWatch metrics.
In terms of application logs, the only errors I see are related to the stats endpoint, which seems to be incorrect:
INFO [10-17|10:04:53] Masternodes are ready for the next epoch
INFO [10-17|10:04:53] Imported new chain segment blocks=43 txs=448 mgas=843.869 elapsed=8.370s mgasps=100.819 number=75496950 hash=e9d34a…6d4560 cache=4.47mB
WARN [10-17|10:04:53] Block stats report failed err="write tcp 172.18.0.2:52730->45.82.64.150:3000: use of closed network connection"
WARN [10-17|10:04:53] Post-block transaction stats report failed err="write tcp 172.18.0.2:52730->45.82.64.150:3000: use of closed network connection"
WARN [10-17|10:05:01] Failed to retrieve stats server message err="read tcp 172.18.0.2:51618->45.82.64.150:3000: read: connection reset by peer"
INFO [10-17|10:05:01] Imported new chain segment blocks=45 txs=580 mgas=866.529 elapsed=8.123s mgasps=106.674 number=75496995 hash=e9d2e3…ba225c cache=4.78mB
INFO [10-17|10:05:03] Persisted trie from memory database nodes=1563 size=529.44kB time=19.076817ms gcnodes=6852 gcsize=2.64mB gctime=22.122977ms livenodes=9256 livesize=3.19mB
INFO [10-17|10:05:03] Persisted trie from memory database nodes=0 size=0.00B time=4.64µs gcnodes=0 gcsize=0.00B gctime=77.661µs livenodes=1 livesize=0.00B
INFO [10-17|10:05:03] Persisted trie from memory database nodes=0 size=0.00B time=1.641µs gcnodes=0 gcsize=0.00B gctime=59.762µs livenodes=1 livesize=0.00B
WARN [10-17|10:05:09] Full stats report failed err="write tcp 172.18.0.2:51618->45.82.64.150:3000: use of closed network connection"
INFO [10-17|10:05:09] Imported new chain segment blocks=39 txs=622 mgas=786.045 elapsed=8.029s mgasps=97.895 number=75497034 hash=498c05…10ad90 cache=4.59mB
INFO [10-17|10:05:18] Imported new chain segment blocks=40 txs=533 mgas=805.272 elapsed=8.143s mgasps=98.883 number=75497074 hash=281afa…0c7c5f cache=5.01mB
WARN [10-17|10:05:24] Failed to retrieve stats server message err="read tcp 172.18.0.2:56718->45.82.64.150:3000: read: connection reset by peer"
WARN [10-17|10:05:25] Full stats report failed err="write tcp 172.18.0.2:56718->45.82.64.150:3000: use of closed network connection"
INFO [10-17|10:05:26] Imported new chain segment blocks=44 txs=549 mgas=844.683 elapsed=8.110s mgasps=104.148 number=75497118 hash=f0429a…60e27b cache=4.99mB
INFO [10-17|10:05:28] Persisted trie from memory database nodes=1728 size=572.36kB time=12.981777ms gcnodes=7555 gcsize=2.91mB gctime=24.929708ms livenodes=9151 livesize=3.14mB
INFO [10-17|10:05:28] Persisted trie from memory database nodes=0 size=0.00B time=4.71µs gcnodes=0 gcsize=0.00B gctime=74.78µs livenodes=1 livesize=0.00B
INFO [10-17|10:05:28] Persisted trie from memory database nodes=0 size=0.00B time=2.16µs gcnodes=0 gcsize=0.00B gctime=53.612µs livenodes=1 livesize=0.00B
INFO [10-17|10:05:34] Imported new chain segment blocks=43 txs=554 mgas=845.706 elapsed=8.009s mgasps=105.582 number=75497161 hash=ec557f…9d8b89 cache=4.71mB
INFO [10-17|10:05:42] Imported new chain segment blocks=42 txs=549 mgas=824.981 elapsed=8.094s mgasps=101.919 number=75497203 hash=389103…84bd62 cache=4.74mB
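For reference, the import rate can be estimated directly from the "Imported new chain segment" lines. The following is a rough sketch, assuming the log format shown in the excerpt above and that elapsed is always reported in seconds or milliseconds; the function name import_rate is made up for illustration and is not part of the XDC client.

```python
import re

# Assumption: matches the "Imported new chain segment" lines as they appear
# in the excerpt above, with elapsed given in seconds ("s") or milliseconds ("ms").
SEGMENT_RE = re.compile(
    r"Imported new chain segment\s+blocks=(\d+).*?elapsed=([\d.]+)(ms|s)\b"
)


def import_rate(log_lines):
    """Return (total_blocks, total_elapsed_seconds, blocks_per_minute).

    The rate is based on the elapsed= field, i.e. time spent importing,
    not on the wall-clock timestamps at the start of each line.
    """
    total_blocks = 0
    total_seconds = 0.0
    for line in log_lines:
        match = SEGMENT_RE.search(line)
        if not match:
            continue  # skip stats warnings, trie persistence lines, etc.
        total_blocks += int(match.group(1))
        value = float(match.group(2))
        total_seconds += value / 1000.0 if match.group(3) == "ms" else value
    rate = total_blocks / total_seconds * 60 if total_seconds else 0.0
    return total_blocks, total_seconds, rate


if __name__ == "__main__":
    import sys

    blocks, seconds, per_minute = import_rate(sys.stdin)
    print(f"blocks={blocks} elapsed={seconds:.3f}s rate={per_minute:.1f} blocks/min")
```

Piping the node log into this script gives a single figure to compare between the old and new instances. For the seven import lines in the excerpt above it works out to roughly 310 blocks per minute, which is consistent with the average quoted earlier.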
I can reach the stats server with netcat or curl, so I'm unsure why the application is getting TCP errors:
ubuntu@node:~$ host stats.xinfin.network
stats.xinfin.network has address 45.82.64.150
ubuntu@node:~$ nc stats.xinfin.network -zv 3000
Connection to stats.xinfin.network 3000 port [tcp/*] succeeded!
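Since nc -z only confirms that the TCP handshake succeeds, a slightly longer probe might help distinguish "cannot connect" from "connected, then closed or reset by the remote side", which is what the log messages suggest. Below is a minimal sketch using only the Python standard library. The function name probe_tcp and its return labels are hypothetical, and it does not speak whatever protocol the stats server actually expects, so an idle connection may behave differently from the node's real session.

```python
import socket


def probe_tcp(host, port, hold_seconds=5.0, connect_timeout=5.0):
    """Connect to host:port, then wait up to hold_seconds for the peer to act.

    Returns one of: "unreachable", "still_open", "closed_by_peer",
    "reset_by_peer", "data_received", "error".
    """
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except OSError:
        return "unreachable"  # refused, timed out, DNS failure, etc.
    with sock:
        sock.settimeout(hold_seconds)
        try:
            data = sock.recv(1)
        except socket.timeout:
            return "still_open"  # peer kept the idle connection open
        except ConnectionResetError:
            return "reset_by_peer"  # same condition as "connection reset by peer"
        except OSError:
            return "error"
        # An empty read means the peer closed the connection cleanly.
        return "closed_by_peer" if data == b"" else "data_received"


if __name__ == "__main__":
    # Host and port taken from the nc check above.
    print(probe_tcp("stats.xinfin.network", 3000))
```

If this reports "still_open" from the host but the node inside the container still logs resets, the difference is more likely to be in what the node sends than in basic network reachability.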
Does the node name passed as $INSTANCE_NAME need to be unique?
Are there any configuration options we can change in the node settings, or recommendations that you would make for system configuration to increase performance?
Is there any documentation on troubleshooting this kind of issue out there we should take a look at?