Skip to content

Nelson crashes after a few hours/days #51

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rafaelbrizola opened this issue Jan 19, 2018 · 14 comments
Open

Nelson crashes after a few hours/days #51

rafaelbrizola opened this issue Jan 19, 2018 · 14 comments
Assignees
Labels

Comments

@rafaelbrizola
Copy link

I don't run nelson as a service. I run it using the command:

nohup sudo nelson --config /Users/iota/nelson/config.ini &

Just happened for the 3rd time. I go to check my node and found out that nelson is not running. This last time it was running for almost 24 hours before crashing. More details in the log below.

Expected behaviour

Nelson running continuously

Actual behaviour

After a while, can be hours or a few days, Nelson crashes.

Steps to reproduce

Not sure. It crashes randomly

Basic Info

  • Operating System: Macos High Sierra (10.13.2)
  • Node (npm) Version: v8.4.0
  • IRI Version: 1.4.1.6
  • Nelson version: 0.3.16

Nelson Info

  • Epoch: don't know
  • Cycle: don't know
  • Connected peers: my configuration is 6 in, 5 out

8:30:23 AM.190 16600::NODE new cycle
8:30:23 AM.195 16600::IRI Neighbors removed (if there were any): udp://31.48.190.3:14600
8:30:23 AM.195 16600::NODE connection closed 31.48.190.3:16600 (remotely dropped)
8:30:23 AM.195 16600::IRI Neighbors removed (if there were any): udp://139.226.182.140:14600
8:30:23 AM.195 16600::NODE connection closed 139.226.182.140:16600 (remotely dropped)
8:30:23 AM.196 16600::IRI Neighbors removed (if there were any): udp://31.48.190.3:14600
8:30:23 AM.196 16600::IRI Neighbors removed (if there were any): udp://139.226.182.140:14600
/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/WebSocket.js:638
this._req.abort();
^

TypeError: Cannot read property 'abort' of null
at ClientRequest._req.setTimeout (/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/WebSocket.js:638:16)
at Object.onceWrapper (events.js:314:30)
at emitNone (events.js:105:13)
at ClientRequest.emit (events.js:207:7)
at Socket.emitTimeout (_http_client.js:722:34)
at Object.onceWrapper (events.js:314:30)
at emitNone (events.js:105:13)
at Socket.emit (events.js:207:7)
at Socket._onTimeout (net.js:398:8)
at ontimeout (timers.js:469:11)

@GoodMirek
Copy link

In my case it crashes several times a day. It's probably a different crash, prints the following error on console:

Error: read ECONNRESET
    at _errnoException (util.js:1024:11)
    at TCP.onread (net.js:615:25)

OS Linux, Nelson v.0.3.16

@romansemko romansemko self-assigned this Jan 23, 2018
@romansemko romansemko added the bug label Jan 23, 2018
@romansemko
Copy link

ECONNRESET is anther bug.

@rafaelbrizola have you closed your mac's lid in between? I think, there is a bug reported in websockets/ws#1280 that is related to this issue.

I will check if this can be fixed/catched somehow...

@rafaelbrizola
Copy link
Author

rafaelbrizola commented Jan 23, 2018

Hi Roman,
Thanks for having a look at this. I configured my macbook to not sleep when closing the lid on power cord.
Of course, something could have happened, like a power outage, and it could have lost that configuration momentarily. I would thought that's unlikely because the error happened consistently, but let's not disconsider that possibility.

@romansemko
Copy link

I will try catching it and, at least, cleaning up neighbors before crashing. Will need more investigation, though..

@lpinca
Copy link

lpinca commented Jan 23, 2018

Let's work together to get the bug fixed upstream. I have a feeling that something isn't working properly in Node.js core though as per this comment.

@romansemko
Copy link

@lpinca , I have upgraded to Node LTS 8.9.4 Let's see if this problem persists.

@rafaelbrizola , could you upgrade your node version to 8.9.4 and try the new Nelson version 0.3.21?

@rafaelbrizola
Copy link
Author

@romansemko Sure. Just did it. Let see how it goes

@rafaelbrizola
Copy link
Author

Hi, I left it running and around 36 hours later I got the error below. Nelson is still running, but doesn't update anymore.

7:03:19 AM.026 16600::NODE new cycle
7:03:19 AM.313 16600::IRI Neighbors removed (if there were any): udp://188.166.3.182:14600
7:03:19 AM.313 16600::NODE connection closed 188.166.3.182:16600 (remotely dropped)
7:03:19 AM.319 16600::IRI Neighbors removed (if there were any): udp://176.9.139.119:14600
7:03:19 AM.319 16600::NODE connection closed 176.9.139.119:16600 (remotely dropped)
7:03:24 AM.037 16600::IRI Neighbors removed (if there were any): tcp://81.169.215.74:15600
7:03:24 AM.037 16600::NODE connection closed 81.169.215.74:16600 (remotely dropped)
7:03:29 AM.045 16600::NODE connecting peer 46.5.17.47:16600
7:03:29 AM.046 16600::NODE connecting peer 104.172.104.192:16600
7:03:29 AM.046 16600::NODE connecting peer 83.135.133.111:16600
7:03:29 AM.344 16600::IRI Neighbors removed (if there were any): udp://83.135.133.111:14600
7:03:29 AM.345 16600::NODE connection closed 83.135.133.111:16600 (remotely dropped)
7:03:29 AM.365 16600::IRI Neighbors removed (if there were any): udp://46.5.17.47:14600
7:03:29 AM.365 16600::NODE connection closed 46.5.17.47:16600 (remotely dropped)
7:03:34 AM.058 16600::IRI Neighbors removed (if there were any): tcp://104.172.104.192:15600
7:03:34 AM.058 16600::NODE connection closed 104.172.104.192:16600 (remotely dropped)
7:03:39 AM.072 16600::NODE connecting peer 23.233.137.139:16600
7:03:39 AM.073 16600::NODE connecting peer 84.229.1.5:16600
7:03:39 AM.073 16600::NODE connecting peer 77.69.198.173:16600
7:03:44 AM.088 16600::IRI Neighbors removed (if there were any): udp://84.229.1.5:14600
7:03:44 AM.088 16600::NODE connection closed 84.229.1.5:16600 (remotely dropped)
7:03:44 AM.089 16600::IRI Neighbors removed (if there were any): udp://23.233.137.139:14256
7:03:44 AM.090 16600::NODE connection closed 23.233.137.139:16600 (remotely dropped)
7:03:44 AM.090 16600::IRI Neighbors removed (if there were any): udp://77.69.198.173:14777
7:03:44 AM.091 16600::NODE connection closed 77.69.198.173:16600 (remotely dropped)
7:03:49 AM.105 16600::NODE connecting peer 106.70.61.176:16600
7:03:49 AM.106 16600::NODE connecting peer 176.100.190.180:16600
7:03:49 AM.106 16600::NODE connecting peer 173.239.214.18:16600
7:03:49 AM.121 16600::IRI Neighbors removed (if there were any): udp://106.70.61.176:14600
7:03:49 AM.122 16600::NODE connection closed 106.70.61.176:16600 (remotely dropped)
7:03:49 AM.123 16600::IRI Neighbors removed (if there were any): udp://176.100.190.180:14600
7:03:49 AM.124 16600::NODE connection closed 176.100.190.180:16600 (remotely dropped)
7:03:49 AM.126 16600::IRI Neighbors removed (if there were any): udp://173.239.214.18:14600
7:03:49 AM.127 16600::NODE connection closed 173.239.214.18:16600 (remotely dropped)
7:03:54 AM.118 16600::NODE terminating...
7:03:54 AM.124 16600::NODE terminating...
7:03:54 AM.125 16600::NODE terminating...
7:03:54 AM.128 16600::IRI Neighbors removed (if there were any): udp://94.130.168.177:14600, udp://45.77.187.45:14600
7:03:54 AM.130 16600::IRI Neighbors removed (if there were any): udp://94.130.168.177:14600, udp://45.77.187.45:14600
Unhandled Rejection at: Promise Promise {
TypeError: Cannot read property 'abort' of null
at ClientRequest._req.setTimeout (/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/websocket.js:630:17)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at ClientRequest.emit (events.js:208:7)
at Socket.emitTimeout (_http_client.js:708:34)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at Socket.emit (events.js:208:7)
at Socket._onTimeout (net.js:407:8)
at ontimeout (timers.js:475:11) } reason: TypeError: Cannot read property 'abort' of null
at ClientRequest._req.setTimeout (/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/websocket.js:630:17)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at ClientRequest.emit (events.js:208:7)
at Socket.emitTimeout (_http_client.js:708:34)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at Socket.emit (events.js:208:7)
at Socket._onTimeout (net.js:407:8)
at ontimeout (timers.js:475:11)
7:03:54 AM.159 16600::IRI Neighbors removed (if there were any): udp://94.130.168.177:14600, udp://45.77.187.45:14600
Unhandled Rejection at: Promise Promise {
TypeError: Cannot read property 'abort' of null
at ClientRequest._req.setTimeout (/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/websocket.js:630:17)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at ClientRequest.emit (events.js:208:7)
at Socket.emitTimeout (_http_client.js:708:34)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at Socket.emit (events.js:208:7)
at Socket._onTimeout (net.js:407:8)
at ontimeout (timers.js:475:11) } reason: TypeError: Cannot read property 'abort' of null
at ClientRequest._req.setTimeout (/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/websocket.js:630:17)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at ClientRequest.emit (events.js:208:7)
at Socket.emitTimeout (_http_client.js:708:34)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at Socket.emit (events.js:208:7)
at Socket._onTimeout (net.js:407:8)
at ontimeout (timers.js:475:11)
Unhandled Rejection at: Promise Promise {
TypeError: Cannot read property 'abort' of null
at ClientRequest._req.setTimeout (/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/websocket.js:630:17)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at ClientRequest.emit (events.js:208:7)
at Socket.emitTimeout (_http_client.js:708:34)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at Socket.emit (events.js:208:7)
at Socket._onTimeout (net.js:407:8)
at ontimeout (timers.js:475:11) } reason: TypeError: Cannot read property 'abort' of null
at ClientRequest._req.setTimeout (/usr/local/lib/node_modules/nelson.cli/node_modules/ws/lib/websocket.js:630:17)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at ClientRequest.emit (events.js:208:7)
at Socket.emitTimeout (_http_client.js:708:34)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:106:13)
at Socket.emit (events.js:208:7)
at Socket._onTimeout (net.js:407:8)
at ontimeout (timers.js:475:11)

@lpinca
Copy link

lpinca commented Jan 28, 2018

@rafaelbrizola can you try to monkey patch ws and change this to this._req = undefined;?

This will help us understand if the timeout expires after an error or after the upgrade.

@rafaelbrizola
Copy link
Author

Ok, done and restarted nelson. Let's see how it goes.

@hapklaar
Copy link

hapklaar commented Feb 6, 2018

nelson gui also errors out for me after about a day of running. In the top left quarter of the screen, I see the following error multiple times: Unhandled Rejection at: Promise Promise

Seems Nelson is still running though, ,because the bottom half is still updated with neighbor connects and disconnects.

@lpinca
Copy link

lpinca commented Apr 21, 2018

@rafaelbrizola any update?

@rafaelbrizola
Copy link
Author

Hi @lpinca sorry for not getting back to you. After that day, I still run Nelson for a couple of days but didn't notice any other crashes. However, I stopped running the experiment, and for other reason, never got back to that. So, it's inconclusive it that change helped in my case. If nobody else is noticing the crash, you can close this issue. Thanks!

@lpinca
Copy link

lpinca commented Apr 22, 2018

Ok, thanks. I'm pretty sure the issue is not fixed and still persists on ws@4 and Node.js v8.x. Unfortunately I don't have a way to reproduce it, so it's hard to understand what happens and fix it.

This is the tracking issue in ws: websockets/ws#1280

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants