Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket: "cannot decode". Error validating utf8 #224

Open
radomir9720 opened this issue Jun 16, 2024 · 8 comments · May be fixed by #230
Open

Websocket: "cannot decode". Error validating utf8 #224

radomir9720 opened this issue Jun 16, 2024 · 8 comments · May be fixed by #230

Comments

@radomir9720
Copy link

radomir9720 commented Jun 16, 2024

My websocket connection is constantly being cut off.

I turned on debug pybit and websocket logs to see what's wrong.
Here is what i've got:

2024-06-15 16:44:16,150 - websocket - DEBUG - Sending ping
2024-06-15 16:44:36,151 - websocket - DEBUG - Sending ping
2024-06-15 16:44:56,152 - websocket - DEBUG - Sending ping
2024-06-15 16:44:56,725 - websocket - ERROR - error from callback <function _WebSocketManager._connect.<locals>.<lambda> at 0x7fc65f4a56c0>: cannot decode: b'{"success":true,"ret_msg":"pong","conn_id":"1a598878-132b-4e\xf17-8864-8cee95ea1b85","req_id":"","op":"ping"}'
2024-06-15 16:44:56,726 - websocket - INFO - tearing down on exception cannot decode: b'{"success":true,"ret_msg":"pong","conn_id":"1a598878-132b-4e\xf17-8864-8cee95ea1b85","req_id":"","op":"ping"}'
2024-06-15 16:44:56,726 - pybit._websocket_stream - DEBUG - WebSocket Unified V5 closed.

I took a look inside websocket package. The exception is thrown from websocket._abnf.py.continuous_frame.extract() method.

if (
    not self.fire_cont_frame
    and data[0] == ABNF.OPCODE_TEXT
    and not self.skip_utf8_validation
    and not validate_utf8(frame.data)
):
    raise WebSocketPayloadException(f"cannot decode: {repr(frame.data)}")

So, there could be several reasons why the exception is thrown:

Disclaimer: i know that if statement is executed until first True, but, since i'm not familiar with implementation details of the websocket package, i started with the part that i can figure out - utf8 validation:

image

As we can see, the validation fails because of the \x escape sequence.

What can we do with it?

  1. Somehow change the connection id, so that it will not contain special characters
  2. Disable utf8 validation. Especially knowing the fact that it will increase the performance. So, passing skip_utf8_validation=True to WebSocketApp.run_forever() method should solve the issue

P.S. I'm using the lates pybit version - 5.7.0

@radomir9720
Copy link
Author

Forked and made changes described above.

Currently testing. If everything's gonna be ok - i will make a PR.

@dextertd
Copy link
Collaborator

I don't think there should be an escape sequence in there, it's just a UUID and should be alphanumeric with hyphens. This makes me wonder if the conn_id sent by the websocket server is the problem. Can you consistently reproduce this issue?

@radomir9720
Copy link
Author

radomir9720 commented Jun 25, 2024

You're right. The issue is not the conn_id. I did some investigations, and, i suppose, the problem is the unstable connection. Perhaps, once in a while bytes get lost, that results in a broken message.

Here are some examples:

  1. Instead of kline.1.FTMUSDT got kline.1.DTMUSDT. An exception was raised because there was no such topic in the topics dict.
2024-06-20 10:34:21,623 - websocket - DEBUG - ++Rcv raw: b'\x81~\x01\x16{"topic":"kline.1.DTMUSDT","data":[{"start":1718868840000,"end":1718868899999,"interval":"1","open":"0.5702","close":"0.57","high":"0.5704","low":"0.5695","volume":"139423","turnover":"79456.8583","confirm":false,"timestamp":1718868861523}],"ts":1718868861523,"type":"snapshot"}'
2024-06-20 10:34:21,624 - websocket - DEBUG - ++Rcv decoded: fin=1 opcode=1 data=b'{"topic":"kline.1.DTMUSDT","data":[{"start":1718868840000,"end":1718868899999,"interval":"1","open":"0.5702","close":"0.57","high":"0.5704","low":"0.5695","volume":"139423","turnover":"79456.8583","confirm":false,"timestamp":1718868861523}],"ts":1718868861523,"type":"snapshot"}'
2024-06-20 10:34:21,624 - websocket - ERROR - error from callback <function _WebSocketManager._connect.<locals>.<lambda> at 0x7f7c2c31b910>: 'kline.1.DTMUSDT'
  1. The entire end parameter is messed up. Also there is a missing quote at the end, that caused a JsonDecodeError.

image

Since we don't have options to check the integrity of the message(for example, through check sum), the only thing we can do - ignore broken messages. But. also, we don't have a universal way to check whether the message is broken or not(for cases like message with topic we didn't subscribe to). The only thing we can do - ignore the exceptions, and keep the connection open. This is what i did. I added an argument disconnect_on_exception. If it's set to False, the exception will only be logged, and the connection will be maintained.

I pushed all the changes to my fork, and gonna make a PR with details what i did, and why.

@radomir9720
Copy link
Author

I'll do it tomorrow

@radomir9720 radomir9720 linked a pull request Jun 26, 2024 that will close this issue
@dextertd
Copy link
Collaborator

I'm not sure how to reproduce this issue, do you think there's a good way to do it for testing purposes?

@radomir9720
Copy link
Author

What do you want to reproduce? A broken message from websocket?
I don't think that there is a good way. I just run the socket stream, and getting broken messages(Therefore, a JSONDecodeError) 1-2 times a day in average. But i'm using my fork, so the connection is maintained.

What is the ultimate goal in reproducing the issue? You can see the logs, that i attached, all you'll get after reproducing the issue - exactly the same logs, and the same result - JSONDecodeError.

@dextertd
Copy link
Collaborator

I've never had this issue myself and never seen it reported, so I'd like to be able to test the fix before committing to master.

I have used pybit to run tests for days at a time but have never seen this problem. So I just want to be able to reproduce it before approving the PR.

@radomir9720
Copy link
Author

This PR, actually, is not a fix. It just provides a way to handle the error in _on_error callback, instead of automatically raising an exception and closing the connection. To do so, it's necessary to pass skip_utf8_validation=True. Also, the exceptions in _on_error can be ignored by passing disconnect_on_exception=False.

So changes are backward compatible, because by default the behaviour in the PR is completely the same as in current version.

If you want to test the behaviour when a broken message is received from websocket, you can, for example, override/mock/change the _on_message method, and raise any exception in it, because exactly message = json.loads(message) from _on_message method throws an exception when the broken message can't be decoded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants