Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconnect firehose clients after period of inactivity (timeout for recv) #520

Merged
merged 2 commits into from
Jan 23, 2025

Conversation

p1timmy
Copy link
Contributor

@p1timmy p1timmy commented Jan 21, 2025

Fixes #489

Add a timeout parameter to make firehose clients reconnect if no message is received after a given number of seconds instead of waiting forever.

I found 2 ways to set the timeout but not sure which one makes the most sense for most people:

  1. Set the parameter in __init__ to self._timeout and pass that value to client.recv() (a1fcd30)
  2. Use the parameter in start() directly as an argument for client.recv() (c6b0330)

Also allowed the parameter to accept None (the default value for the timeout parameter in client.recv() which causes it to wait indefinitely for the next message) and gave the following default values for each firehose client type:

  • Repos: 30 seconds (because hundreds of ops are being made every second on Bluesky nowadays)
  • Labels: 5 minutes (300 seconds)

I plan to squash all these commits once we come to an agreement with which approach to take and get all the other details sorted out.

@MarshalX
Copy link
Owner

@p1timmy thank you! i ve fixed some issues around and picked constructor as a way to pass timeout; pls review

@MarshalX MarshalX changed the title Force firehose clients to reconnect after period of inactivity Force firehose clients to reconnect after period of inactivity (timeout for recv) Jan 21, 2025
@p1timmy
Copy link
Contributor Author

p1timmy commented Jan 22, 2025

Looks good but why is the parameter renamed to recv_timeout?

Also just realized the timeout error only forces the client to disconnect, the error has to be handled in code separately for the client to reconnect so will need to update docs to reflect on that.

@p1timmy p1timmy changed the title Force firehose clients to reconnect after period of inactivity (timeout for recv) Disconnect firehose clients after period of inactivity (timeout for recv) Jan 22, 2025
@MarshalX
Copy link
Owner

MarshalX commented Jan 22, 2025

why is the parameter renamed to recv_timeout?

because "timeout" is too broad name. it could confuse about timeout for the entire client for any activity. for example timeout could be for the websockets framing, etc. and since we use this only in recv, that's why i renamed it like that

some of the existing timeouts in WebSockets lib that we are using:

  • open_timeout
  • close_timeout
  • ping_timeout
  • ...i guess more

Fixes MarshalX#489

Add a `recv_timeout` parameter to constructor, which is passed to
`recv()` in `start()` to make the client wait for the next message
for a given number of seconds before disconnecting.

The old behavior was to wait forever due to the default timeout
value for `recv()` set to `None`, this caused an annoying bug where
some Bluesky feeds powered by the SDK wouldn't update until the
server is restarted.

Timeout errors only disconnect the client, reconnects have to be done
by the user by catching the error and calling `start()` again.

Currently there's a bug in the bsky.network repos firehose server
that stops sending backfilled messages once it gets to messages from
about 5 minutes ago, you can test this out by setting the cursor
parameter to any seq number from at least 5 minutes ago.

By default, repos firehose client times out after 30 seconds (due to
the fact there's hundreds of ops every second on Bluesky nowadays)
and labels firehose client times out after 5 minutes (300 seconds)

Also removed a todo comment as labels firehose client works perfectly
on my end

Co-authored-by: Ilya (Marshal) <[email protected]>
@p1timmy p1timmy force-pushed the firehose-recv-timeout branch from b43d4d9 to e05f64a Compare January 22, 2025 16:35
@p1timmy p1timmy requested a review from MarshalX January 22, 2025 16:37
@p1timmy
Copy link
Contributor Author

p1timmy commented Jan 22, 2025

because "timeout" is too broad name. it could confuse about timeout for the entire client for any activity. for example timeout could be for the websockets framing, etc. and since we use this only in recv, that's why i renamed it like that

some of the existing timeouts in WebSockets lib that we are using:

  • open_timeout
  • close_timeout
  • ping_timeout
  • ...i guess more

Oh gotcha, makes a lot of sense now

Also I just squashed all the commits in the branch as you picked the approach to use for making the timeouts work. Other than missing docs on how to reconnect to the server due to timeout errors I think this is ready to merge.

@MarshalX
Copy link
Owner

It will reconnect automatically 🧐 that's why I added timeout errors to the errors list of what errors should trigger reconnect mechanism

@p1timmy p1timmy changed the title Disconnect firehose clients after period of inactivity (timeout for recv) Reconnect firehose clients after period of inactivity (timeout for recv) Jan 22, 2025
@p1timmy
Copy link
Contributor Author

p1timmy commented Jan 22, 2025

Oh my bad, that part I didn't pay attention when I reviewed your changes. I just retested my code and the client does reconnect silently after the given timeout, passing None restores the old behavior as expected.

@MarshalX MarshalX merged commit 8b89a24 into MarshalX:main Jan 23, 2025
22 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Firehose client has no receive timeout, causing it to stall and do nothing sometimes
2 participants