Introduce broadcast_pool_size option to allow safe pool size migration #197

studzien
Contributor

@studzien studzien commented Jun 2, 2025

👋
We need to increase the Phoenix.PubSub's pool_size; however, we don't see a way to do this safely (i.e., without losing messages during deployment).

For example, if we change pool_size: 1 to pool_size: 2, nodes with both settings will be running in the cluster at the same time during the deployment. If a message is then broadcast from a node running pool_size: 2, it can be sent to shard number 2 via pg. If a node running pool_size: 1 receives it, the message won't be delivered to that node's subscribers:

[diagram]
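
To make the failure mode concrete, here is a toy model of the mismatch. It is only a sketch, not the adapter's code: `ShardSketch` and its functions are hypothetical names, and shards are numbered from 1 to match the description above.

```elixir
defmodule ShardSketch do
  # Hypothetical model: a node with pool size n runs shards 1..n.
  def shards(pool_size) when is_integer(pool_size) and pool_size > 0 do
    Enum.to_list(1..pool_size)
  end

  # Shards a broadcaster may pick that the receiving node does not run.
  # A message sent to any of them never reaches that node's subscribers.
  def unreachable_shards(broadcast_pool_size, receiver_pool_size) do
    shards(broadcast_pool_size) -- shards(receiver_pool_size)
  end
end

# A pool_size: 2 node broadcasting to a pool_size: 1 node can pick shard 2,
# which the receiver does not run.
IO.inspect(ShardSketch.unreachable_shards(2, 1))
# prints [2]

# The other direction is safe: the pool_size: 2 node also runs shard 1.
IO.inspect(ShardSketch.unreachable_shards(1, 2))
# prints []
```

An empty list means every shard the broadcaster can pick exists on the receiving node, so nothing is dropped in this model.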

This draft PR attempts to address this issue by introducing a new option, broadcast_pool_size (which defaults to pool_size if unset). When set, the number of shards used for broadcasting messages is smaller than the number used for receiving messages and forwarding them to the local clients.
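
A minimal sketch of the defaulting described above. `PoolOptsSketch.sizes/1` is a hypothetical helper written for illustration, not part of Phoenix.PubSub, and it requires pool_size to be given explicitly to keep the example small.

```elixir
defmodule PoolOptsSketch do
  # Returns {pool_size, broadcast_pool_size} for a keyword list of options.
  def sizes(opts) do
    pool_size = Keyword.fetch!(opts, :pool_size)
    # Falls back to pool_size when :broadcast_pool_size is not given.
    broadcast_pool_size = Keyword.get(opts, :broadcast_pool_size, pool_size)
    {pool_size, broadcast_pool_size}
  end
end

IO.inspect(PoolOptsSketch.sizes(pool_size: 2))
# prints {2, 2}

IO.inspect(PoolOptsSketch.sizes(pool_size: 2, broadcast_pool_size: 1))
# prints {2, 1}
```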

The pool size change can then be deployed safely in the following two-step process (assuming we're already running our application with pool_size: 1):

  1. We deploy a new version with pool_size: 2 and broadcast_pool_size: 1. The new version has two shards participating in pg, but still broadcasts messages using only one shard:
    [diagram]
    This way, no messages broadcast from node two will be lost.

  2. We deploy a new version with pool_size: 2. The new version has two shards that can receive and broadcast messages. The version deployed in step 1 can receive messages broadcast by the new version:
    [diagram]

  3. When the deployment from step 2 is complete, all nodes are running pools with the new size:
    [diagram]
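
As a sketch, the two deployments could look like this in an application's supervision tree. `MyApp.PubSub` is a placeholder name, and broadcast_pool_size is the option proposed in this PR, so treat the fragment as illustrative rather than final.

```elixir
# Deployment 1: listen on two shards, keep broadcasting on one.
children = [
  {Phoenix.PubSub, name: MyApp.PubSub, pool_size: 2, broadcast_pool_size: 1}
]

# Deployment 2, rolled out only after deployment 1 is running on every node:
# broadcast_pool_size is dropped, so it defaults to pool_size.
children = [
  {Phoenix.PubSub, name: MyApp.PubSub, pool_size: 2}
]
```

Only one of the two `children` lists is in the code base at a time; they are shown together here for comparison.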

If we need to decrease the pool size, we follow the same process but in the reverse order.
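
One reading of "reverse order", sketched for going from pool_size: 2 back to pool_size: 1. `MyApp.PubSub` is again a placeholder name, and the exact sequence is an assumption based on the steps above.

```elixir
# Deployment 1: keep listening on both shards, but broadcast on one only.
children = [
  {Phoenix.PubSub, name: MyApp.PubSub, pool_size: 2, broadcast_pool_size: 1}
]

# Deployment 2, once no node broadcasts to the second shard anymore:
# shrink the pool itself.
children = [
  {Phoenix.PubSub, name: MyApp.PubSub, pool_size: 1}
]
```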

I'm opening the PR to discuss this mechanism; once the approach is validated, we can work on the exact parameter names and on documenting the above process.

@studzien studzien changed the title Introduce broadcast_pool_size option to allow safe pool size migration [Draft] Introduce broadcast_pool_size option to allow safe pool size migration Jun 2, 2025
@studzien studzien marked this pull request as ready for review June 2, 2025 09:29
@josevalim
Member

Beautiful work. I am happy with everything here, we only need docs. Perhaps we can even convert those diagrams into mermaid diagrams? (and perhaps AI can automate that).

@studzien
Contributor Author

studzien commented Jun 2, 2025

Thanks for taking a look!
Will take a stab at expanding docs with diagrams tomorrow. (I have used excalidraw for a quick visualization of what's going on here).

@studzien studzien changed the title [Draft] Introduce broadcast_pool_size option to allow safe pool size migration Introduce broadcast_pool_size option to allow safe pool size migration Jun 3, 2025
@studzien
Contributor Author

studzien commented Jun 3, 2025

@josevalim Here's my attempt to add AI-aided docs with graphs :D
Intuitively, it feels like mix.exs is not a valid place to include the mermaid HTML snippet, though.

@josevalim josevalim merged commit 95b4ad2 into phoenixframework:main Jun 3, 2025
2 checks passed
@josevalim
Member

💚 💙 💜 💛 ❤️
