Separate Registry pool size configuration #198

studzien · 2025-06-20T15:55:29Z

Under our workload (a few thousand topics, each with anywhere from single digits to a few thousand subscribers, and a few thousand broadcasts per second from a single node in the cluster), the PG2Worker processes have become unstable: messages arrive faster than they are processed, so the message queue grows indefinitely.

We initially tried increasing the pool size, but this didn't help much.
After some investigation, we found that the extra parallelism in processing incoming messages was offset by each PG2 worker process doing more work than before.

That's because Registry for :duplicate keys is sharded by pids, not by key (topic). So, to get all the subscribers, we need to go through all the partitions and get the topic subscribers from each of them.
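The effect described above can be reproduced with a plain Registry, outside of Phoenix.PubSub. The following is only a minimal sketch: the registry name `DemoRegistry`, the topic name, and the counts are hypothetical. It registers 100 subscriber processes under one key in a 4-partition `:duplicate` registry and then dispatches to that key.

```elixir
# Hypothetical names and sizes, for illustration only.
{:ok, _} = Registry.start_link(keys: :duplicate, name: DemoRegistry, partitions: 4)

parent = self()

# Each subscriber registers under the same topic from its own process,
# so the entries are spread across partitions according to the pid.
for i <- 1..100 do
  spawn(fn ->
    {:ok, _} = Registry.register(DemoRegistry, "topic", i)
    send(parent, :registered)
    Process.sleep(:infinity)
  end)
end

# Wait until every subscriber has registered.
for _ <- 1..100 do
  receive do
    :registered -> :ok
  end
end

# The callback is invoked once per partition that holds entries for the key,
# so any per-call work done inside it can repeat up to 4 times.
Registry.dispatch(DemoRegistry, "topic", fn entries ->
  IO.puts("callback invoked with #{length(entries)} entries")
end)
```

With a single partition, the same dispatch would invoke the callback once with all 100 entries.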

Moreover, since the dispatch code is executed in the context of the worker process, the serialization cache is valid for just a single dispatch call.
So even if we fastlane serialization, we might execute it as many times as there are partitions.

This PR allows configuring the Registry pool size to a value different from the PubSub pool size.
Under our workload, I got the best results by removing the Registry pool entirely, i.e., setting registry_pool_size: 1.

I'm happy to describe the new option in the documentation, but I'm wondering if there was a particular reason why the :duplicate registries are sharded by pids, not by keys. Perhaps that was done to remove contention on the ETS table for topics with many subscribers? Depending on the use case, I can see that picking a sharding method could be a Registry configuration option.

josevalim · 2025-06-20T17:22:44Z

> Perhaps that was done to remove contention on the ETS table for topics with many subscribers? Depending on the use case, I can see that picking a sharding method could be a Registry configuration option.

I believe it was to deal with cases of very uneven distributions (also known as the Justin Bieber effect), where one topic could have millions of entries more than others.

I think we can go ahead with this, perhaps call the option registry_size? Also, do you believe the previous PR is still relevant/necessary?

studzien · 2025-06-20T17:33:02Z

> I believe it was to deal with cases of very uneven distributions (also known as the Justin Bieber effect), where one topic could have millions of entries more than others.

Got it, I think it makes perfect sense when the Registry dispatch is executed in parallel (so we should get a shorter delivery time when we have a lot of entries).

> I think we can go ahead with this, perhaps call the option registry_size? Also, do you believe the previous PR is still relevant/necessary?

Cool, will rename.
Yeah, the previous PR is orthogonal to this. We still want to increase the pool size without losing messages in the cluster. And the Registry pool size doesn't really matter when delivering messages in the cluster since it's local (and there's no mapping between the PubSub shard and the Registry shard).

studzien · 2025-06-21T07:50:21Z

Hi again @josevalim.
I just renamed the option, added another test, and described it in the docs. Please feel free to adjust to what you prefer.
I'm not happy with how the new tests look (as they rely on implementation details), but I could not figure out how to make them better.
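As a rough illustration of the renamed option, here is a minimal sketch of a supervision tree entry. It assumes `registry_size` is passed next to `pool_size` in the Phoenix.PubSub child spec; the name `MyApp.PubSub` and the pool size of 8 are hypothetical.

```elixir
# Hypothetical application config: the PubSub pool keeps 8 shards, while the
# underlying Registry uses a single partition, as discussed in this PR.
children = [
  {Phoenix.PubSub, name: MyApp.PubSub, pool_size: 8, registry_size: 1}
]

# Assumed to run inside an application's start/2 callback.
Supervisor.start_link(children, strategy: :one_for_one)
```

Whether a single Registry partition is the right choice depends on the workload; the value of 1 here only mirrors the result reported earlier in this thread.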

Do you think it would be reasonable to add an option to Registry for :duplicate keys, such as :partition_by = :keys | :pids (defaulting to :pids in order not to break existing assumptions) so that this can be tuned on the registry side as well?

josevalim · 2025-06-21T09:50:42Z

> Do you think it would be reasonable to add an option to Registry for :duplicate keys, such as :partition_by = :keys | :pids (defaulting to :pids in order not to break existing assumptions) so that this can be tuned on the registry side as well?

At first I don't see an issue with that, we can give it a try. :)

josevalim · 2025-06-21T09:51:26Z

💚 💙 💜 💛 ❤️

studzien · 2025-06-23T09:30:28Z

> At first I don't see an issue with that, we can give it a try. :)

Cool, I can prepare something the week after next.
I should also be able to see whether there are any benefits, though only on topics with up to a few tens of thousands of subscribers, not millions.

Allow separate Registry pool size configuration

272b75a

SteffenDE requested a review from josevalim June 20, 2025 16:38

Rename to registry_size; add documentation

e64c5d9

josevalim merged commit fff23f8 into phoenixframework:main Jun 21, 2025
2 checks passed

studzien added a commit to Whatnot-Inc/phoenix_pubsub that referenced this pull request Jun 23, 2025

Separate Registry pool size configuration (phoenixframework#198)

c2e78d2

studzien mentioned this pull request Jun 23, 2025

Separate Registry pool size configuration (#198) Whatnot-Inc/phoenix_pubsub#14

Merged

studzien added a commit to Whatnot-Inc/phoenix_pubsub that referenced this pull request Jun 23, 2025

Separate Registry pool size configuration (phoenixframework#198) (#14)

f4590ac

studzien mentioned this pull request Jul 16, 2025

Add Registry partition_by option for duplicate registries elixir-lang/elixir#14654

Open
