@edgurgel
Member

@edgurgel edgurgel commented Dec 22, 2025

What kind of change does this PR introduce?

Create a new process group library called Beacon that broadcasts only group counts to other nodes. The actual pids are only available on the local node. It also supports a custom adapter so that we can use PubSub for the broadcasting (including regional broadcasting).

The plan is to start using Beacon without any real changes and later on start using the group data for 3 things:

  • Connected stats
  • Max connected users rate limit
  • Connect & PG Changes check for online users for a tenant

@edgurgel edgurgel changed the title from "Feat/beacon" to "feat: beacon" Dec 22, 2025
@coveralls

coveralls commented Dec 22, 2025

Coverage Status

coverage: 88.14% (+0.09%) from 88.051%
when pulling e3d5a7f on feat/beacon
into 05df771 on main.

@edgurgel edgurgel force-pushed the feat/beacon branch 2 times, most recently from 36c3250 to e66554d Compare December 23, 2025 02:31
@edgurgel edgurgel marked this pull request as ready for review December 23, 2025 04:46
@edgurgel edgurgel requested a review from a team December 23, 2025 17:53

@josevalim josevalim left a comment

Hi @edgurgel! This is looking great and I have added some comments.

Keep in mind that this could be somewhat implemented with Phoenix.Tracker: you have a custom group and one counter process per node that periodically updates its metadata with the number of entries it sees. This means you can reuse all of PubSub/Tracker, its messaging and updates, while still benefiting from periodic updates. The Tracker approach should be considerably fewer lines of code, but perhaps there are additional reasons for going ahead with Beacon!

@josevalim

PS: I will do another pass on the code after the first round of feedback, just to make sure I didn't miss anything important. Distributed code always deserves a couple of rounds of looks!

@edgurgel
Member Author

edgurgel commented Dec 25, 2025

Hi, @josevalim! Merry Christmas!

First I would like to say I'm sorry for not providing enough context in this PR description. I will try to fix this now:

Today we use syn groups to essentially know how many WebSockets a tenant has globally. So the scale here is many thousands of groups, globally, that change constantly.

syn broadcasts "join" and "leave" messages to all nodes for every single WebSocket that joins or leaves. This can be very chatty, especially because we don't care about the actual PIDs. We only care about the group size: if a WebSocket quickly joins and leaves, and then another one joins and leaves, syn will broadcast 4 messages even though the count has not really changed in that brief moment.
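
To make the difference concrete, here is a minimal sketch of the message counts for that scenario. The module and function names are hypothetical, and the "skip the broadcast when the count is unchanged" behaviour is an assumption made for illustration, not a description of syn's or Beacon's actual code.

```elixir
# Hypothetical illustration only; not syn's or Beacon's implementation.
defmodule ChurnSketch do
  # Per-event broadcasting: every join/leave becomes one cluster message.
  def per_event_messages(events), do: length(events)

  # Count-only broadcasting (assumed): at most one message per flush, and
  # only when the group size differs from the last value that was broadcast.
  def count_messages(events, last_broadcast_count) do
    new_count =
      Enum.reduce(events, last_broadcast_count, fn
        {:join, _pid}, acc -> acc + 1
        {:leave, _pid}, acc -> acc - 1
      end)

    if new_count == last_broadcast_count, do: 0, else: 1
  end
end

# Two WebSockets that each join and leave between two flushes.
events = [{:join, :ws1}, {:leave, :ws1}, {:join, :ws2}, {:leave, :ws2}]

per_event = ChurnSketch.per_event_messages(events)
# per_event is 4

count_only = ChurnSketch.count_messages(events, 10)
# count_only is 0, because the group size is back to 10
```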

We decided to build Beacon because we wanted to know how many processes belong to each group globally. We also wanted to keep some features that syn has:

  • Support a subset of nodes being part of the cluster, although this will be more useful in the future, as right now it's all global;
  • Be resilient to failures and not cause existing WebSockets to crash.

We also wanted some control over how messages are sent to the cluster, as we wanted to use Phoenix.PubSub with our gen_rpc adapter, which does not use erl dist, compresses messages and does "regional broadcasting" (it sends a message to one node in each region, and that node broadcasts to the rest of the nodes in its region).

It obviously also must be able to quickly answer how many processes belong to a group (locally and globally).

@edgurgel
Member Author

> Hi @edgurgel! This is looking great and I have added some comments.
>
> Keep in mind that this could be somewhat implemented with Phoenix.Tracker: you have a custom group and one counter process per node that periodically updates its metadata with the number of entries it sees. This means you can reuse all of PubSub/Tracker, its messaging and updates, while still benefiting from periodic updates. The Tracker approach should be considerably fewer lines of code, but perhaps there are additional reasons for going ahead with Beacon!

I have to admit I didn't even think about checking Phoenix.Tracker. I will definitely check it out next week!

@josevalim

> I have to admit I didn't even think about checking Phoenix.Tracker. I will definitely check it out next week!

FWIW, I believe Phoenix.Tracker will immediately notify changes to the metadata of a given process, so it won't fulfill all of your criteria above. Perhaps a feature could be added to postpone that until the next broadcast, but I am not sure how trivial or complex that would be.

@josevalim josevalim left a comment

Hi @edgurgel, Merry Christmas to you too! 🎄

I did another pass, as promised, and I think of all my comments, the only blocker is storing a MapSet in the ETS table. The whole MapSet is copied when read and copied when written, which will make it very expensive if you have thousands of entries. If you need to track the entries, then use two separate ETS tables, one only for the entries.
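
A rough sketch of the two layouts, for illustration only. The table names, the `"tenant-a"` group and the key shapes are hypothetical and are not taken from the PR.

```elixir
# Sketch only: table names and key shapes are hypothetical.
mapset_tab = :ets.new(:mapset_sketch, [:set, :public])
entries_tab = :ets.new(:entries_sketch, [:set, :public])
pid = self()

# Layout A: one row per group holding a MapSet of pids.
:ets.insert(mapset_tab, {"tenant-a", MapSet.new()})
# The lookup copies the whole set out of ETS into the process heap...
[{_group, members}] = :ets.lookup(mapset_tab, "tenant-a")
# ...and the insert copies the whole updated set back in.
:ets.insert(mapset_tab, {"tenant-a", MapSet.put(members, pid)})

# Layout B: one small row per {group, pid} entry in a dedicated table.
# Each read or write only touches that single row.
:ets.insert(entries_tab, {{"tenant-a", pid}})
is_member = :ets.member(entries_tab, {"tenant-a", pid})
```

With layout A the cost of every join or leave grows with the size of the group; with layout B it stays roughly constant per entry.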

I also took a look at the Registry to see if we could add the feature directly to it and I don't think we can because the counter requires a number of processes acting as partitions to monitor and manage the counter. Therefore, another option is for Beacon to track only counts, as it needs to use a different design, and keep using the Registry for the entries themselves.

The entries table now holds the actual pids and a separate table holds the counters

When partition restarts it uses entries table as the source of truth
@edgurgel
Member Author

edgurgel commented Dec 28, 2025

Thanks, @josevalim! I went with your advice and now each Partition holds two ETS tables.

Because I'm already "paying" the price of doing a GenServer call, I ended up using a set ETS table for entries and just doing a lookup before inserting to ensure there are no duplicates.

If any crash happens the entries table is used as source of truth.

If we for whatever reason crash before the counters table is updated it won't be a problem.
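
A minimal sketch of that join path, assuming hypothetical module, table and function names (this is not the code from the PR). It uses plain functions instead of a GenServer to stay short; in the real design the GenServer call serializes the writes, which is what makes lookup-then-insert safe.

```elixir
# Hypothetical sketch; names and return values are assumptions.
defmodule PartitionSketch do
  def new do
    %{
      entries: :ets.new(:entries, [:set, :public]),
      counters: :ets.new(:counters, [:set, :public])
    }
  end

  def join(%{entries: entries, counters: counters}, group, pid) do
    key = {group, pid}

    case :ets.lookup(entries, key) do
      [] ->
        # Entries are written first because they are the source of truth.
        :ets.insert(entries, {key})
        # A crash before this line leaves the counter stale, which can be
        # repaired later by recounting the entries table.
        :ets.update_counter(counters, group, {2, 1}, {group, 0})
        :ok

      [_existing] ->
        {:error, :already_joined}
    end
  end

  def count(%{counters: counters}, group) do
    case :ets.lookup(counters, group) do
      [{^group, n}] -> n
      [] -> 0
    end
  end
end

state = PartitionSketch.new()
first = PartitionSketch.join(state, "tenant-a", self())
second = PartitionSketch.join(state, "tenant-a", self())
total = PartitionSketch.count(state, "tenant-a")
# first is :ok, second is {:error, :already_joined}, total is 1
```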

@josevalim

@edgurgel I have dropped two last comments in the commit: 8a156f5

Btw, if both tables are set, you will likely be fine with a single table anyway, as long as you namespace the counts. But two tables are also fine. Completely your call.

@edgurgel
Member Author

edgurgel commented Dec 30, 2025

Thanks for the review again! I've updated the code based on your comments 🙇

> Btw, if both tables are set, you will likely be fine with a single table anyway, as long as you namespace the counts. But two tables are also fine. Completely your call.

Yeah, good point. I thought about it, but I want the counters table to have the {group, counter} information ready to be fetched for broadcasting to other scopes, so it's a bit easier to just grab the whole table instead of having to do a select based on the key shape.

Also, because I need to do an insert + update_counter, there is a chance that a crash happens before the counter is updated.

Then when rebuilding the counters it is easier when I can simply delete all objects from the counters table before rebuilding it.
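
The trade-off can be sketched roughly as follows. Table names, key shapes and the stale value `99` are made up for illustration; this is not the code from the PR.

```elixir
# Sketch only: names, key shapes and values are hypothetical.
pid = self()

# Option 1: a single table where counts are namespaced by key shape.
single = :ets.new(:single, [:set, :public])
:ets.insert(single, [{{:entry, "a", pid}}, {{:count, "a"}, 1}])

# Fetching the counts needs a select that filters on the key shape.
counts_from_single =
  :ets.select(single, [{{{:count, :"$1"}, :"$2"}, [], [{{:"$1", :"$2"}}]}])

# Option 2: a dedicated counters table next to the entries table.
entries = :ets.new(:entries, [:set, :public])
counters = :ets.new(:counters, [:set, :public])
:ets.insert(entries, [{{"a", pid}}, {{"b", pid}}])
# Pretend a crash left a wrong counter behind.
:ets.insert(counters, {"a", 99})

# Rebuild: wipe the counters, then recount from the entries table.
:ets.delete_all_objects(counters)

:ets.foldl(
  fn {{group, _pid}}, acc ->
    :ets.update_counter(counters, group, {2, 1}, {group, 0})
    acc
  end,
  :ok,
  entries
)

# The broadcast payload is simply the whole table (sorted here because
# the order of a set table is unspecified).
payload = Enum.sort(:ets.tab2list(counters))
```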

@josevalim

All good to me!

@edgurgel edgurgel requested a review from filipecabaco January 4, 2026 21:11