perf: use a constant number of ranch instances (#706)
Conversation
Instead of starting one ranch instance per pool, use the same two ranch instances for the whole application lifecycle. Each ranch instance was starting a minimum of 10 acceptors (and, consequently, 10 connection supervisors). With 10_000 pools, that is 200_000 processes, which consume a sizeable amount of memory and resources. They also added complexity in managing the separate ranch instances (they had to be started and stopped at the appropriate times, especially because they weren't linked to the pool). The ranch acceptors/connection supervisors aren't a bottleneck, and if they were, we could control the number of acceptors/supervisors through configuration. In short, there's no benefit in bringing up multiple ranch instances, only the resource consumption drawback.
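A minimal sketch of the before/after, using the ranch 2.x listener API (the refs, the handler module, and the port are illustrative, not the PR's actual code):

```elixir
# Before: one ranch listener per pool. Each listener starts its own
# acceptor set (10 by default), so 10_000 pools means ~100_000 acceptors
# plus as many connection supervisors.
for tenant <- tenants do
  {:ok, _pid} =
    :ranch.start_listener(
      {:pool, tenant},            # one ranch ref per pool
      :ranch_tcp,
      %{socket_opts: [port: 0]},  # defaults to 10 acceptors
      MyApp.ClientHandler,        # hypothetical protocol module
      %{tenant: tenant}
    )
end

# After: a constant number of listeners shared by all pools, with the
# acceptor count controlled in one place.
{:ok, _pid} =
  :ranch.start_listener(
    :session_proxy,
    :ranch_tcp,
    %{socket_opts: [port: 5432], num_acceptors: 100},
    MyApp.ClientHandler,
    %{mode: :session}
  )
```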
That's a great point! Will add sharding in the PR. (edit: done on 43da5f3)
The memory burden is considerable. In a sample production node, we have just 4700 pools, and
Force-pushed from 3d9a29a to 85c50e8
I went with a default of 4 shards per mode, which is probably enough for most workloads. If needed, we can increase it in production.
Instead of using a single Ranch listener per mode (session and transaction), create a configurable number of shards per mode to distribute connections across multiple ports. This prevents hitting the ~65k concurrent-connection limit per port without needing to maintain one ranch instance per pool. Additionally, remove the unused local_proxy_multiplier configuration.
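A hedged sketch of what per-mode sharding might look like (the config key, `base_port/1`, and the handler module are assumptions for illustration, not the PR's code):

```elixir
# Start N listeners (shards) per proxy mode on consecutive ports, so the
# ~65k concurrent-connection ceiling applies per shard rather than per mode.
shards = Application.get_env(:supavisor, :proxy_shards, 4)

for mode <- [:session, :transaction], shard <- 0..(shards - 1) do
  {:ok, _pid} =
    :ranch.start_listener(
      {:proxy, mode, shard},                           # unique ref per shard
      :ranch_tcp,
      %{socket_opts: [port: base_port(mode) + shard]}, # hypothetical base_port/1
      MyApp.ClientHandler,                             # hypothetical handler
      %{mode: mode}
    )
end
```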
Force-pushed from 367f024 to 43da5f3
lib/supavisor/application.ex
Outdated
  ) do
    {:ok, _pid} ->
-     Logger.notice("Proxy started #{mode} on port #{port}")
+     Logger.notice("Proxy started #{opts.mode}(local=#{opts.local}) on port #{port}")
Here we need to get the value from ranch by id, because the configured port is 0. Although, it might be better to start the port range from a defined value to make this behavior more deterministic.
Suggested change:
- Logger.notice("Proxy started #{opts.mode}(local=#{opts.local}) on port #{port}")
+ port = :ranch.get_port(key)
+ Logger.notice("Proxy started #{opts.mode}(local=#{opts.local}) on port #{port}")
Added the port!
I'd argue that starting on a range may be "less deterministic". Here the behaviour is more predictable: we listen on 0, and the OS gives us a port that is guaranteed to be free. If we try to pick it ourselves, we might run into conflicts, etc.
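For reference, a small sketch of the port-0 pattern under discussion: bind on port 0, then ask ranch which port the OS actually assigned. `:ranch.get_port/1` is the real ranch API mentioned in the suggestion; the ref and handler module here are illustrative:

```elixir
require Logger

ref = {:local_proxy, :session, 0}

{:ok, _pid} =
  :ranch.start_listener(
    ref,
    :ranch_tcp,
    %{socket_opts: [port: 0]},  # port 0: the OS picks a free port
    MyApp.ClientHandler,        # hypothetical protocol module
    %{mode: :session, local: true}
  )

# The listener is bound to an OS-assigned port, so query ranch for it.
port = :ranch.get_port(ref)
Logger.notice("Proxy started session(local=true) on port #{port}")
```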
we can choose something that we are sure will be free, for example, starting at 45_000
and we will always know that these ports are used by local proxies
There's no such thing as a surely free port, especially in the ephemeral range :P
well, there are a bunch of strictly "safe" examples that supavisor uses: 4000, session's, transaction's, etc 😜
But these aren't in the ephemeral range...
We could maybe pick something outside it though (like 12_000 or smth) 🤔
### Features
- **Authentication cleartext password support** - Added support for the cleartext password authentication method (#707)
- **Runtime-configurable connection retries** - Support for runtime configuration of connection retries and infinite retries (#705)
- **Enhanced health checks** - Check database and eRPC capabilities during health check operations (#691)
- **More consistency with Postgres on auth errors** - Improves errors in some client libraries (#711)

### Performance Improvements
- **Optimized ranch usage** - Supavisor now uses a constant number of ranch instances for improved performance and resource management when hosting a large number of pools (#706)

### Monitoring
- **New OS memory metrics** - Gives a more accurate picture of memory usage (#704)
- **Promex plugin for cluster metrics** - For tracking latency and connection status (#690)
- **Client connection lifetime metrics** - Adds a metric for how long each connection stays connected (#688)
- **Process monitoring** - Log processes with large heaps or long message queues (#689)

### Bug Fixes
- **Client handler query cancellation** - Fixed handling of `:cancel_query` when state is `:idle` (#692)

### Migration Notes
- Instances running a small number of pools may see an increase in memory usage. This can be mitigated by changing the ranch shard or acceptor counts.
- If any of the newly used ports conflict with existing services, the defaults may need to be changed.
- Review monitoring dashboards and include the new metrics.
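A hypothetical `runtime.exs` fragment illustrating the tuning knobs the migration notes mention; the exact configuration keys depend on the release and are assumptions here, not documented settings:

```elixir
import Config

config :supavisor,
  # Fewer shards for deployments with a small number of pools.
  session_proxy_shards: 2,
  transaction_proxy_shards: 2,
  # ranch acceptors per listener (ranch's default is 10).
  num_acceptors: 10
```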
