
[management,proxy,client] Add L4 capabilities (TLS/TCP/UDP) #5530

Open
lixmal wants to merge 29 commits into main from reverse-proxy-l4

Conversation

@lixmal
Collaborator

@lixmal lixmal commented Mar 6, 2026

Describe your changes

Add TCP, UDP, and TLS passthrough modes to the reverse proxy alongside HTTP.

Proxy

  • TCP router with SNI-aware demultiplexing (TLS ClientHello peeking vs plain TCP)
  • Bidirectional TCP relay with PROXY protocol v2 support and per-backend connection limiting
  • Session-based UDP relay with configurable idle timeout and per-account metrics
  • Raw TCP listener on main port with SNI router (replaces direct HTTPS bind)
  • Per-port router management for TCP/UDP/TLS services (create, cleanup, drain)
  • Graceful shutdown with drain phase for active relay connections
  • L4 metrics: service gauge by mode, active connections/sessions, bytes by direction

Management

  • Mode field: http (default), tcp, udp, tls with typed constants
  • ListenPort (uint16) for L4 port binding with auto-assignment (configurable range via NB_PROXY_PORT_MIN/NB_PROXY_PORT_MAX)
  • Port collision detection per cluster+protocol+port (TLS same-domain exception for SNI sharing)
  • Custom port gated on cluster proxy capability reporting
  • Mode-specific validation: domain requirements, target count, auth flag compatibility
  • Domain required for all service modes; unique index on domain preserved
  • GetServicesByClusterAndPort for conflict detection, GetHTTPServiceByDomain excludes L4
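A sketch of the port auto-assignment and collision avoidance described above, under the assumption that the ports already occupied per cluster+protocol can be collected into a set first; `assignPort` and its parameters are illustrative names, not the PR's API:

```go
package main

import "fmt"

// assignPort scans [min, max] and returns the first port not already
// taken by another service in the same cluster+protocol. The uint32
// loop index avoids the uint16 wrap-around when max == 65535.
func assignPort(min, max uint16, used map[uint16]bool) (uint16, bool) {
	for p := uint32(min); p <= uint32(max); p++ {
		if !used[uint16(p)] {
			return uint16(p), true
		}
	}
	return 0, false // range exhausted
}

func main() {
	used := map[uint16]bool{20000: true, 20001: true}
	port, ok := assignPort(20000, 20010, used)
	fmt.Println(port, ok) // 20002 true
}
```

The uint32 index also sidesteps the overflow pitfall flagged later in the review for ranges ending at 65535.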

Client

  • Expose command: tcp/udp/tls protocol support, --external-port flag
  • Auth flags rejected for L4 modes

Shared/API

  • OpenAPI: mode, listen_port, port_auto_assigned, proxy_protocol, session_idle_timeout
  • Proto: ProxyCapabilities message, mode+listen_port on ProxyMapping

Tests

  • ~4,600 lines of new tests: TCP router, UDP relay, SNI peeking, PROXY protocol, port allocation, L4 metrics, hijacked conn tracking

Stack

Checklist

  • Is it a bug fix
  • Is it a typo/documentation fix
  • Is it a feature enhancement
  • Is it a refactor
  • Created tests that fail without the change (if possible)

By submitting this pull request, you confirm that you have read and agree to the terms of the Contributor License Agreement.

Documentation

Select exactly one:

  • I added/updated documentation for this change
  • Documentation is not needed for this change (explain why)

Summary by CodeRabbit

  • New Features

    • TCP, UDP and TLS (SNI passthrough) exposure and PROXY v2 support; UDP session relay with idle-timeout, rate limits and observability.
    • Per-service stable IDs and per-service lifecycle (renew/stop) for exposed services.
  • Enhancements

    • Mode/listen_port controls for services (http/tcp/udp/tls), automatic external-port assignment with conflict detection and cluster-aware capability checks; user note when requested port is overridden.
    • Expanded metrics, debug UI (Services/ServiceKeys) and API docs to surface mode, listen_port and port_auto_assigned.

@coderabbitai
Contributor

coderabbitai bot commented Mar 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Adds L4 (tcp/udp/tls) exposure and routing across client, management, and proxy: mode/listen_port, port auto-assignment and conflict checks, ServiceID and PortAutoAssigned in flows, proxy capabilities for custom ports, SNI/TCP routing, UDP relays, PROXY v2, metrics, and extensive tests.

Changes

Cohort / File(s) Summary
Client CLI & wiring
client/cmd/expose.go, client/internal/expose/request.go, client/internal/expose/manager.go, client/internal/expose/manager_test.go
Expose command extended with Mode/ListenPort, validations for L4 vs HTTP, Request.ListenPort added, Response gains ServiceID/PortAutoAssigned, Renew/Stop accept serviceID.
Client proto / daemon
client/proto/daemon.proto, client/server/server.go
Proto adds EXPOSE_TLS, listen_port and port_auto_assigned; server populates PortAutoAssigned and passes ServiceID to keep-alive manager.
Management gRPC & bindings
management/internals/shared/grpc/*.go, shared/management/proto/*.proto, shared/management/client/*.go
Expose RPCs now carry listen_port/service_id; Renew/Stop accept service_id; ProxyCapabilities and PathTargetOptions extended (proxy_protocol, session_idle_timeout); client bindings and mocks updated.
Service model & API schemas
management/internals/modules/reverseproxy/service/*.go, shared/management/http/api/*, shared/management/http/api/types.gen.go
Service/Target types gain Mode, ListenPort, PortAutoAssigned, ProxyProtocol, SessionIdleTimeout; Target.Port → uint16; new ServiceMode enums; OpenAPI/types updated.
Service manager, store & tests
management/internals/modules/reverseproxy/service/manager/*.go, management/server/store/*.go, management/server/store/store_mock.go, tests...
Port auto-assignment, conflict checks, assign/persist ListenPort; Renew/Stop refactored to use serviceID; store APIs added (GetHTTPServiceByDomain, GetServicesByCluster/AndPort); many manager flows and tests adapted for mode/port semantics.
Domain & proxy capabilities
management/internals/modules/reverseproxy/domain/*, .../proxy/manager.go, .../proxy/controller.go, .../proxy/manager_mock.go
Domain.SupportsCustomPorts added; clusterCapabilities provider and Controller.ClusterSupportsCustomPorts introduced; domain manager wired to populate supports_custom_ports.
Proxy gRPC snapshot & mapping
management/internals/shared/grpc/proxy.go, management/internals/shared/grpc/proxy_test.go
Per-connection capabilities captured; snapshot building reworked (snapshotServiceMappings); ClusterSupportsCustomPorts used to filter/route mappings; streaming semantics adjusted.
Proxy server & runtime wiring
proxy/server.go, proxy/cmd/proxy/cmd/root.go, proxy/internal/roundtrip/netbird.go
Per-port routers, mainRouter/mainPort, UDP relays, service-key model, SupportsCustomPorts and DefaultDialTimeout flags; mapping lifecycle, per-port setup/cleanup, and NetBird APIs moved to service-centric model.
TCP SNI router & relay
proxy/internal/tcp/*.go (router, relay, snipeek, peekedconn, chanlistener, proxyprotocol) and tests/bench
New SNI-based router, per-service routes, TCP relay with idle-timeout and half-close, PROXY v2 writer, ClientHello SNI peek and peekedConn, chanListener for TLS ServeTLS; extensive tests & benches.
UDP relay
proxy/internal/udp/relay.go, proxy/internal/udp/relay_test.go
New UDP relay with per-client sessions, idle cleanup, session limits, observer callbacks; comprehensive tests.
Metrics & netutil
proxy/internal/metrics/metrics.go, proxy/internal/metrics/l4_metrics_test.go, proxy/internal/netutil/errors.go, tests
Added L4/TCP/UDP Prometheus metrics and recording methods; network error helpers and ValidatePort with tests.
Conntrack & hijack tracking
proxy/internal/conntrack/hijacked.go, proxy/internal/conntrack/conn.go, tests
HijackTracker refactored to mutex+map, per-host tracking, CloseByHost and lifecycle helpers; host-aware trackedConn.
Proxy mapping types & internals
proxy/internal/proxy/servicemapping.go, proxy/internal/proxy/reverseproxy.go
Mapping type added; RemoveMapping returns existence bool; reverseproxy uses context dial-timeout helper.
NetBird & debug UI
proxy/internal/roundtrip/*, proxy/internal/debug/*, templates
Domain-centric naming moved to service-centric (ServiceKey/ServiceCount/ServiceKeys); debug UI/templates updated to show Services.
Account/policy & types
management/server/types/account.go, proxy/internal/types/*
Port types moved to uint16; resolveTargetPort returns uint16; new ServiceMode and RelayDirection types and dial-timeout context helpers added.
Tests, mocks & infra
numerous *_test.go, go.mod, imports
Extensive new/updated unit and integration tests for L4 conflicts, SNI routing, relays, PROXY v2, UDP, metrics; prometheus dependency promoted; many mocks updated.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client/Peer
    participant Daemon as Local Daemon
    participant GRPC as Management gRPC
    participant Manager as ReverseProxy Manager
    participant Store as Store
    participant Proxy as Proxy Server
    participant Backend as Backend Service

    Note over Client,Manager: L4/TLS expose flow
    Client->>Daemon: CreateExpose(mode, listen_port?)
    Daemon->>GRPC: CreateExpose RPC
    GRPC->>Manager: CreateServiceFromPeer(mode, listen_port)
    Manager->>Manager: ensureL4Port / assignPort
    Manager->>Store: Create service (Mode, ListenPort)
    Store-->>Manager: ServiceID, PortAutoAssigned
    Manager-->>GRPC: ExposeServiceResponse(ServiceID, PortAutoAssigned, ServiceURL)
    GRPC-->>Daemon: Response

    Note over Client,Proxy: Runtime TCP/TLS flow
    Client->>Proxy: TCP/TLS connect to proxy:ListenPort
    Proxy->>Proxy: PeekClientHello(conn) -> extract SNI
    Proxy->>Proxy: lookupRoute(SNI, ListenPort) -> Route(ServiceID)
    Proxy->>Proxy: relayTCP(conn -> backend) (writeProxyProtoV2 if enabled)
    Proxy->>Backend: Dial backend and relay data
    Backend-->>Proxy: response
    Proxy-->>Client: relayed response

Estimated Code Review Effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly Related PRs

Suggested Reviewers

  • pascal-fischer

"🐰
I hopped through code to find a port,
SNI and UDP now play their part,
TCP relays hum, PROXYs sing,
Services route on every string,
Hooray — the proxy garden springs to heart!"



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 18

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (9)
proxy/internal/debug/client.go (1)

169-191: ⚠️ Potential issue | 🟡 Minor

Rename the CLI column header to SERVICES.

These lines now print service keys, but Line 155 still labels the third column as DOMAINS, so the CLI output becomes misleading for operators.

Suggested fix
-	_, _ = fmt.Fprintf(c.out, "%-38s %-12s %-40s %s\n", "ACCOUNT ID", "AGE", "DOMAINS", "HAS CLIENT")
+	_, _ = fmt.Fprintf(c.out, "%-38s %-12s %-40s %s\n", "ACCOUNT ID", "AGE", "SERVICES", "HAS CLIENT")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/debug/client.go` around lines 169 - 191, The CLI header still
labels the third column as "DOMAINS" while the rows (via
Client.extractServiceKeys and the fmt.Fprintf call that prints
client["account_id"], client["age"], services, hasClient) output service keys;
update the header string printed for that table to "SERVICES" so the column
label matches the data being printed.
management/internals/modules/reverseproxy/service/service.go (1)

586-607: ⚠️ Potential issue | 🟠 Major

Make target validation mode-aware.

HTTP currently accepts session_idle_timeout and echoes it back through the API, but targetOptionsToProto() never serializes that field for HTTP mappings. L4 has the inverse bug: validateL4Target() skips validateTargetOptions(), so negative/oversized durations and HTTP-only knobs like skip_tls_verify, path_rewrite, and custom_headers are accepted and then ignored.

Also applies to: 612-632

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/modules/reverseproxy/service/service.go` around lines
586 - 607, The current validation/serialization is mode-unaware: update
validateHTTPTargets and validateL4Target to validate only the options relevant
to their mode and ensure session_idle_timeout is serialized for HTTP in
targetOptionsToProto; specifically, modify validateTargetOptions (or add a
mode-aware wrapper) to accept a service mode enum (HTTP vs L4) and reject
invalid fields (e.g., HTTP should reject L4-only knobs, L4 should reject
HTTP-only knobs like skip_tls_verify, path_rewrite, custom_headers and enforce
duration bounds), call this mode-aware validator from validateHTTPTargets and
validateL4Target, and change targetOptionsToProto to include
session_idle_timeout when serializing HTTP target Options.
proxy/cmd/proxy/cmd/root.go (1)

207-217: ⚠️ Potential issue | 🟡 Minor

Remove unused envIntOrDefault function.

This function is not referenced anywhere in the codebase and should be removed. The envUint16OrDefault function has replaced its usage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/cmd/proxy/cmd/root.go` around lines 207 - 217, Remove the unused
function envIntOrDefault: delete the entire envIntOrDefault function block (the
os.LookupEnv/strconv.Atoi logic) since envUint16OrDefault supersedes it, and
then run a quick import clean-up to remove any now-unused imports (e.g., strconv
or others) in the same file to avoid compile warnings.
management/internals/shared/grpc/proxy.go (1)

302-326: ⚠️ Potential issue | 🟠 Major

Fail the snapshot instead of silently dropping services.

Line 320 continues after GenerateToken fails, so sendSnapshot can still emit InitialSyncComplete with a missing mapping set. That leaves the proxy in a permanently incomplete startup state until another update or reconnect happens.

Suggested fix
 func (s *ProxyServiceServer) snapshotServiceMappings(ctx context.Context, conn *proxyConnection) ([]*proto.ProxyMapping, error) {
 	services, err := s.serviceManager.GetGlobalServices(ctx)
 	if err != nil {
 		return nil, fmt.Errorf("get services from store: %w", err)
 	}
 
 	var mappings []*proto.ProxyMapping
 	for _, service := range services {
 		if !service.Enabled || service.ProxyCluster == "" || service.ProxyCluster != conn.address {
 			continue
 		}
 
 		token, err := s.tokenStore.GenerateToken(service.AccountID, service.ID, 5*time.Minute)
 		if err != nil {
-			log.WithFields(log.Fields{
-				"service": service.Name,
-				"account": service.AccountID,
-			}).WithError(err).Error("failed to generate auth token for snapshot")
-			continue
+			return nil, fmt.Errorf("generate auth token for snapshot service %s: %w", service.ID, err)
 		}
 
 		m := service.ToProtoMapping(rpservice.Create, token, s.GetOIDCValidationConfig())
 		mappings = append(mappings, m)
 	}
 	return mappings, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/proxy.go` around lines 302 - 326, The
snapshotServiceMappings function currently swallows GenerateToken errors and
continues, causing sendSnapshot to proceed with incomplete mappings; change the
behavior so that when s.tokenStore.GenerateToken(service.AccountID, service.ID,
5*time.Minute) returns an error you log the error with service context and
return a wrapped error (e.g., fmt.Errorf("generate token for service %s (account
%s): %w", service.ID, service.AccountID, err)) so snapshotServiceMappings fails
fast instead of silently dropping services; update callers (if any) to handle
the returned error accordingly.
proxy/internal/metrics/metrics.go (2)

172-182: ⚠️ Potential issue | 🟡 Minor

Populate the backend host label when req.Host is empty.

For normal outbound requests, req.Host is often blank and the authority lives in req.URL.Host, so this metric will collapse distinct backends into host="".

Suggested fix
 		labels := prometheus.Labels{
 			"method": req.Method,
-			"host":   req.Host,
+			"host":   req.Host,
 			// Fill potentially empty labels with default values to avoid cardinality issues.
 			"path":   "/",
 			"status": "0",
 			"size":   "0",
 		}
 		if req.URL != nil {
+			if labels["host"] == "" {
+				labels["host"] = req.URL.Host
+			}
 			labels["path"] = req.URL.Path
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/metrics/metrics.go` around lines 172 - 182, The host label is
left empty when req.Host is blank causing backend aggregation; update the label
initialization logic in the metrics code (where labels := prometheus.Labels{...}
is created) to set labels["host"] = req.Host if non-empty, otherwise fallback to
req.URL.Host (check req.URL != nil first), ensuring the backend authority is
taken from req.URL.Host when req.Host is empty; adjust only the host assignment
so other defaults remain unchanged.

141-159: ⚠️ Potential issue | 🟠 Major

Decrement activeRequests in a defer.

If next panics, Line 152 is skipped and the in-flight gauge stays incremented for the lifetime of the process.

Suggested fix
 func (m *Metrics) Middleware(next http.Handler) http.Handler {
 	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 		m.requestsTotal.Inc()
 		m.activeRequests.Inc()
+		defer m.activeRequests.Dec()
 
 		interceptor := &responseInterceptor{PassthroughWriter: responsewriter.New(w)}
 
 		start := time.Now()
 		next.ServeHTTP(interceptor, r)
 		duration := time.Since(start)
 
-		m.activeRequests.Dec()
 		m.requestDuration.With(prometheus.Labels{
 			"status": strconv.Itoa(interceptor.status),
 			"size":   strconv.Itoa(interceptor.size),
 			"method": r.Method,
 			"host":   r.Host,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/metrics/metrics.go` around lines 141 - 159, The in-flight
gauge m.activeRequests is incremented but only decremented after next.ServeHTTP,
so a panic in the handler will leak; fix Metrics.Middleware by inserting a defer
m.activeRequests.Dec() immediately after m.activeRequests.Inc() (i.e., in the
Middleware function, right after the increment) so the decrement always runs,
and keep the existing requestDuration observation logic unchanged; reference:
Metrics.Middleware, m.activeRequests, next.ServeHTTP, and responseInterceptor.
shared/management/http/api/openapi.yml (1)

3083-3097: ⚠️ Potential issue | 🟠 Major

ServiceTarget is still HTTP-centric for L4 services.

Line 3084 expands target_type for L4 backends, but protocol remains required and limited to http|https in the same schema. That forces TCP/UDP/TLS targets to carry an HTTP-only field, and path is still available outside HTTP.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/http/api/openapi.yml` around lines 3083 - 3097, The
ServiceTarget schema is HTTP-centric: the "protocol" field (enum http|https) and
"path" are currently required/available for all target_type values; update the
schema (ServiceTarget) so L4 targets (target_type values like host, subnet,
resource, peer when used for TCP/UDP/TLS) do not require or expose HTTP-only
fields. Implement this by replacing the flat fields with a oneOf/anyOf split
(e.g., ServiceTargetHttp vs ServiceTargetL4) or by making "protocol" and "path"
conditional using discriminators on "target_type": keep "protocol" and "path"
only in the HTTP-specific variant (protocol enum [http, https] and path string),
and make the L4 variant omit "path" and allow protocol to be optional or to
accept L4-relevant protocols (or none) accordingly.
proxy/server.go (2)

204-204: ⚠️ Potential issue | 🔴 Critical

Start the mapping worker after the proxy state is initialized.

The initial sync can arrive before s.netbird, s.proxy, or s.mainRouter are ready, and processMappings() will then race startup and nil-deref in the mapping setup path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/server.go` at line 204, Start the management mapping worker only after
the proxy state is fully initialized to avoid nil derefs in processMappings();
relocate or guard the goroutine spawn of s.newManagementMappingWorker(ctx,
s.mgmtClient) so it runs after s.netbird, s.proxy and s.mainRouter are set (or
wait on an explicit ready channel/cond or call a helper like waitForProxyReady)
and/or add a nil-check in processMappings() invoked by
newManagementMappingWorker to ensure it returns or blocks until those fields are
non-nil; update references to newManagementMappingWorker, processMappings, and
the startup sequence to enforce this ordering.

282-294: ⚠️ Potential issue | 🟠 Major

Unexpected serve failures only shut down half of the data plane.

These branches call shutdownServices(), but that does not stop whichever of s.https or the main raw listener/router is still alive, and it leaves the mapping-worker context running. If one side fails first, the other side can keep serving against partially cleared state.

Also applies to: 587-650

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/server.go` around lines 282 - 294, When either httpsErr or routerErr
fires you only call s.shutdownServices(), leaving the other listener (s.https or
the main router/listener) and the mapping-worker context running; fix by fully
stopping both sides and canceling the mapping worker: after s.shutdownServices()
explicitly shut down s.https (e.g. s.https.Shutdown(ctx) if non-nil), close the
router/main listener (e.g. s.router.Close() or s.listener.Close()), call the
mapping-worker cancel function (mappingCtxCancel or equivalent), and then wait
for the service goroutines to exit (s.wg.Wait() or existing shutdownComplete
channel) before returning; update the error branches that handle httpsErr and
routerErr to perform these extra steps so a single-side failure tears down the
entire data plane.
🧹 Nitpick comments (8)
management/internals/modules/reverseproxy/service/manager/l4_port_test.go (2)

384-405: Derive the seeded collision port from the auto-assign range.

existingPort := 20000 only exercises the “skip occupied auto-assign port” path while that literal stays inside [autoAssignPortMin, autoAssignPortMax]. If the range shifts, this becomes a smoke test for auto-assignment and stops proving collision avoidance.

Suggested tweak
-	existingPort := uint16(20000)
+	existingPort := autoAssignPortMin
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/modules/reverseproxy/service/manager/l4_port_test.go`
around lines 384 - 405, The test currently hardcodes existingPort :=
uint16(20000) which can drift out of the auto-assign range; instead compute the
seeded collision port from the auto-assign bounds (e.g., pick a value between
autoAssignPortMin and autoAssignPortMax such as autoAssignPortMin+5 or midpoint)
before calling seedService, so the call to seedService(t, testStore, "existing",
"tcp", testCluster, testCluster, existingPort) truly exercises collision
handling when invoking mgr.persistNewService with svc and asserting
svc.ListenPort != existingPort and svc.PortAutoAssigned.

101-115: Use the exported mode constants in these fixtures.

This helper and the inline cases below reintroduce "tcp", "udp", and "tls" literals in a PR that is otherwise moving mode handling to typed constants. Binding the tests to ModeTCP / ModeUDP / ModeTLS gives you compile-time coverage against enum/value drift instead of string-typo coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/modules/reverseproxy/service/manager/l4_port_test.go`
around lines 101 - 115, The tests and seedService helper are using raw mode
strings ("tcp","udp","tls"); replace those with the exported enum constants
(e.g. rpservice.ModeTCP, rpservice.ModeUDP, rpservice.ModeTLS). Update
seedService to keep the protocol parameter but ensure you pass the
rpservice.Mode* constants from every caller and assign svc.Mode = protocol and
each Target.Protocol = protocol so fixtures use the typed constants throughout
(refer to seedService, rpservice.Service.Mode, and rpservice.Target.Protocol).
shared/management/proto/proxy_service.proto (1)

91-108: Field number 10 is skipped in ProxyMapping.

The mode field uses number 11 and listen_port uses 12, skipping 10. This is valid protobuf and won't cause issues, but if field 10 was intentionally reserved for future use, consider adding a reserved 10; declaration to document this and prevent accidental reuse.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@shared/management/proto/proxy_service.proto` around lines 91 - 108, The Proto
message ProxyMapping has field numbers 1–9, then skips 10 and uses 11 for mode
and 12 for listen_port; to document and prevent accidental reuse of 10, add a
reserved declaration for field number 10 in the ProxyMapping message (e.g.,
reserved 10;) so it's clear the slot is intentionally kept and the compiler will
forbid reuse—modify the ProxyMapping definition to include this reserved 10
entry alongside existing fields like mode and listen_port.
proxy/internal/udp/relay.go (1)

97-138: Consider consolidating parameters into an options struct.

The New function has 9 parameters, which exceeds typical guidelines for function arity. Consider grouping related parameters into a configuration struct to improve readability and make future extensions easier.

♻️ Example options struct approach
type RelayConfig struct {
    Target      string
    AccountID   types.AccountID
    DialFunc    types.DialContextFunc
    DialTimeout time.Duration
    SessionTTL  time.Duration
    MaxSessions int
}

func New(
    parentCtx context.Context,
    logger *log.Entry,
    listener net.PacketConn,
    cfg RelayConfig,
) *Relay {
    // ...
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/udp/relay.go` around lines 97 - 138, The New constructor
currently takes many parameters; create a RelayConfig (or Options) struct
containing related fields (Target, AccountID, DialFunc, DialTimeout, SessionTTL,
MaxSessions) and change New to New(parentCtx context.Context, logger *log.Entry,
listener net.PacketConn, cfg RelayConfig) *Relay; move the defaulting logic for
dialTimeout, sessionTTL, and maxSessions into New using cfg values, update use
of cfg.* inside the returned Relay (target, accountID, dialFunc, etc.), and
update all callers to build a RelayConfig instead of passing the long parameter
list so adding future fields is straightforward.
proxy/internal/tcp/relay.go (1)

93-127: Consider clearing the read deadline after the loop exits.

When copyWithIdleTimeout exits (either on error or EOF), the connection retains the last deadline set. If the connection is reused or the other direction continues, this stale deadline could cause unexpected timeouts.

However, since Relay closes both connections after the WaitGroup completes, this is likely not an issue in practice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/tcp/relay.go` around lines 93 - 127, The connection's read
deadline is left set on all return paths in copyWithIdleTimeout; after asserting
src is a net.Conn (the conn variable in copyWithIdleTimeout) add a deferred call
to clear the deadline (call conn.SetReadDeadline(time.Time{}) ) so the deadline
is removed when the function exits. Place the defer immediately after the conn,
ok check (after determining conn is a net.Conn) and you can ignore the error
returned from clearing the deadline so it doesn't override the primary return
error.
management/server/store/store.go (1)

269-270: Consider using ServiceMode type for the mode parameter.

The GetServicesByClusterAndPort method takes mode as a plain string. Using the typed ServiceMode from proxy/internal/types would provide compile-time safety and self-documenting code.

-GetServicesByClusterAndPort(ctx context.Context, lockStrength LockingStrength, proxyCluster string, mode string, listenPort uint16) ([]*rpservice.Service, error)
+GetServicesByClusterAndPort(ctx context.Context, lockStrength LockingStrength, proxyCluster string, mode types.ServiceMode, listenPort uint16) ([]*rpservice.Service, error)

This would require importing the types package and updating implementations accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/server/store/store.go` around lines 269 - 270, Update
GetServicesByClusterAndPort to accept the typed ServiceMode instead of a raw
string: change the signature GetServicesByClusterAndPort(ctx context.Context,
lockStrength LockingStrength, proxyCluster string, mode string, listenPort
uint16) to use mode types.ServiceMode, add the import for proxy/internal/types,
and update all call sites and implementations (e.g., the store method body and
any callers) to pass/handle a types.ServiceMode value rather than a string,
converting from string only at boundaries if necessary.
management/internals/shared/grpc/proxy_test.go (1)

332-383: Test name may not match actual filtering behavior.

TestSendServiceUpdateToCluster_FiltersOnCapability suggests proxies are filtered by capability, but the test shows both proxy-a (supports custom ports) and proxy-b (doesn't support) receive both TLS and HTTP mappings. The test verifies that TLS passthrough works on all proxies, which is correct behavior, but the test name implies capability-based filtering that isn't actually demonstrated.

Consider renaming to better reflect what's being tested (e.g., TestSendServiceUpdateToCluster_TLSWorksOnAllProxies), or add assertions that verify actual filtering scenarios if capability-based filtering should occur for other cases.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/proxy_test.go` around lines 332 - 383, The
test named TestSendServiceUpdateToCluster_FiltersOnCapability is misleading
because it asserts TLS and HTTP mappings are delivered to both proxies (no
capability-based filtering); either rename the test to reflect behavior (e.g.,
TestSendServiceUpdateToCluster_TLSWorksOnAllProxies) or modify it to actually
exercise filtering by capability: add a mapping type that should be filtered and
assert that registerFakeProxyWithCaps-created proxies with differing
proto.ProxyCapabilities receive or do not receive that mapping when
SendServiceUpdateToCluster is called; update the test name and/or add assertions
accordingly so the name matches the asserted behavior.
client/cmd/expose.go (1)

76-85: Edge case: extractPort may return empty string for URLs without port.

If serviceURL is something like tcp://host (no port), LastIndex(":") finds the protocol separator, and serviceURL[i+1:] would return "host", not a port number. This could produce confusing output.

Consider a more robust port extraction:

♻️ Proposed improvement
 func extractPort(serviceURL string, fallback uint16) string {
-	if i := strings.LastIndex(serviceURL, ":"); i != -1 {
-		if p := serviceURL[i+1:]; p != "" {
-			return p
+	// Handle URLs like "tcp://host:12345" or "host:12345".
+	// Strip the protocol prefix first.
+	url := serviceURL
+	if idx := strings.Index(url, "://"); idx != -1 {
+		url = url[idx+3:]
+	}
+	// Now find the last colon (port separator)
+	if i := strings.LastIndex(url, ":"); i != -1 {
+		if p := url[i+1:]; p != "" {
+			// Verify it's numeric before treating it as a port
+			if _, err := strconv.Atoi(p); err == nil {
+				return p
+			}
 		}
 	}
 	return strconv.FormatUint(uint64(fallback), 10)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@client/cmd/expose.go` around lines 76 - 85, extractPort currently finds the
last ":" which mis-parses URLs like "tcp://host" (returns "//host" or wrong
substring); update extractPort to properly parse the serviceURL using
url.Parse(serviceURL), then inspect the parsed URL's Host (u.Host) and obtain
the port via u.Port() or net.SplitHostPort(u.Host) (falling back to
strconv.FormatUint(uint64(fallback), 10) when no port is present or parsing
fails). Ensure you import net/url and net, and keep the function signature
extractPort(serviceURL string, fallback uint16) string.
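The url.Parse approach described in the prompt above can be sketched as follows. This is a minimal standalone version, not the actual `client/cmd/expose.go` helper; the fallback behavior mirrors the original signature, and the raw `host:port` handling via `net.SplitHostPort` is an assumption about how mixed inputs should be treated.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// extractPort parses serviceURL as a URL first; if that yields a host, the
// port comes from u.Port(). Raw "host:port" strings (no scheme) fall back to
// net.SplitHostPort. Anything else returns the formatted fallback port.
func extractPort(serviceURL string, fallback uint16) string {
	fb := strconv.FormatUint(uint64(fallback), 10)
	if u, err := url.Parse(serviceURL); err == nil && u.Host != "" {
		if p := u.Port(); p != "" {
			return p
		}
		return fb // scheme://host with no port
	}
	// "host:443" parses as scheme "host" with empty Host; try SplitHostPort.
	if _, p, err := net.SplitHostPort(serviceURL); err == nil && p != "" {
		return p
	}
	return fb
}

func main() {
	fmt.Println(extractPort("tcp://host:12345", 80))
	fmt.Println(extractPort("tcp://host", 9000)) // no port: fallback used
	fmt.Println(extractPort("host:443", 80))
}
```

Unlike the string-slicing version, `u.Port()` already validates that the port is numeric, so no separate `strconv.Atoi` check is needed.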
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/internal/expose/manager_test.go`:
- Around line 49-56: The renew test's mock RenewExposeFunc currently asserts
only the domain but not the forwarded serviceID; update the mock in the tests
that call NewManager(...).renew(...) to also assert the serviceID parameter
(e.g., assert.Equal(t, "service-123", serviceID, "serviceID should be passed
through")) so the Manager.renew path and RenewExposeFunc receive and forward the
correct service identifier; repeat the same additional assertion in the other
test variant that invokes RenewExposeFunc.

In `@management/internals/modules/reverseproxy/service/manager/manager.go`:
- Around line 43-59: The auto-assign port range is not validated and using
uint16 arithmetic causes underflow/overflow when autoAssignPortMin >
autoAssignPortMax or autoAssignPortMax == 65535; in init() (and any logic using
autoAssignPortMin/autoAssignPortMax) validate and normalize the values produced
by portFromEnv: clamp autoAssignPortMax to 65535, ensure autoAssignPortMin <=
autoAssignPortMax (swap or set to fallback/default and log a warning), and
perform range arithmetic using a larger signed integer type (e.g., int) or
explicit bounds checks to avoid wraparound in loops that scan ports (refer to
autoAssignPortMin, autoAssignPortMax, portFromEnv and wherever the port scan
loop runs).
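A minimal sketch of the validation described above, assuming illustrative default bounds (the real NB_PROXY_PORT_MIN/NB_PROXY_PORT_MAX defaults and the `portFromEnv` helper are not reproduced here):

```go
package main

import "fmt"

// normalizeRange clamps the max to the uint16 domain, swaps inverted bounds,
// and falls back to defaults for unusable input. ok reports whether the
// caller's values were usable (a warning would be logged otherwise).
func normalizeRange(pMin, pMax int) (lo, hi int, ok bool) {
	const defLo, defHi = 20000, 30000 // hypothetical defaults
	if pMin < 1 || pMin > 65535 || pMax < 1 {
		return defLo, defHi, false
	}
	if pMax > 65535 {
		pMax = 65535
	}
	if pMin > pMax {
		pMin, pMax = pMax, pMin
	}
	return pMin, pMax, true
}

// scanPorts walks the range with int arithmetic, so hi == 65535 cannot wrap
// a uint16 loop counter into an infinite loop.
func scanPorts(lo, hi int, free func(port int) bool) (int, bool) {
	for p := lo; p <= hi; p++ {
		if free(p) {
			return p, true
		}
	}
	return 0, false
}

func main() {
	lo, hi, _ := normalizeRange(30000, 20000) // inverted bounds get swapped
	fmt.Println(lo, hi)
	p, found := scanPorts(65534, 65535, func(int) bool { return true })
	fmt.Println(p, found)
}
```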

In `@management/internals/modules/reverseproxy/service/service.go`:
- Around line 559-580: Service.Validate currently permits password/PIN/OIDC auth
on L4 services; add checks in the ModeTCP/ModeUDP and ModeTLS branches of
Service.Validate() to reject any enabled auth configuration (e.g., password
enabled, PIN enabled, or OIDC enabled) and return an error if any are set.
Locate the switch handling ModeTCP/ModeUDP and ModeTLS in Service.Validate(),
and before returning (and before calling validateL4Target), verify the service's
auth fields (the same auth flags used by ToProtoMapping/peer expose) are all
disabled and return a clear error like "auth is not supported for TCP/UDP/TLS
services" when any are enabled. Ensure the validation mirrors the peer expose
rejection logic so API-created L4 services cannot advertise HTTP auth in
ToProtoMapping.
- Around line 893-902: The ExposeServiceRequest currently has no field for the
upstream scheme, causing HTTPS backends to be downgraded because the code
unconditionally sets Protocol: "http" for non-L4 targets; add a dedicated scheme
field (e.g., Scheme or BackendScheme) to ExposeServiceRequest and thread it
through the conversion logic that builds target entries (the code path that
currently assigns Protocol: "http") so that when the request indicates
EXPOSE_HTTPS you set Protocol to "https" (and "http" for EXPOSE_HTTP),
preserving the original client/proto/daemon.proto distinction; update any
callers that create ExposeServiceRequest to populate the new field and map the
daemon.proto enum to the string used for Protocol.
- Around line 550-558: The validation currently treats empty s.Mode as HTTP only
for checks but doesn't normalize the field; update the code path that validates
service mode (the switch handling s.Mode in service.validate or Validate) to set
s.Mode = ModeHTTP when s.Mode == "" before running further checks (so subsequent
serialization emits "http"); then continue to call s.validateHTTPTargets() and
the existing listen_port checks unchanged. Reference: s.Mode, ModeHTTP, and
validateHTTPTargets().
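The two validation fixes above (normalize an empty Mode to "http", reject auth on L4 modes) can be sketched together. Field and constant names here are stand-ins for the real `rpservice.Service` definition, and the auth flags are reduced to three booleans:

```go
package main

import (
	"errors"
	"fmt"
)

type Service struct {
	Mode            string
	PasswordEnabled bool
	PINEnabled      bool
	OIDCEnabled     bool
}

const (
	ModeHTTP = "http"
	ModeTCP  = "tcp"
	ModeUDP  = "udp"
	ModeTLS  = "tls"
)

func (s *Service) Validate() error {
	if s.Mode == "" {
		s.Mode = ModeHTTP // normalize so serialization later emits "http"
	}
	switch s.Mode {
	case ModeHTTP:
		return nil // HTTP-specific target checks would run here
	case ModeTCP, ModeUDP, ModeTLS:
		// Mirror the peer-expose rejection: no HTTP auth on L4 services.
		if s.PasswordEnabled || s.PINEnabled || s.OIDCEnabled {
			return errors.New("auth is not supported for TCP/UDP/TLS services")
		}
		return nil
	default:
		return fmt.Errorf("unknown mode %q", s.Mode)
	}
}

func main() {
	s := &Service{}
	fmt.Println(s.Validate(), s.Mode) // mode normalized in place
	bad := &Service{Mode: ModeTCP, PINEnabled: true}
	fmt.Println(bad.Validate())
}
```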

In `@management/internals/shared/grpc/expose_service.go`:
- Around line 44-50: The Port and ListenPort fields are cast directly from
exposeReq.Port and exposeReq.ListenPort (int32) to uint16, which can silently
wrap invalid values; before casting in the exposed service construction (where
Port: uint16(exposeReq.Port) and ListenPort: uint16(exposeReq.ListenPort) are
set), validate both values using the existing ValidatePort helper from
proxy/internal/netutil (or equivalent), returning an error or rejecting the
request if validation fails, then safely cast to uint16 only after validation;
update any error handling to surface validation failures back to the caller.
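A sketch of the validate-before-narrow pattern; the real code would call the existing ValidatePort helper rather than this inline range check:

```go
package main

import "fmt"

// toPort rejects out-of-range int32 values instead of letting uint16(v)
// silently wrap them (e.g. 70000 would otherwise become 4464).
func toPort(v int32) (uint16, error) {
	if v < 1 || v > 65535 {
		return 0, fmt.Errorf("invalid port %d: must be 1-65535", v)
	}
	return uint16(v), nil
}

func main() {
	p, err := toPort(8080)
	fmt.Println(p, err)
	_, err = toPort(70000) // a bare uint16 cast would yield 4464
	fmt.Println(err)
}
```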

In `@management/internals/shared/grpc/proxy.go`:
- Around line 536-549: ClusterSupportsCustomPorts currently returns true if any
proxy in the cluster reports SupportsCustomPorts, which allows a mixed-version
cluster to accept custom-port services; change the logic in
ClusterSupportsCustomPorts to require every proxy for the cluster to report
SupportsCustomPorts before returning true by iterating
s.proxyController.GetProxiesForCluster(clusterAddr) and returning false if any
connected proxy (from s.connectedProxies -> *proxyConnection) either has nil
capabilities or .SupportsCustomPorts == false, and only return true if all
proxies satisfy the capability; callers such as SendServiceUpdateToCluster
keep the same behavior but will now only allow custom ports when the entire
cluster supports them.
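The "all proxies must support it" rule can be sketched like this; `proxyConnection` and the capability struct are stand-ins for the real types in `proxy.go`:

```go
package main

import "fmt"

type capabilities struct{ SupportsCustomPorts bool }

type proxyConnection struct{ caps *capabilities }

// clusterSupportsCustomPorts returns true only when every connected proxy
// reports the capability; a nil capabilities pointer (older proxy version)
// counts as unsupported, so mixed-version clusters are rejected.
func clusterSupportsCustomPorts(proxies []*proxyConnection) bool {
	if len(proxies) == 0 {
		return false
	}
	for _, p := range proxies {
		if p.caps == nil || !p.caps.SupportsCustomPorts {
			return false
		}
	}
	return true
}

func main() {
	modern := &proxyConnection{caps: &capabilities{SupportsCustomPorts: true}}
	legacy := &proxyConnection{} // no capabilities reported
	fmt.Println(clusterSupportsCustomPorts([]*proxyConnection{modern, modern}))
	fmt.Println(clusterSupportsCustomPorts([]*proxyConnection{modern, legacy}))
}
```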

In `@management/server/store/sql_store.go`:
- Around line 5161-5188: The GetServicesByClusterAndPort and
GetServicesByCluster methods currently return services without preloading
Targets or decrypting sensitive fields; update both functions
(SqlStore.GetServicesByClusterAndPort and SqlStore.GetServicesByCluster) to use
tx.Preload("Targets") (matching the pattern used in
GetServiceByID/GetServiceByDomain/GetServices/GetAccountServices) before Find
and then call DecryptSensitiveData(ctx, service) for each returned
*rpservice.Service (handle and propagate errors similarly to the other getters)
so callers receive fully populated, decrypted Service objects.

In `@proxy/internal/conntrack/hijacked.go`:
- Around line 27-31: The trackingWriter stores a canonical host
(hostOnly(r.Host)) but CloseByHost compares the raw argument byte-for-byte;
update CloseByHost to normalize its input the same way before matching: call
hostOnly(inputHost) (and/or lower-case it) and then compare to the stored
trackingWriter.host (or use strings.EqualFold) so lookups strip ports and are
case-insensitive; ensure both insertion (trackingWriter.host) and lookup
(CloseByHost) use the same canonicalization.
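The symmetric canonicalization can be sketched as a single helper used by both insertion and lookup; `hostOnly` here is a guess at the helper named above, not its actual implementation:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hostOnly strips an optional ":port" suffix, leaving bare hosts untouched.
func hostOnly(h string) string {
	if host, _, err := net.SplitHostPort(h); err == nil {
		return host
	}
	return h
}

// canonical is applied both when storing trackingWriter.host and when
// matching in CloseByHost, so lookups are port-insensitive and
// case-insensitive.
func canonical(h string) string {
	return strings.ToLower(hostOnly(h))
}

func main() {
	stored := canonical("Example.com:443") // at insertion time
	lookup := canonical("example.COM")     // inside CloseByHost
	fmt.Println(stored == lookup)
}
```

Using one function on both sides avoids the class of bug where insertion and lookup drift apart; `strings.EqualFold` at compare time would work too, but then port stripping must still happen in both places.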

In `@proxy/internal/proxy/servicemapping.go`:
- Around line 105-113: RemoveMapping currently only checks for host existence
and can delete a newly rebounded mapping; update ReverseProxy.RemoveMapping to
verify the stored mapping's identity before deleting by comparing the existing
mapping's ID (or other unique identifier) to the provided Mapping.ID, and only
call delete(p.mappings, m.Host) and return true when the IDs match; otherwise
return false (while keeping the existing p.mappingsMux lock/unlock behavior).
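A compare-before-delete sketch of the suggestion above; the mapping type and lock are simplified stand-ins for the real `ReverseProxy` state:

```go
package main

import (
	"fmt"
	"sync"
)

type mapping struct{ ID, Host string }

type reverseProxy struct {
	mu       sync.Mutex
	mappings map[string]*mapping
}

// RemoveMapping deletes the entry for m.Host only if the stored mapping has
// the same ID, so a late removal of an old mapping cannot clobber a host
// that was rebound to a newer one in the meantime.
func (p *reverseProxy) RemoveMapping(m *mapping) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	cur, ok := p.mappings[m.Host]
	if !ok || cur.ID != m.ID {
		return false // host absent or rebound; leave it alone
	}
	delete(p.mappings, m.Host)
	return true
}

func main() {
	old := &mapping{ID: "v1", Host: "app.example.com"}
	renewed := &mapping{ID: "v2", Host: "app.example.com"}
	p := &reverseProxy{mappings: map[string]*mapping{renewed.Host: renewed}}
	fmt.Println(p.RemoveMapping(old))     // false: would clobber the rebind
	fmt.Println(p.RemoveMapping(renewed)) // true
}
```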

In `@proxy/internal/roundtrip/netbird.go`:
- Around line 376-386: While holding the clientsMux in the unregister path,
compute and store the remaining services count into a local variable (e.g.,
remaining := len(entry.services)) immediately after delete(entry.services, key),
then unlock clientsMux and use that local variable in the logger.WithFields call
instead of calling len(entry.services) after unlocking; this prevents a race
with concurrent AddPeer()/other callers mutating entry.services while the log is
evaluated.
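The snapshot-under-lock pattern can be sketched as follows; the types are reduced stand-ins for the client-entry bookkeeping in `netbird.go`:

```go
package main

import (
	"fmt"
	"sync"
)

type entry struct {
	services map[string]struct{}
}

type registry struct {
	mu sync.Mutex
	e  entry
}

// unregister captures len(e.services) while the mutex is still held, so the
// caller can log the count without racing concurrent AddPeer-style mutation.
func (r *registry) unregister(key string) int {
	r.mu.Lock()
	delete(r.e.services, key)
	remaining := len(r.e.services) // snapshot under the lock
	r.mu.Unlock()
	// Safe to use remaining in a logger.WithFields call here, unlocked.
	return remaining
}

func main() {
	r := &registry{e: entry{services: map[string]struct{}{"a": {}, "b": {}}}}
	fmt.Println(r.unregister("a"))
}
```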

In `@proxy/internal/tcp/proxyprotocol_test.go`:
- Around line 83-90: In the goroutine that calls ln.Accept() (which assigns to
serverConn and signals on the accepted channel), ensure the accepted channel is
closed on both success and any accept error to avoid hanging; modify the
goroutine that does "serverConn, err = ln.Accept()" so that if err != nil it
closes accepted before returning, and apply the same change to the IPv6 test's
equivalent goroutine to keep behavior consistent between tests.
- Around line 21-28: The goroutine that calls ln.Accept() can exit on error
without signaling the main test, causing <-accepted to hang; modify the
goroutine that sets serverConn to always notify the main test on both success
and failure by either closing the accepted channel in the error path or sending
an error via a buffered error channel; update the closure that calls ln.Accept()
(the anonymous goroutine) to ensure accepted is closed (or an error sent) before
returning so the main test unblocks and can handle the Accept error instead of
hanging.

In `@proxy/server.go`:
- Around line 919-921: The rollback path is unconditionally calling
L4ServiceRemoved(), causing gauges to be decremented even when
setupTCPMapping()/setupUDPMapping()/setupTLSMapping() never succeeded (and on
failed modifies may decrement twice); fix by tracking whether the L4 service was
actually added and only call L4ServiceRemoved() when that add succeeded — e.g.,
introduce a local boolean (e.g., "l4Added") in the function that calls
setupTCPMapping/setupUDPMapping/setupTLSMapping, set it true only after
L4ServiceAdded() returns successfully, and in all error/rollback paths
(including where s.setupMappingRoutes fails) call s.removeMapping or
L4ServiceRemoved() only if l4Added is true; apply the same guarded-decrement
logic to the modify paths to prevent double-decrements.
- Around line 930-941: modifyMapping currently cleans up the existing routes
before attempting to set up the new mapping, which can leave the service down if
setupMappingRoutes fails; change the flow so you first attempt to set up the new
mapping without removing the old one (call setupMappingRoutes with the new
mapping first), only after setup succeeds store the new mapping (s.storeMapping)
and then call cleanupMappingRoutes on the old mapping (using
s.loadMapping(mapping.GetId()) to identify it), and if setupMappingRoutes fails
ensure you do not cleanup the old mapping (i.e., rollback/no-op) so the previous
working routes remain active; also make sure any cleanup on failure only targets
partial new resources, not the existing mapping.
- Around line 720-724: The custom-port listener created with net.Listen (the
listenAddr/ln code) currently skips wrapProxyProtocol(), allowing PROXY headers
to reach SNI peeking or backends; fix by passing the freshly created listener
(ln) through the same wrapProxyProtocol(...) wrapper used for other listeners
(e.g., call wrapProxyProtocol(listenAddr, ln) or the equivalent wrapper
function) so the PROXY protocol is consistently applied to per-port TCP/TLS
listeners before they are returned or used; ensure the wrapped listener replaces
ln in the subsequent flow so all downstream code uses the proxied listener.

In `@shared/management/http/api/openapi.yml`:
- Around line 3018-3025: The current flat schema adds mode but still enforces
fields like domain unconditionally and leaves listen_port as prose-only; update
the OpenAPI schema so requests are split by mode (e.g., use oneOf with two
subschemas or discriminator on mode) such that the HTTP variant requires domain
and forbids listen_port, while the L4/TLS variant requires listen_port (and
optionally allows numeric 0) and forbids domain; adjust the schema entries for
mode, domain, and listen_port accordingly and add clear validation rules
(required/forbidden properties) so HTTP and L4 payloads cannot be mixed.
- Around line 4275-4277: The new shared response component named "conflict"
(409) must be referenced from the reverse-proxy service create/update operations
so clients see port/SNI collision errors; locate the reverse-proxy POST/PUT
operations in the same OpenAPI document and add responses: "409": { "$ref":
"#/components/responses/conflict" } to each operation's responses block (the
create and update operation objects) so that generated clients and API docs
surface the port/SNI collision error for reverse-proxy create/update.

---

Outside diff comments:
In `@management/internals/modules/reverseproxy/service/service.go`:
- Around line 586-607: The current validation/serialization is mode-unaware:
update validateHTTPTargets and validateL4Target to validate only the options
relevant to their mode and ensure session_idle_timeout is serialized for HTTP in
targetOptionsToProto; specifically, modify validateTargetOptions (or add a
mode-aware wrapper) to accept a service mode enum (HTTP vs L4) and reject
invalid fields (e.g., HTTP should reject L4-only knobs, L4 should reject
HTTP-only knobs like skip_tls_verify, path_rewrite, custom_headers and enforce
duration bounds), call this mode-aware validator from validateHTTPTargets and
validateL4Target, and change targetOptionsToProto to include
session_idle_timeout when serializing HTTP target Options.

In `@management/internals/shared/grpc/proxy.go`:
- Around line 302-326: The snapshotServiceMappings function currently swallows
GenerateToken errors and continues, causing sendSnapshot to proceed with
incomplete mappings; change the behavior so that when
s.tokenStore.GenerateToken(service.AccountID, service.ID, 5*time.Minute) returns
an error you log the error with service context and return a wrapped error
(e.g., fmt.Errorf("generate token for service %s (account %s): %w", service.ID,
service.AccountID, err)) so snapshotServiceMappings fails fast instead of
silently dropping services; update callers (if any) to handle the returned error
accordingly.

In `@proxy/cmd/proxy/cmd/root.go`:
- Around line 207-217: Remove the unused function envIntOrDefault: delete the
entire envIntOrDefault function block (the os.LookupEnv/strconv.Atoi logic)
since envUint16OrDefault supersedes it, and then run a quick import clean-up to
remove any now-unused imports (e.g., strconv or others) in the same file to
avoid compile warnings.

In `@proxy/internal/debug/client.go`:
- Around line 169-191: The CLI header still labels the third column as "DOMAINS"
while the rows (via Client.extractServiceKeys and the fmt.Fprintf call that
prints client["account_id"], client["age"], services, hasClient) output service
keys; update the header string printed for that table to "SERVICES" so the
column label matches the data being printed.

In `@proxy/internal/metrics/metrics.go`:
- Around line 172-182: The host label is left empty when req.Host is blank
causing backend aggregation; update the label initialization logic in the
metrics code (where labels := prometheus.Labels{...} is created) to set
labels["host"] = req.Host if non-empty, otherwise fallback to req.URL.Host
(check req.URL != nil first), ensuring the backend authority is taken from
req.URL.Host when req.Host is empty; adjust only the host assignment so other
defaults remain unchanged.
- Around line 141-159: The in-flight gauge m.activeRequests is incremented but
only decremented after next.ServeHTTP, so a panic in the handler will leak; fix
Metrics.Middleware by inserting a defer m.activeRequests.Dec() immediately after
m.activeRequests.Inc() (i.e., in the Middleware function, right after the
increment) so the decrement always runs, and keep the existing requestDuration
observation logic unchanged; reference: Metrics.Middleware, m.activeRequests,
next.ServeHTTP, and responseInterceptor.

In `@proxy/server.go`:
- Line 204: Start the management mapping worker only after the proxy state is
fully initialized to avoid nil derefs in processMappings(); relocate or guard
the goroutine spawn of s.newManagementMappingWorker(ctx, s.mgmtClient) so it
runs after s.netbird, s.proxy and s.mainRouter are set (or wait on an explicit
ready channel/cond or call a helper like waitForProxyReady) and/or add a
nil-check in processMappings() invoked by newManagementMappingWorker to ensure
it returns or blocks until those fields are non-nil; update references to
newManagementMappingWorker, processMappings, and the startup sequence to enforce
this ordering.
- Around line 282-294: When either httpsErr or routerErr fires you only call
s.shutdownServices(), leaving the other listener (s.https or the main
router/listener) and the mapping-worker context running; fix by fully stopping
both sides and canceling the mapping worker: after s.shutdownServices()
explicitly shut down s.https (e.g. s.https.Shutdown(ctx) if non-nil), close the
router/main listener (e.g. s.router.Close() or s.listener.Close()), call the
mapping-worker cancel function (mappingCtxCancel or equivalent), and then wait
for the service goroutines to exit (s.wg.Wait() or existing shutdownComplete
channel) before returning; update the error branches that handle httpsErr and
routerErr to perform these extra steps so a single-side failure tears down the
entire data plane.

In `@shared/management/http/api/openapi.yml`:
- Around line 3083-3097: The ServiceTarget schema is HTTP-centric: the
"protocol" field (enum http|https) and "path" are currently required/available
for all target_type values; update the schema (ServiceTarget) so L4 targets
(target_type values like host, subnet, resource, peer when used for TCP/UDP/TLS)
do not require or expose HTTP-only fields. Implement this by replacing the flat
fields with a oneOf/anyOf split (e.g., ServiceTargetHttp vs ServiceTargetL4) or
by making "protocol" and "path" conditional using discriminators on
"target_type": keep "protocol" and "path" only in the HTTP-specific variant
(protocol enum [http, https] and path string), and make the L4 variant omit
"path" and allow protocol to be optional or to accept L4-relevant protocols (or
none) accordingly.

---

Nitpick comments:
In `@client/cmd/expose.go`:
- Around line 76-85: extractPort currently finds the last ":" which mis-parses
URLs like "tcp://host" (returns "//host" or wrong substring); update extractPort
to properly parse the serviceURL using url.Parse(serviceURL), then inspect the
parsed URL's Host (u.Host) and obtain the port via u.Port() or
net.SplitHostPort(u.Host) (falling back to strconv.FormatUint(uint64(fallback),
10) when no port is present or parsing fails). Ensure you import net/url and
net, and keep the function signature extractPort(serviceURL string, fallback
uint16) string.

In `@management/internals/modules/reverseproxy/service/manager/l4_port_test.go`:
- Around line 384-405: The test currently hardcodes existingPort :=
uint16(20000) which can drift out of the auto-assign range; instead compute the
seeded collision port from the auto-assign bounds (e.g., pick a value between
autoAssignPortMin and autoAssignPortMax such as autoAssignPortMin+5 or midpoint)
before calling seedService, so the call to seedService(t, testStore, "existing",
"tcp", testCluster, testCluster, existingPort) truly exercises collision
handling when invoking mgr.persistNewService with svc and asserting
svc.ListenPort != existingPort and svc.PortAutoAssigned.
- Around line 101-115: The tests and seedService helper are using raw mode
strings ("tcp","udp","tls"); replace those with the exported enum constants
(e.g. rpservice.ModeTCP, rpservice.ModeUDP, rpservice.ModeTLS). Update
seedService to keep the protocol parameter but ensure you pass the
rpservice.Mode* constants from every caller and assign svc.Mode = protocol and
each Target.Protocol = protocol so fixtures use the typed constants throughout
(refer to seedService, rpservice.Service.Mode, and rpservice.Target.Protocol).

In `@management/internals/shared/grpc/proxy_test.go`:
- Around line 332-383: The test named
TestSendServiceUpdateToCluster_FiltersOnCapability is misleading because it
asserts TLS and HTTP mappings are delivered to both proxies (no capability-based
filtering); either rename the test to reflect behavior (e.g.,
TestSendServiceUpdateToCluster_TLSWorksOnAllProxies) or modify it to actually
exercise filtering by capability: add a mapping type that should be filtered and
assert that registerFakeProxyWithCaps-created proxies with differing
proto.ProxyCapabilities receive or do not receive that mapping when
SendServiceUpdateToCluster is called; update the test name and/or add assertions
accordingly so the name matches the asserted behavior.

In `@management/server/store/store.go`:
- Around line 269-270: Update GetServicesByClusterAndPort to accept the typed
ServiceMode instead of a raw string: change the signature
GetServicesByClusterAndPort(ctx context.Context, lockStrength LockingStrength,
proxyCluster string, mode string, listenPort uint16) to use mode
types.ServiceMode, add the import for proxy/internal/types, and update all call
sites and implementations (e.g., the store method body and any callers) to
pass/handle a types.ServiceMode value rather than a string, converting from
string only at boundaries if necessary.

In `@proxy/internal/tcp/relay.go`:
- Around line 93-127: The connection's read deadline is left set on all return
paths in copyWithIdleTimeout; after asserting src is a net.Conn (the conn
variable in copyWithIdleTimeout) add a deferred call to clear the deadline (call
conn.SetReadDeadline(time.Time{}) ) so the deadline is removed when the function
exits. Place the defer immediately after the conn, ok check (after determining
conn is a net.Conn) and you can ignore the error returned from clearing the
deadline so it doesn't override the primary return error.

In `@proxy/internal/udp/relay.go`:
- Around line 97-138: The New constructor currently takes many parameters;
create a RelayConfig (or Options) struct containing related fields (Target,
AccountID, DialFunc, DialTimeout, SessionTTL, MaxSessions) and change New to
New(parentCtx context.Context, logger *log.Entry, listener net.PacketConn, cfg
RelayConfig) *Relay; move the defaulting logic for dialTimeout, sessionTTL, and
maxSessions into New using cfg values, update use of cfg.* inside the returned
Relay (target, accountID, dialFunc, etc.), and update all callers to build a
RelayConfig instead of passing the long parameter list so adding future fields
is straightforward.

In `@shared/management/proto/proxy_service.proto`:
- Around line 91-108: The Proto message ProxyMapping has field numbers 1–9, then
skips 10 and uses 11 for mode and 12 for listen_port; to document and prevent
accidental reuse of 10, add a reserved declaration for field number 10 in the
ProxyMapping message (e.g., reserved 10;) so it's clear the slot is
intentionally kept and the compiler will forbid reuse—modify the ProxyMapping
definition to include this reserved 10 entry alongside existing fields like mode
and listen_port.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d0ee3bc-bd5b-4c0d-9313-c705fb067c06

📥 Commits

Reviewing files that changed from the base of the PR and between e6587b0 and 000ae70.

⛔ Files ignored due to path filters (3)
  • client/proto/daemon.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/management.pb.go is excluded by !**/*.pb.go
  • shared/management/proto/proxy_service.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (77)
  • client/cmd/expose.go
  • client/internal/expose/manager.go
  • client/internal/expose/manager_test.go
  • client/internal/expose/request.go
  • client/proto/daemon.proto
  • client/server/server.go
  • go.mod
  • management/internals/modules/reverseproxy/domain/domain.go
  • management/internals/modules/reverseproxy/domain/manager/api.go
  • management/internals/modules/reverseproxy/domain/manager/manager.go
  • management/internals/modules/reverseproxy/proxy/manager.go
  • management/internals/modules/reverseproxy/proxy/manager/controller.go
  • management/internals/modules/reverseproxy/proxy/manager_mock.go
  • management/internals/modules/reverseproxy/service/interface.go
  • management/internals/modules/reverseproxy/service/interface_mock.go
  • management/internals/modules/reverseproxy/service/manager/api.go
  • management/internals/modules/reverseproxy/service/manager/expose_tracker_test.go
  • management/internals/modules/reverseproxy/service/manager/l4_port_test.go
  • management/internals/modules/reverseproxy/service/manager/manager.go
  • management/internals/modules/reverseproxy/service/manager/manager_test.go
  • management/internals/modules/reverseproxy/service/service.go
  • management/internals/modules/reverseproxy/service/service_test.go
  • management/internals/server/boot.go
  • management/internals/server/modules.go
  • management/internals/shared/grpc/expose_service.go
  • management/internals/shared/grpc/proxy.go
  • management/internals/shared/grpc/proxy_test.go
  • management/server/http/handler.go
  • management/server/http/testing/testing_tools/channel/channel.go
  • management/server/metrics/selfhosted.go
  • management/server/store/sql_store.go
  • management/server/store/store.go
  • management/server/store/store_mock.go
  • management/server/types/account.go
  • proxy/cmd/proxy/cmd/root.go
  • proxy/internal/conntrack/conn.go
  • proxy/internal/conntrack/hijacked.go
  • proxy/internal/conntrack/hijacked_test.go
  • proxy/internal/debug/client.go
  • proxy/internal/debug/handler.go
  • proxy/internal/debug/templates/clients.html
  • proxy/internal/debug/templates/index.html
  • proxy/internal/metrics/l4_metrics_test.go
  • proxy/internal/metrics/metrics.go
  • proxy/internal/netutil/errors.go
  • proxy/internal/netutil/errors_test.go
  • proxy/internal/proxy/reverseproxy.go
  • proxy/internal/proxy/servicemapping.go
  • proxy/internal/roundtrip/netbird.go
  • proxy/internal/roundtrip/netbird_bench_test.go
  • proxy/internal/roundtrip/netbird_test.go
  • proxy/internal/tcp/bench_test.go
  • proxy/internal/tcp/chanlistener.go
  • proxy/internal/tcp/peekedconn.go
  • proxy/internal/tcp/proxyprotocol.go
  • proxy/internal/tcp/proxyprotocol_test.go
  • proxy/internal/tcp/relay.go
  • proxy/internal/tcp/relay_test.go
  • proxy/internal/tcp/router.go
  • proxy/internal/tcp/router_test.go
  • proxy/internal/tcp/snipeek.go
  • proxy/internal/tcp/snipeek_test.go
  • proxy/internal/types/types.go
  • proxy/internal/types/types_test.go
  • proxy/internal/udp/relay.go
  • proxy/internal/udp/relay_test.go
  • proxy/management_integration_test.go
  • proxy/server.go
  • shared/management/client/client.go
  • shared/management/client/grpc.go
  • shared/management/client/mock.go
  • shared/management/client/rest/client.go
  • shared/management/client/rest/reverse_proxy_clusters.go
  • shared/management/http/api/openapi.yml
  • shared/management/http/api/types.gen.go
  • shared/management/proto/management.proto
  • shared/management/proto/proxy_service.proto
💤 Files with no reviewable changes (2)
  • shared/management/client/rest/client.go
  • shared/management/client/rest/reverse_proxy_clusters.go

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (3)
proxy/server.go (3)

930-941: ⚠️ Potential issue | 🟠 Major

Keep the live mapping until the replacement is known-good.

modifyMapping() still removes the existing routes before setupMappingRoutes() succeeds. If the new bind/dial/validation fails, the service stays down and lastMappings still points at stale state. Stage the replacement first, then swap and clean up the old mapping only after success.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/server.go` around lines 930 - 941, modifyMapping currently removes the
existing routes before attempting to set up the new mapping, causing downtime if
setupMappingRoutes fails; change the flow to first attempt to set up the new
mapping without deleting the old (call setupMappingRoutes(ctx, mapping) while
the old mapping remains active via loadMapping/lastMappings), and only after
setupMappingRoutes succeeds swap the active mapping (storeMapping(mapping)) and
then call cleanupMappingRoutes(old) to remove the previous routes; ensure on
setup error you do not alter lastMappings or remove routes so the old mapping
stays live.

720-724: ⚠️ Potential issue | 🔴 Critical

Apply PROXY wrapping to per-port listeners too.

When ProxyProtocol is enabled, the listener created here still goes straight into the port router. Any PROXY header on a custom TCP/TLS port will reach SNI parsing or the backend as payload instead of being consumed first.

🔧 Proposed fix
 	ln, err := net.Listen("tcp", listenAddr)
 	if err != nil {
 		return nil, fmt.Errorf("listen TCP on %s: %w", listenAddr, err)
 	}
+	if s.ProxyProtocol {
+		ln = s.wrapProxyProtocol(ln)
+	}
 
 	router := nbtcp.NewPortRouter(s.Logger, s.resolveDialFunc)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/server.go` around lines 720 - 724, The listener created as ln from
net.Listen (listenAddr) is not wrapped with the PROXY-protocol parser, so when
ProxyProtocol is enabled any PROXY header on custom TCP/TLS ports will be passed
to SNI/backend as payload; update the code that creates per-port listeners (the
net.Listen -> ln flow) to detect when ProxyProtocol is enabled and wrap ln with
the proxy-protocol listener wrapper before handing it to the port router
(preserving existing error handling and variable ln), e.g., replace passing raw
ln into the router with the wrapped listener when ProxyProtocol is enabled.

1304-1307: ⚠️ Potential issue | 🟠 Major

Guard L4 gauge decrements to mappings that were actually installed.

cleanupMappingRoutes() always calls L4ServiceRemoved() for L4 modes. Failed create/modify rollbacks can reach this path before L4ServiceAdded() ever ran, which drives the active-service gauge negative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/server.go` around lines 1304 - 1307, cleanupMappingRoutes currently
unconditionally calls s.meter.L4ServiceRemoved for any L4 mode, which can
decrement the gauge for mappings that were never successfully installed; modify
cleanupMappingRoutes to only call L4ServiceRemoved when the mapping was actually
installed (e.g., check an "installed" marker or membership in the
active-L4-services set) and ensure the install path (where L4ServiceAdded is
called) sets that marker; conversely, remove/clear the marker on successful
removal so L4ServiceRemoved is only invoked for mappings that previously
triggered L4ServiceAdded (reference cleanupMappingRoutes, L4ServiceRemoved,
L4ServiceAdded, mapping.GetMode / types.ServiceMode).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@client/cmd/expose.go`:
- Around line 108-114: The current check lets an explicit "--external-port 0"
pass because the uint16 flag defaults to 0 and resolveExternalPort treats 0 as
unset; update the expose command to explicitly reject an external-port value of
0 when the flag was provided: inside the branch that inspects
cmd.Flags().Changed("external-port") (and/or before calling
resolveExternalPort), add a guard that reads the external-port value (the
exposed variable passed to resolveExternalPort) and returns an error if it
equals 0 (e.g., "explicit --external-port 0 is not allowed"); keep the existing
protocol-specific checks (isClusterProtocol, exposeProtocol, exposePin,
exposePassword, exposeUserGroups) unchanged otherwise.
- Around line 254-255: The log wrongly assumes a reason for port reassignment;
change the message emitted when PortAutoAssigned is true (the block using
PortAutoAssigned and exposeExternalPort in expose.go) to a neutral statement
that the requested port was not honored rather than claiming cluster
limitations—e.g., replace the guessy wording with something like "Note:
requested port %d was not honored; using assigned port" or similar so it doesn't
assert a cause; keep the check on PortAutoAssigned and preserve
exposeExternalPort in the message.
- Around line 76-84: extractPort currently slices on the last ':' which
misparses URLs like "tls://example.com" and returns garbage; update extractPort
to parse the input with net/url.Parse and prefer u.Port() when u.Host is
present, returning the fallback (using strconv.FormatUint(uint64(fallback), 10))
if no port is specified; for inputs that fail URL parsing (raw "host:port"
forms), fall back to net.SplitHostPort to extract the port and otherwise return
the formatted fallback. Ensure you update the logic inside extractPort to use
url.Parse, u.Port(), and net.SplitHostPort rather than strings.LastIndex
slicing.
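
A minimal sketch of the suggested rewrite, assuming the `extractPort(raw string, fallback uint16) string` shape implied by the comment (the PR's actual signature may differ):

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// extractPort prefers url.Parse / u.Port() for URL-shaped input, falls back
// to net.SplitHostPort for raw "host:port" strings, and otherwise returns
// the formatted fallback, instead of slicing on the last ':'.
func extractPort(raw string, fallback uint16) string {
	if u, err := url.Parse(raw); err == nil && u.Host != "" {
		if p := u.Port(); p != "" {
			return p
		}
		return strconv.FormatUint(uint64(fallback), 10)
	}
	if _, p, err := net.SplitHostPort(raw); err == nil {
		return p
	}
	return strconv.FormatUint(uint64(fallback), 10)
}

func main() {
	fmt.Println(extractPort("tls://example.com", 443))      // no garbage from the scheme colon
	fmt.Println(extractPort("tcp://db.internal:5432", 443)) // explicit URL port wins
	fmt.Println(extractPort("10.0.0.5:9000", 443))          // raw host:port handled by SplitHostPort
	fmt.Println(extractPort("example.com", 8080))           // no port anywhere: fallback
}
```

Note that `url.Parse("10.0.0.5:9000")` fails (colon in the first path segment), which is exactly why the raw-input path needs the `net.SplitHostPort` fallback.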

In `@proxy/internal/udp/relay.go`:
- Around line 214-243: getOrCreateSession currently holds r.mu while calling
r.dialFunc which can block shutdown; change the flow to: under r.mu check for
existing session and whether len(r.sessions) < r.maxSessions and
sessLimiter.Allow(), if allowed then reserve a provisional slot or mark intent
and release r.mu before creating dialCtx and calling r.dialFunc (using
r.dialTimeout and dialCancel as before); after dial returns reacquire r.mu,
recheck if a session was raced-in or if the provisional slot is still valid,
then create and insert the session or undo the reservation and invoke observer
callbacks/return the appropriate error; keep observer calls and error wrapping
behavior but move them to the appropriate pre- and post-dial checks so the
network dial is performed outside r.mu while preserving correctness.

---

Duplicate comments:
In `@proxy/server.go`:
- Around line 930-941: modifyMapping currently removes the existing routes
before attempting to set up the new mapping, causing downtime if
setupMappingRoutes fails; change the flow to first attempt to set up the new
mapping without deleting the old (call setupMappingRoutes(ctx, mapping) while
the old mapping remains active via loadMapping/lastMappings), and only after
setupMappingRoutes succeeds swap the active mapping (storeMapping(mapping)) and
then call cleanupMappingRoutes(old) to remove the previous routes; ensure on
setup error you do not alter lastMappings or remove routes so the old mapping
stays live.
- Around line 720-724: The listener created as ln from net.Listen (listenAddr)
is not wrapped with the PROXY-protocol parser, so when ProxyProtocol is enabled
any PROXY header on custom TCP/TLS ports will be passed to SNI/backend as
payload; update the code that creates per-port listeners (the net.Listen -> ln
flow) to detect when ProxyProtocol is enabled and wrap ln with the
proxy-protocol listener wrapper before handing it to the port router (preserving
existing error handling and variable ln), e.g., replace passing raw ln into the
router with the wrapped listener when ProxyProtocol is enabled.
- Around line 1304-1307: cleanupMappingRoutes currently unconditionally calls
s.meter.L4ServiceRemoved for any L4 mode, which can decrement the gauge for
mappings that were never successfully installed; modify cleanupMappingRoutes to
only call L4ServiceRemoved when the mapping was actually installed (e.g., check
an "installed" marker or membership in the active-L4-services set) and ensure
the install path (where L4ServiceAdded is called) sets that marker; conversely,
remove/clear the marker on successful removal so L4ServiceRemoved is only
invoked for mappings that previously triggered L4ServiceAdded (reference
cleanupMappingRoutes, L4ServiceRemoved, L4ServiceAdded, mapping.GetMode /
types.ServiceMode).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e87a74e2-e4a4-4c4b-b852-bd5d95b987c7

📥 Commits

Reviewing files that changed from the base of the PR and between 000ae70 and 6f94ead.

📒 Files selected for processing (7)
  • client/cmd/expose.go
  • management/internals/modules/reverseproxy/service/manager/manager.go
  • management/internals/modules/reverseproxy/service/service.go
  • proxy/internal/roundtrip/netbird.go
  • proxy/internal/udp/relay.go
  • proxy/internal/udp/relay_test.go
  • proxy/server.go

Contributor
@coderabbitai bot left a comment

Actionable comments posted: 10

♻️ Duplicate comments (2)
proxy/internal/conntrack/hijacked.go (1)

27-31: ⚠️ Potential issue | 🟠 Major

Host matching is still case-sensitive.

hostOnly() fixed the port mismatch, but tc.host == host will still miss hijacked connections opened with a differently cased Host header. That can leave WebSocket sessions alive after mapping cleanup for the same domain. Please canonicalize both sides or compare case-insensitively here.

Suggested minimal fix
 import (
 	"net/http"
+	"strings"
 	"sync"
 )
@@
-		if tc.host == host {
+		if strings.EqualFold(tc.host, host) {
 			toClose = append(toClose, tc)
 		}

Also applies to: 50-55

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/conntrack/hijacked.go` around lines 27 - 31, The host
comparison is case-sensitive and can miss hijacked connections; ensure the
tracked host and the incoming request host are normalized before comparison by
applying the same canonicalization (e.g., lowercasing or using a canonical host
function) to both sides: update where trackingWriter is constructed (host:
hostOnly(r.Host)) and where the comparison is done (tc.host == host) to use the
canonicalized form (e.g., canonicalHost(hostOnly(...)) or strings.ToLower on
both tc.host and host) so matches are case-insensitive; apply the same change at
the other occurrence referenced (lines 50-55) to keep behavior consistent.
management/internals/modules/reverseproxy/service/manager/manager.go (1)

344-348: ⚠️ Potential issue | 🟠 Major

Potential infinite loop when autoAssignPortMax is 65535.

If autoAssignPortMax equals 65535 (the maximum uint16 value), the linear scan loop will overflow: after processing port 65535, port++ wraps to 0, and 0 <= 65535 is true, causing an infinite loop.

The past review comment suggested using int for the loop variable to avoid this, but it appears the fix wasn't fully applied to this block.

🐛 Suggested fix
-	for port := autoAssignPortMin; port <= autoAssignPortMax; port++ {
-		if _, taken := occupied[port]; !taken {
-			return port, nil
+	for p := int(autoAssignPortMin); p <= int(autoAssignPortMax); p++ {
+		port := uint16(p)
+		if _, taken := occupied[port]; !taken {
+			return port, nil
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/modules/reverseproxy/service/manager/manager.go` around
lines 344 - 348, The for-loop that scans ports using "for port :=
autoAssignPortMin; port <= autoAssignPortMax; port++" can overflow when
autoAssignPortMax is 65535; change the loop so the loop counter is an int (e.g.,
use "for p := int(autoAssignPortMin); p <= int(autoAssignPortMax); p++") and
compare/cast back to the uint16 when checking the occupied map (use uint16(p) as
the key), ensuring "autoAssignPortMin", "autoAssignPortMax", and "occupied" are
referenced accordingly to avoid wraparound/infinite looping.
🧹 Nitpick comments (3)
management/internals/shared/grpc/proxy_test.go (2)

56-58: Make the fake controller capability-aware.

ClusterSupportsCustomPorts currently returns true for every cluster, while registerFakeProxyWithCaps only records caps on the connection. That means a legacy-only cluster is indistinguishable from a custom-port-capable one, so capability-sensitive tests can pass for the wrong reason.

Possible test-double fix
 type testProxyController struct {
 	mu             sync.Mutex
 	clusterProxies map[string]map[string]struct{}
+	proxyCaps      map[string]*proto.ProxyCapabilities
 }

 func newTestProxyController() *testProxyController {
 	return &testProxyController{
 		clusterProxies: make(map[string]map[string]struct{}),
+		proxyCaps:      make(map[string]*proto.ProxyCapabilities),
 	}
 }

-func (c *testProxyController) ClusterSupportsCustomPorts(_ string) bool {
-	return true
+func (c *testProxyController) ClusterSupportsCustomPorts(clusterAddr string) bool {
+	c.mu.Lock()
+	defer c.mu.Unlock()
+
+	for proxyID := range c.clusterProxies[clusterAddr] {
+		if caps := c.proxyCaps[proxyID]; caps != nil && caps.SupportsCustomPorts {
+			return true
+		}
+	}
+	return false
 }

and persist caps from registerFakeProxyWithCaps into testProxyController.proxyCaps.

Based on learnings: ClusterSupportsCustomPorts intentionally returns true if ANY proxy in the cluster reports SupportsCustomPorts.

Also applies to: 81-91

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/proxy_test.go` around lines 56 - 58,
testProxyController.ClusterSupportsCustomPorts currently always returns true;
instead have it consult the recorded capabilities stored on the test controller
(persist the caps provided to registerFakeProxyWithCaps into
testProxyController.proxyCaps) and return true only if any proxy for the given
cluster reports SupportsCustomPorts. Update registerFakeProxyWithCaps to save
the caps into testProxyController.proxyCaps keyed by cluster ID and modify
ClusterSupportsCustomPorts(clusterID string) bool to look up that key and return
whether any saved cap has SupportsCustomPorts set.

320-371: Add a real filtered-out case for custom-port L4 mappings.

This test only proves that HTTP and TLS updates are not filtered. It never exercises the negative path where a legacy proxy must not receive a custom-port TCP/UDP mapping, so a regression in the capability gate would still pass.

Coverage to add
 	// Send an HTTP mapping: both should receive it.
 	httpMapping := &proto.ProxyMapping{
 		Type:      proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED,
 		Id:        "service-http",
 		AccountId: "account-1",
 		Domain:    "app.example.com",
 		Path:      []*proto.PathMapping{{Path: "/", Target: "http://10.0.0.1:80"}},
 	}

 	s.SendServiceUpdateToCluster(ctx, httpMapping, cluster)

 	msgA = drainMapping(chA)
 	msgB = drainMapping(chB)
 	assert.NotNil(t, msgA, "proxy-a should receive HTTP mapping")
 	assert.NotNil(t, msgB, "proxy-b should receive HTTP mapping")
+
+	// Custom-port TCP should be filtered from legacy proxies.
+	tcpMapping := &proto.ProxyMapping{
+		Type:       proto.ProxyMappingUpdateType_UPDATE_TYPE_CREATED,
+		Id:         "service-tcp",
+		AccountId:  "account-1",
+		Mode:       "tcp",
+		ListenPort: 15432,
+		Path:       []*proto.PathMapping{{Target: "10.0.0.5:5432"}},
+	}
+
+	s.SendServiceUpdateToCluster(ctx, tcpMapping, cluster)
+
+	msgA = drainMapping(chA)
+	require.NotNil(t, msgA, "proxy-a should receive TCP mapping")
+	assert.True(t, drainEmpty(chB), "proxy-b should not receive custom-port TCP mapping")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/proxy_test.go` around lines 320 - 371, The
test TestSendServiceUpdateToCluster_FiltersOnCapability is missing a negative
case for custom-port L4 mappings: add a created ProxyMapping that represents a
TCP/UDP mapping with a non-standard ListenPort (e.g. Type UPDATE_TYPE_CREATED,
Mode "tcp", ListenPort set, and Path target like "10.0.0.5:12345"), call
s.SendServiceUpdateToCluster(ctx, customPortMapping, cluster), then
drainMapping(chA) and drainMapping(chB) and assert that the proxy with
SupportsCustomPorts=true (proxy-a) receives the mapping (assert.NotNil on msgA)
while the legacy/shared proxy with SupportsCustomPorts=false (proxy-b) does not
receive it (assert.Nil on msgB); reuse registerFakeProxyWithCaps,
SendServiceUpdateToCluster, and drainMapping from the test to implement this.
management/internals/modules/reverseproxy/service/manager/manager.go (1)

43-52: Consider rejecting port 0 as a minimum bound.

The validation resets to defaults if min > max, but doesn't check whether autoAssignPortMin == 0. Port 0 is reserved and shouldn't be auto-assigned; since portFromEnv uses strconv.ParseUint, which parses "0" as valid, this edge case can occur.

🛡️ Optional enhancement
 func init() {
 	autoAssignPortMin = portFromEnv(EnvAutoAssignPortMin, defaultAutoAssignPortMin)
 	autoAssignPortMax = portFromEnv(EnvAutoAssignPortMax, defaultAutoAssignPortMax)
-	if autoAssignPortMin > autoAssignPortMax {
+	if autoAssignPortMin == 0 || autoAssignPortMin > autoAssignPortMax {
 		log.Warnf("port range invalid: %s (%d) > %s (%d), using defaults",
 			EnvAutoAssignPortMin, autoAssignPortMin, EnvAutoAssignPortMax, autoAssignPortMax)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/modules/reverseproxy/service/manager/manager.go` around
lines 43 - 52, The init() port-range validation misses the case where
autoAssignPortMin == 0 (port 0 should not be used); update the init() logic
(which calls portFromEnv and sets autoAssignPortMin/autoAssignPortMax) to treat
a parsed min of 0 as invalid: log a warning referencing EnvAutoAssignPortMin and
reset autoAssignPortMin to defaultAutoAssignPortMin (and similarly ensure
autoAssignPortMax is valid), or perform a combined check (if autoAssignPortMin
== 0 || autoAssignPortMin > autoAssignPortMax) then reset to defaults and warn.
Ensure the log message mentions the zero-value and EnvAutoAssignPortMin to aid
debugging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@management/internals/modules/reverseproxy/service/service.go`:
- Around line 590-592: The isAuthEnabled() helper is too permissive because it
returns true for non-nil auth structs even when their Enabled flag is false;
update isAuthEnabled() to only return true if an auth config is non-nil AND its
Enabled boolean is true (check PasswordAuthConfig.Enabled,
JWKSAuthConfig.Enabled, etc. on Auth.PasswordAuth and Auth.JWKS), and keep the
existing validation locations (the checks in service.go that call
s.isAuthEnabled() before rejecting L4 TCP/UDP services) so they correctly reject
only truly enabled auth configurations.
- Line 498: The code silently truncates out-of-range ports when converting
apiTarget.Port to uint16 (the conversion at the Port: uint16(apiTarget.Port)
site); add explicit validation before that conversion to reject values <1 or
>65535 and return a clear error (e.g., include the target index i and the
invalid apiTarget.Port in the error message). Update the code path that
constructs the target (the spot that reads apiTarget.Port and uses uint16(...))
to perform the range check and only perform uint16() after validation;
alternatively, add minimum/maximum constraints to the OpenAPI schema for the
port field to enforce 1–65535. Ensure the error path prevents constructing the
target when the port is invalid.
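
The range check before the lossy cast can be as small as this; `validateTargetPort` is an illustrative name, not the PR's code:

```go
package main

import "fmt"

// validateTargetPort rejects out-of-range values before the uint16
// conversion, so 70000 becomes an error instead of silently truncating.
func validateTargetPort(i int, port int) (uint16, error) {
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("target %d: port %d out of range (1-65535)", i, port)
	}
	return uint16(port), nil
}

func main() {
	if _, err := validateTargetPort(0, 70000); err != nil {
		fmt.Println(err) // rejected instead of truncating to 4464
	}
	p, _ := validateTargetPort(1, 5432)
	fmt.Println(p)
}
```

Without the guard, `uint16(70000)` yields 4464, which is exactly the silent-truncation failure the comment describes.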

In `@proxy/internal/udp/relay.go`:
- Around line 420-432: The Close() loop iterates r.sessions and dereferences
possible nil session placeholders (accessing sess.addr, sess.bytesIn.Load(),
sess.bytesOut.Load()) which can panic; update the loop in relay.go to skip nil
sess entries by checking "if sess == nil { delete(r.sessions, key); continue }"
before logging or calling sess.cancel/ sess.backend.Close(), and only increment
the closed counter and notify observers for non-nil, actually-closed sessions;
keep existing Debugf and backend.Close() calls but guard them behind the nil
check and still remove the map entry.
- Around line 358-385: The cleanupIdleSessions function can panic when
r.sessions contains nil placeholders (set by getOrCreateSession during dial);
update Relay.cleanupIdleSessions to skip nil sess entries before calling
sess.idleDuration() and sess.cancel()/sess.backend.Close(); also fix the
observer notification loop by iterating cleaned times (e.g., a simple for i :=
0; i < cleaned; i++) and call r.observer.UDPSessionEnded(string(r.accountID))
that many times so notifications match closed sessions. Ensure you reference
r.sessions, Relay.cleanupIdleSessions, getOrCreateSession, sess.idleDuration,
sess.cancel, and sess.backend.Close when making the changes.
- Around line 276-278: The call r.sessWg.Go(...) is invalid on Go versions
before 1.25, where sync.WaitGroup has no Go method; replace it by incrementing the wait group and
launching a goroutine that calls relayBackendToClient and signals Done when
finished — e.g. call r.sessWg.Add(1) immediately before a goroutine: go func() {
defer r.sessWg.Done(); r.relayBackendToClient(sessCtx, sess) }(); alternatively,
if you need error propagation use an errgroup.Group and run g.Go(func() error {
r.relayBackendToClient(sessCtx, sess); return nil }) referencing r.sessWg,
relayBackendToClient, sessCtx, and sess accordingly.

In `@proxy/server.go`:
- Around line 922-926: The setup flow can leave ACME/SNI/host-route state
registered by setupHTTPMapping() when updateMapping() fails; after
s.setupMappingRoutes(ctx, mapping) returns an error you must also undo any
mapping routes/certs to avoid stale entries — call the inverse cleanup (e.g.,
s.teardownMappingRoutes or the cleanup used by delete logic) for the same
mapping, and only register HTTP/ACME/SNI state after updateMapping() has
succeeded; update the error path in the block handling s.setupMappingRoutes
failure to invoke the mapping-route cleanup (in addition to
s.netbird.RemovePeer) or move the HTTP route registration to after
updateMapping() returns successfully, referencing setupHTTPMapping,
updateMapping, s.setupMappingRoutes, and s.netbird.RemovePeer to locate the
relevant code.
- Around line 1210-1215: The code uses udpRelayWg.Go(relay.Serve) but udpRelayWg
is a sync.WaitGroup, which has no Go method before Go 1.25; replace this call by incrementing
the WaitGroup with udpRelayWg.Add(1) and launching relay.Serve in a new
goroutine that defers udpRelayWg.Done() so the wait group is properly balanced;
update the block around udpRelayWg and relay.Serve (referencing udpRelayWg,
relay.Serve, and s.udpRelays) accordingly to ensure the relay runs concurrently
and the WaitGroup is decremented when Serve returns.
- Around line 935-949: modifyMapping currently assumes the mapping's service key
doesn't change; compute oldKey := serviceKeyForMapping(old) (when old :=
s.loadMapping(mapping.GetId()) != nil) and newKey :=
serviceKeyForMapping(mapping), and if they differ remove the old peer
registration before setting up the new routes (e.g., call the existing peer
removal logic that removeMapping/removePeer uses or add a helper to unregister
by key), then proceed with
s.cleanupMappingRoutes(old)/s.setupMappingRoutes(mapping) and finally
s.storeMapping(mapping); ensure any service-mode meter adjustments
(meter.L4ServiceRemoved) still run for the old mapping.

In `@shared/management/http/api/openapi.yml`:
- Around line 4275-4277: The 409 "conflict" response component is empty so
generated clients don't get the collision message; update the "conflict"
component in openapi.yml to include a body by setting its content to
application/json with a schema that $ref's the existing ErrorResponse (e.g.
content: application/json -> schema: $ref:
'#/components/schemas/ErrorResponse'), and ensure ErrorResponse is defined and
used by the create/update endpoints that advertise 409.
- Around line 2946-2949: The OpenAPI schema for listen_port lacks numeric bounds
causing values like -1 or 70000 to pass validation and later be cast to uint16
in reverseproxy/service/service.go; update the listen_port property in
openapi.yml (and the other occurrence mentioned around lines 3023-3025) to
include minimum: 0 and maximum: 65535 (uint16 range) so the contract rejects
out-of-range values before the cast in the code (refer to req.ListenPort and the
cast to uint16 in service.go).

---

Duplicate comments:
In `@management/internals/modules/reverseproxy/service/manager/manager.go`:
- Around line 344-348: The for-loop that scans ports using "for port :=
autoAssignPortMin; port <= autoAssignPortMax; port++" can overflow when
autoAssignPortMax is 65535; change the loop so the loop counter is an int (e.g.,
use "for p := int(autoAssignPortMin); p <= int(autoAssignPortMax); p++") and
compare/cast back to the uint16 when checking the occupied map (use uint16(p) as
the key), ensuring "autoAssignPortMin", "autoAssignPortMax", and "occupied" are
referenced accordingly to avoid wraparound/infinite looping.

In `@proxy/internal/conntrack/hijacked.go`:
- Around line 27-31: The host comparison is case-sensitive and can miss hijacked
connections; ensure the tracked host and the incoming request host are
normalized before comparison by applying the same canonicalization (e.g.,
lowercasing or using a canonical host function) to both sides: update where
trackingWriter is constructed (host: hostOnly(r.Host)) and where the comparison
is done (tc.host == host) to use the canonicalized form (e.g.,
canonicalHost(hostOnly(...)) or strings.ToLower on both tc.host and host) so
matches are case-insensitive; apply the same change at the other occurrence
referenced (lines 50-55) to keep behavior consistent.

---

Nitpick comments:
In `@management/internals/modules/reverseproxy/service/manager/manager.go`:
- Around line 43-52: The init() port-range validation misses the case where
autoAssignPortMin == 0 (port 0 should not be used); update the init() logic
(which calls portFromEnv and sets autoAssignPortMin/autoAssignPortMax) to treat
a parsed min of 0 as invalid: log a warning referencing EnvAutoAssignPortMin and
reset autoAssignPortMin to defaultAutoAssignPortMin (and similarly ensure
autoAssignPortMax is valid), or perform a combined check (if autoAssignPortMin
== 0 || autoAssignPortMin > autoAssignPortMax) then reset to defaults and warn.
Ensure the log message mentions the zero-value and EnvAutoAssignPortMin to aid
debugging.

In `@management/internals/shared/grpc/proxy_test.go`:
- Around line 56-58: testProxyController.ClusterSupportsCustomPorts currently
always returns true; instead have it consult the recorded capabilities stored on
the test controller (persist the caps provided to registerFakeProxyWithCaps into
testProxyController.proxyCaps) and return true only if any proxy for the given
cluster reports SupportsCustomPorts. Update registerFakeProxyWithCaps to save
the caps into testProxyController.proxyCaps keyed by cluster ID and modify
ClusterSupportsCustomPorts(clusterID string) bool to look up that key and return
whether any saved cap has SupportsCustomPorts set.
- Around line 320-371: The test
TestSendServiceUpdateToCluster_FiltersOnCapability is missing a negative case
for custom-port L4 mappings: add a created ProxyMapping that represents a
TCP/UDP mapping with a non-standard ListenPort (e.g. Type UPDATE_TYPE_CREATED,
Mode "tcp", ListenPort set, and Path target like "10.0.0.5:12345"), call
s.SendServiceUpdateToCluster(ctx, customPortMapping, cluster), then
drainMapping(chA) and drainMapping(chB) and assert that the proxy with
SupportsCustomPorts=true (proxy-a) receives the mapping (assert.NotNil on msgA)
while the legacy/shared proxy with SupportsCustomPorts=false (proxy-b) does not
receive it (assert.Nil on msgB); reuse registerFakeProxyWithCaps,
SendServiceUpdateToCluster, and drainMapping from the test to implement this.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bea41b64-c684-4a3c-8bfc-1cf1ae17b4bd

📥 Commits

Reviewing files that changed from the base of the PR and between 6f94ead and 5d59da8.

⛔ Files ignored due to path filters (1)
  • shared/management/proto/proxy_service.pb.go is excluded by !**/*.pb.go
📒 Files selected for processing (15)
  • client/cmd/expose.go
  • client/internal/expose/manager_test.go
  • management/internals/modules/reverseproxy/service/manager/manager.go
  • management/internals/modules/reverseproxy/service/service.go
  • management/internals/shared/grpc/expose_service.go
  • management/internals/shared/grpc/proxy_test.go
  • proxy/cmd/proxy/cmd/root.go
  • proxy/internal/conntrack/hijacked.go
  • proxy/internal/debug/client.go
  • proxy/internal/metrics/metrics.go
  • proxy/internal/tcp/proxyprotocol_test.go
  • proxy/internal/udp/relay.go
  • proxy/server.go
  • shared/management/http/api/openapi.yml
  • shared/management/proto/proxy_service.proto
🚧 Files skipped from review as they are similar to previous changes (3)
  • proxy/cmd/proxy/cmd/root.go
  • shared/management/proto/proxy_service.proto
  • proxy/internal/tcp/proxyprotocol_test.go

@lixmal changed the title from "[management,proxy] Add L4 capabilities (TLS/TCP/UDP)" to "[management,proxy,client] Add L4 capabilities (TLS/TCP/UDP)" on Mar 7, 2026
Contributor
@coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (2)
management/internals/shared/grpc/expose_service.go (1)

42-57: ⚠️ Potential issue | 🟠 Major

Reject negative ports before the uint16 cast.

The new checks only enforce the upper bound. A negative protobuf value still passes and wraps on cast (-1 becomes 65535), so the created service can bind a completely different port than the caller requested.

🛡️ Proposed fix
-	if exposeReq.Port > 65535 {
+	if exposeReq.Port < 0 || exposeReq.Port > 65535 {
 		return nil, status.Errorf(codes.InvalidArgument, "port out of range: %d", exposeReq.Port)
 	}
-	if exposeReq.ListenPort > 65535 {
+	if exposeReq.ListenPort < 0 || exposeReq.ListenPort > 65535 {
 		return nil, status.Errorf(codes.InvalidArgument, "listen_port out of range: %d", exposeReq.ListenPort)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/expose_service.go` around lines 42 - 57, The
code currently only checks the upper bound of exposeReq.Port and
exposeReq.ListenPort before casting to uint16 in
reverseProxyMgr.CreateServiceFromPeer, allowing negative protobuf values to
wrap; update the validation in expose_service.go (the checks surrounding
exposeReq.Port and exposeReq.ListenPort) to reject values < 0 or > 65535 and
return status.Errorf(codes.InvalidArgument, ...) for either out-of-range case so
that the uint16 casts for Port and ListenPort are safe.
management/internals/modules/reverseproxy/service/manager/manager.go (1)

43-64: ⚠️ Potential issue | 🔴 Critical

The auto-assign range is still unsafe at 0 and 65535.

portFromEnv() still accepts NB_PROXY_PORT_MIN=0, and the fallback scan on Lines 344-348 increments a uint16. If NB_PROXY_PORT_MAX=65535 and 65535 is taken, it wraps to 0 and never terminates. Please reject 0 and do the range math/scan in int.

Read-only verification script
#!/bin/bash
set -euo pipefail

sed -n '43,64p;336,350p' management/internals/modules/reverseproxy/service/manager/manager.go

python - <<'PY'
port = 65535
for i in range(4):
    print(f"iteration {i}: {port}")
    port = (port + 1) & 0xFFFF
PY

Expected: the file excerpt shows no min == 0 guard, and the Python snippet prints 65535 followed by 0, demonstrating the same wraparound the current uint16 scan is exposed to.

Also applies to: 336-350

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/modules/reverseproxy/service/manager/manager.go` around
lines 43 - 64, The init/portFromEnv logic allows 0 and 65535 which causes uint16
wraparound during scanning; update portFromEnv to treat 0 as invalid (return
fallback and log), ensure init enforces autoAssignPortMin >= 1 and
autoAssignPortMax <= 65534 (or sensible non-edge limits), and change the
scanning/math code that iterates ports (the scan using
autoAssignPortMin/autoAssignPortMax and the loop around it) to use int for
arithmetic and bounds checks so incrementing past 65535 doesn't wrap to 0;
reference the functions/vars portFromEnv, init, autoAssignPortMin,
autoAssignPortMax and the port-scan loop in manager.go when making these
changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@management/internals/modules/reverseproxy/service/manager/manager.go`:
- Around line 302-317: The current read-then-insert check in manager.go (using
transaction.GetServicesByClusterAndPort with LockingStrengthUpdate and
subsequent CreateService) is racy for a previously-unused port; add a
DB-enforced uniqueness constraint or a cluster-scoped lock to make acceptance
atomic: either (A) add a unique index on (proxy_cluster, mode, listen_port) and
handle unique-violation errors when CreateService is called, or (B) acquire a
cluster-level mutex/lock (e.g. AcquireClusterLock or similar) around the
conflict check and CreateService so only one transaction can claim the port;
update the code paths around GetServicesByClusterAndPort and CreateService to
rely on the chosen mechanism and surface a proper AlreadyExists error on
violation.

In `@management/internals/shared/grpc/expose_service.go`:
- Around line 224-227: The switch that converts
proto.ExposeProtocol_EXPOSE_HTTPS to the string "https" causes
ExposeServiceRequest.Validate() to reject creates; change the mapping so
EXPOSE_HTTPS normalizes to the HTTP frontend mode (return "http") instead of
"https". Update the conversion logic around the function producing Service.Mode
(the branch handling proto.ExposeProtocol_EXPOSE_HTTPS) and ensure backend
scheme remains represented on Target.Protocol if needed (do not conflate
frontend Service.Mode with backend Target.Protocol). Run or adjust related tests
that assert expose protocol normalization.

In `@management/server/store/sql_store.go`:
- Around line 5002-5006: Update the SQL predicate in
SqlStore.GetHTTPServiceByDomain to include legacy NULL modes so existing
services without a mode are matched; change the WHERE clause used in the DB call
(the chain starting with s.db.Preload("Targets").Where(...).First(&service)) to
check for mode IS NULL OR mode = '' OR mode = 'http' (i.e., include IS NULL
alongside the empty string and "http") so legacy NULL-mode services are returned
by GetHTTPServiceByDomain.
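
The predicate being asked for can be pinned down in Go; `isHTTPMode` is a hypothetical helper shown only to make explicit which rows the WHERE clause should match, with a nil pointer standing in for the legacy NULL column:

```go
package main

import "fmt"

// isHTTPMode mirrors the suggested SQL predicate
// (mode IS NULL OR mode = '' OR mode = 'http').
func isHTTPMode(mode *string) bool {
	return mode == nil || *mode == "" || *mode == "http"
}

func main() {
	tcp, empty, http := "tcp", "", "http"
	fmt.Println(isHTTPMode(nil))    // legacy row: mode column is NULL
	fmt.Println(isHTTPMode(&empty)) // legacy row: empty-string mode
	fmt.Println(isHTTPMode(&http))  // explicit HTTP service
	fmt.Println(isHTTPMode(&tcp))   // L4 service: excluded
}
```

The current two-clause WHERE misses the first case, so pre-migration services with a NULL mode silently drop out of GetHTTPServiceByDomain.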

In `@management/server/store/store.go`:
- Around line 447-452: Dropping idx_services_domain removes DB-level protection
for HTTP host uniqueness; restore a DB-enforced HTTP-only uniqueness (or
implement a serialized claim) instead of plain read in SqlStore. Reintroduce a
unique index scoped to HTTP services (e.g. a partial/filtered unique index on
domain WHERE protocol='http' or a composite unique index keyed by (domain,
protocol) so L4 services can still share domain with different ports) rather
than removing it in migration.DropIndex[rpservice.Service]; alternatively
implement a transactional/locked check in SqlStore.GetHTTPServiceByDomain()
(used by checkDomainAvailable()) so concurrent creates/updates cannot both
observe the host as free. Ensure the change targets the migration that calls
migration.DropIndex and the SqlStore.GetHTTPServiceByDomain/checkDomainAvailable
code paths.

---

Duplicate comments:
In `@management/internals/modules/reverseproxy/service/manager/manager.go`:
- Around lines 43-64: The init/portFromEnv logic accepts 0 and 65535, which
causes uint16 wraparound during port scanning. Update portFromEnv to treat 0
as invalid (return the fallback and log), have init enforce autoAssignPortMin
>= 1 and autoAssignPortMax <= 65534 (or similar non-edge limits), and change
the port-scan loop over autoAssignPortMin..autoAssignPortMax to use int for
arithmetic and bounds checks so incrementing past 65535 cannot wrap to 0. The
relevant symbols are portFromEnv, init, autoAssignPortMin, autoAssignPortMax,
and the port-scan loop in manager.go.

In `@management/internals/shared/grpc/expose_service.go`:
- Around lines 42-57: The code only checks the upper bound of exposeReq.Port
and exposeReq.ListenPort before casting to uint16 in
reverseProxyMgr.CreateServiceFromPeer, so negative protobuf values wrap
around. Update the validation in expose_service.go to reject values < 0 or >
65535 and return status.Errorf(codes.InvalidArgument, ...) for either
out-of-range case so the uint16 casts for Port and ListenPort are safe.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d499d212-6486-4661-8220-805252e4d708

📥 Commits

Reviewing files that changed from the base of the PR and between 5d59da8 and 6936322.

📒 Files selected for processing (8)
  • management/internals/modules/reverseproxy/service/manager/expose_tracker_test.go
  • management/internals/modules/reverseproxy/service/manager/manager.go
  • management/internals/modules/reverseproxy/service/manager/manager_test.go
  • management/internals/shared/grpc/expose_service.go
  • management/internals/shared/grpc/proxy.go
  • management/server/store/sql_store.go
  • management/server/store/store.go
  • management/server/store/store_mock.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • management/internals/modules/reverseproxy/service/manager/expose_tracker_test.go


coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
proxy/server.go (1)

270-294: ⚠️ Potential issue | 🟠 Major

Cancel the internal run context before these early returns.

newManagementMappingWorker() is launched on the caller's ctx, but these exits only call shutdownServices(). If the caller doesn't cancel ctx, the worker keeps retrying against the closed gRPC connection after ListenAndServe() has already returned.

💡 Suggested fix
 func (s *Server) ListenAndServe(ctx context.Context, addr string) (err error) {
 	s.initDefaults()
+	runCtx, cancel := context.WithCancel(ctx)
+	defer cancel()
@@
-	go s.newManagementMappingWorker(ctx, s.mgmtClient)
+	go s.newManagementMappingWorker(runCtx, s.mgmtClient)
@@
-	tlsConfig, err := s.configureTLS(ctx)
+	tlsConfig, err := s.configureTLS(runCtx)
@@
-		routerErr <- s.mainRouter.Serve(ctx, ln)
+		routerErr <- s.mainRouter.Serve(runCtx, ln)
@@
 	case err := <-httpsErr:
+		cancel()
 		s.shutdownServices()
@@
 	case err := <-routerErr:
+		cancel()
 		s.shutdownServices()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/server.go` around lines 270 - 294, Early returns from the goroutine
select do not cancel the internal run context used by
newManagementMappingWorker, so the worker keeps retrying after ListenAndServe
returns; before calling s.shutdownServices() or returning in both select
branches, invoke the cancel function for the ctx used to launch
newManagementMappingWorker (the same cancel for the context passed into
s.mainRouter.Serve), ensuring you reference the internal run context/cancel
associated with newManagementMappingWorker and call it prior to
s.shutdownServices() and before each return in the httpsErr and routerErr cases.
♻️ Duplicate comments (1)
proxy/internal/udp/relay.go (1)

247-278: ⚠️ Potential issue | 🔴 Critical

Re-check relay state after the backend dial completes.

Close() can cancel the relay and delete the nil placeholder while r.dialFunc(...) is still in flight. This path then unconditionally reinserts the session and starts sessWg.Go(...), which can race with Close() already waiting and leave a late session behind during shutdown.

💡 Suggested fix
 	backend, err := r.dialFunc(dialCtx, "udp", r.target)
 	dialCancel()
 	if err != nil {
 		r.mu.Lock()
 		delete(r.sessions, key)
@@
 	sess.updateLastSeen()

 	r.mu.Lock()
+	current, ok := r.sessions[key]
+	if r.ctx.Err() != nil || !ok || current != nil {
+		r.mu.Unlock()
+		sessCancel()
+		_ = backend.Close()
+		if current != nil {
+			return current, nil
+		}
+		return nil, r.ctx.Err()
+	}
 	r.sessions[key] = sess
 	r.mu.Unlock()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@proxy/internal/udp/relay.go` around lines 247 - 278, After dialFunc returns,
re-check the relay state under the mutex before inserting and starting the
session: acquire r.mu, verify that r.sessions[key] still either has the
placeholder or no entry (i.e., Close didn't remove it), and only then set
r.sessions[key]=sess and release the lock; if the entry was removed or replaced,
cleanly close the dialed backend, call sess.cancel(), do not start sessWg.Go,
and return an error. Use the existing symbols dialFunc, r.sessions, key,
session, sess.cancel, backend, sessWg.Go, and relayBackendToClient to locate and
implement this guard. Ensure observer notifications
(UDPSessionStarted/UDPSessionDialError) are only invoked when the session is
actually stored and the goroutine will be launched.
🧹 Nitpick comments (1)
management/internals/shared/grpc/proxy_test.go (1)

311-362: Test name is misleading — consider renaming for clarity.

The test name TestSendServiceUpdateToCluster_FiltersOnCapability suggests capability-based filtering, but the test actually verifies that no filtering occurs: both TLS and HTTP mappings reach all proxies regardless of SupportsCustomPorts. A clearer name would reflect the actual behavior.

Suggested rename
-func TestSendServiceUpdateToCluster_FiltersOnCapability(t *testing.T) {
+func TestSendServiceUpdateToCluster_NoFilteringForTLSAndHTTP(t *testing.T) {

Or alternatively: TestSendServiceUpdateToCluster_DeliversToAllProxies

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@management/internals/shared/grpc/proxy_test.go` around lines 311 - 362, The
test name TestSendServiceUpdateToCluster_FiltersOnCapability is misleading
because the assertions verify that mappings are delivered to all proxies (no
capability-based filtering); rename the test function to
TestSendServiceUpdateToCluster_DeliversToAllProxies (and update any in-file
comments mentioning the old name) so the name matches behavior, ensuring you
change the function identifier and any references to it in this file.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@management/internals/modules/reverseproxy/service/service.go`:
- Line 877: The Meta struct is being shallow-copied (Meta: s.Meta) which shares
pointer fields; instead perform a defensive deep copy of s.Meta before
assigning: create a local copy newMeta := s.Meta and if
newMeta.CertificateIssuedAt or newMeta.LastRenewedAt are non-nil allocate new
time.Time values and copy the times (e.g. tmp := *newMeta.CertificateIssuedAt;
newMeta.CertificateIssuedAt = &tmp), then assign Meta: newMeta when constructing
the new service. This preserves the original s.Meta while avoiding shared
pointers for CertificateIssuedAt and LastRenewedAt.

In `@proxy/internal/udp/relay.go`:
- Around line 177-185: The failed sess.backend.Write currently only returns the
buffer and logs, but leaves the dead session in r.sessions while
sess.updateLastSeen continues to keep it alive; modify the error branch after
sess.backend.Write to fully tear down the session: close/terminate sess (e.g.,
call its Close/Shutdown method or close sess.backend), remove the sess entry
from r.sessions (using the same map/lock mechanism used elsewhere to mutate
r.sessions), ensure r.bufPool.Put(bufp) is still called, and avoid updating
last-seen after teardown so future packets will create a new backend instead of
reusing the broken sess.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fcab214b-9d33-4090-9ad4-f346fa5c8b8f

📥 Commits

Reviewing files that changed from the base of the PR and between 6936322 and 8652a43.

📒 Files selected for processing (5)
  • management/internals/modules/reverseproxy/service/service.go
  • management/internals/shared/grpc/proxy_test.go
  • proxy/internal/udp/relay.go
  • proxy/server.go
  • shared/management/http/api/openapi.yml

@lixmal force-pushed the reverse-proxy-l4 branch from e95e0d4 to 151247e on March 9, 2026 at 16:38
@lixmal force-pushed the reverse-proxy-l4 branch from c865ebc to a191723 on March 11, 2026 at 07:17