-
Notifications
You must be signed in to change notification settings - Fork 33
fix: handle mongodb connection pool management effectively and optimally #667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds MongoDB app-name and connection-pool configuration exposure across Helm charts and environment loaders; wires those values into MongoDB and PostgreSQL client options; introduces proactive disconnect/cleanup on MongoDB initialization and shutdown; updates adapters, tests, and related config structs. Changes
Sequence Diagram(s)sequenceDiagram
participant Helm as Helm / Env
participant Loader as Config Loader
participant Watcher as ChangeStreamWatcher
participant Client as MongoDB Client
participant DB as MongoDB Server
Helm->>Loader: expose MONGODB_APP_NAME, MAX_POOL, MIN_POOL, IDLE_SEC
Loader->>Watcher: provide MongoDBConfig (AppName, MaxPool, MinPool, MaxConnIdleTime)
Watcher->>Client: construct options (SetAppName, SetMaxPoolSize, SetMinPoolSize, SetMaxConnIdleTime)
Client->>DB: Connect()
DB-->>Client: OK / Error
alt success
Client-->>Watcher: ready
else error
Watcher->>Client: disconnectOnError -> Disconnect()
Client-->>Watcher: disconnected
end
Note over Watcher,Client: On shutdown
Watcher->>Client: Close -> Disconnect()
Client-->>Watcher: disconnected (logged)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies" Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
distros/kubernetes/nvsentinel/templates/daemonset.yaml (1)
23-29: Add nodeAffinity to separate DaemonSet deployments for kata and regular nodes, and configure appropriate volume mounts for logging.This template currently has no affinity configuration to distinguish between kata and regular nodes. According to the coding guidelines, separate DaemonSets (or a single template with nodeAffinity) should be created using
nodeAffinitybased on the kata.enabled label.Additionally, the volume mounts are incomplete:
- Regular node DaemonSets require
/var/logvolume mount for file-based logs- Kata node DaemonSets require
/run/log/journaland/var/log/journalvolume mounts for systemd journalWhile the values.yaml is configured to detect kata nodes, the daemonset template needs to be enhanced with conditional affinity rules and appropriate volume mounts based on the kata detection result.
🧹 Nitpick comments (7)
store-client/pkg/config/env_loader.go (1)
93-97: Consider consolidating duplicate default constants.These default constants are also defined in
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go(Lines 49-50). Having duplicate definitions could lead to drift if one is updated without the other.Consider exporting these constants from a single location (e.g., this file) and importing them in
watch_store.goto ensure consistency:-// Connection pool defaults -DefaultMaxPoolSize = 3 -DefaultMinPoolSize = 1 -DefaultMaxConnIdleTimeSeconds = 300 // 5 minutes +// Connection pool defaults - exported for use by other packages +const ( + DefaultMaxPoolSize = 3 + DefaultMinPoolSize = 1 + DefaultMaxConnIdleTimeSeconds = 300 // 5 minutes +)Then in
watch_store.go, import from this package instead of defining locally.store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go (1)
1845-1873: Strengthen panic prevention verification.The test verifies that
Close()handles a nil client, but it doesn't explicitly assert that no panic occurs. Consider wrapping the Close call in a defer/recover block or usingrequire.NotPanics()to make the test's intent clearer.✨ Proposed enhancement
mt.Run("Close handles nil client gracefully", func(mt *mtest.T) { mt.AddMockResponses( mtest.CreateCursorResponse(1, "testdb.testcollection", mtest.FirstBatch), mtest.CreateCursorResponse(0, "testdb.testcollection", mtest.NextBatch), ) coll := mt.Client.Database("testdb").Collection("testcollection") changeStream, err := coll.Watch(context.Background(), mongo.Pipeline{}) require.NoError(mt, err) watcher := &ChangeStreamWatcher{ client: nil, // Explicitly nil - edge case changeStream: changeStream, eventChannel: make(chan Event, 10), resumeTokenCol: mt.Client.Database("tokendb").Collection("tokencollection"), clientName: "testclient", done: make(chan struct{}), } - // Close should not panic with nil client - _ = watcher.Close(context.Background()) + // Close should not panic with nil client + require.NotPanics(mt, func() { + _ = watcher.Close(context.Background()) + }) })store-client/pkg/factory/client_factory.go (1)
90-107: Consider making ConnMaxLifetime configurable.The PostgreSQL connection pool configuration correctly applies settings to prevent idle connection accumulation. Default values (MaxPoolSize: 3, MinPoolSize: 1, MaxConnIdleTime: 5 minutes) align with MongoDB settings and PR objectives.
However,
ConnMaxLifetimeis hardcoded to 1 hour. While this is a reasonable default, consider making it configurable through the database config interface for consistency with other pool settings.♻️ Proposed enhancement
Add a new method to the DatabaseConfig interface and implement it similarly to other pool settings:
In
store-client/pkg/config/env_loader.go:GetMaxConnLifetimeSeconds() intThen use it here:
db.SetMaxOpenConns(int(maxPoolSize)) db.SetMaxIdleConns(int(minPoolSize)) db.SetConnMaxIdleTime(maxConnIdleTime) - db.SetConnMaxLifetime(time.Hour) // Max lifetime of a connection + maxConnLifetime := time.Duration(f.dbConfig.GetMaxConnLifetimeSeconds()) * time.Second + if maxConnLifetime == 0 { + maxConnLifetime = time.Hour // Default + } + db.SetConnMaxLifetime(maxConnLifetime)distros/kubernetes/nvsentinel/values.yaml (1)
42-46: Enhance inline documentation for MongoDB pool settings.The pool configuration values lack inline comments explaining their purpose and rationale. As per coding guidelines, all Helm values should be documented with inline comments, and non-obvious configurations should include examples.
Consider documenting:
- The purpose of each setting (e.g., "Maximum number of connections per client")
- The rationale for the defaults (PR objectives mention: "support 1 change stream + 1 query + 1 write concurrently")
- When users might want to adjust these values
📝 Suggested documentation enhancement
- # MongoDB connection pool settings + # MongoDB connection pool settings + # These defaults support 1 change stream + 1 query + 1 write operation concurrently per service mongodb: + # Maximum number of MongoDB connections per client/service maxPoolSize: 3 + # Minimum number of idle connections to keep warm minPoolSize: 1 + # Close idle connections after this many seconds (5 minutes) maxConnIdleTimeSeconds: 300Based on coding guidelines.
store-client/pkg/datastore/providers/postgresql/datastore.go (1)
72-100: Consider logging parse errors and validating pool size relationships.The silent fallback on parse errors (lines 80-94) is safe but could hide configuration mistakes. Additionally, there's no validation that
maxIdleConns <= maxOpenConns, which is a PostgreSQL pool requirement.Consider:
- Logging a warning when override parsing fails, so operators can detect typos
- Validating that
maxIdleConns <= maxOpenConnsafter applying overrides♻️ Example enhancement
if config.Options != nil { if v, ok := config.Options["maxOpenConns"]; ok { if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 { maxOpenConns = parsed + } else { + slog.Warn("Invalid maxOpenConns override, using default", "value", v, "default", maxOpenConns) } } if v, ok := config.Options["maxIdleConns"]; ok { if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 { maxIdleConns = parsed + } else { + slog.Warn("Invalid maxIdleConns override, using default", "value", v, "default", maxIdleConns) } } if v, ok := config.Options["maxConnIdleTimeSeconds"]; ok { if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 { maxConnIdleTime = time.Duration(parsed) * time.Second + } else { + slog.Warn("Invalid maxConnIdleTimeSeconds override, using default", "value", v, "default", int(maxConnIdleTime.Seconds())) } } } + + // Validate pool configuration + if maxIdleConns > maxOpenConns { + slog.Warn("maxIdleConns exceeds maxOpenConns, capping to maxOpenConns", + "maxIdleConns", maxIdleConns, "maxOpenConns", maxOpenConns) + maxIdleConns = maxOpenConns + } db.SetMaxOpenConns(maxOpenConns)store-client/pkg/adapter/legacy_adapter_test.go (1)
22-23: Consider using standard testing package for simple equality checks.Based on learnings, the codebase prefers the standard
testingpackage for simple equality checks, reserving third-party libraries like testify for complex scenarios requiring richer diagnostics.♻️ Example using standard testing package
- "github.com/stretchr/testify/assert" - "github.com/stretchr/testify/require"Then replace assertions like:
- assert.Equal(t, "", adapter.GetAppName(), "AppName should be empty for legacy adapter") + if got := adapter.GetAppName(); got != "" { + t.Errorf("AppName should be empty for legacy adapter, got %q", got) + }- assert.Equal(t, uint64(3), adapter.GetMaxPoolSize(), "MaxPoolSize default should be 3") + if got := adapter.GetMaxPoolSize(); got != 3 { + t.Errorf("MaxPoolSize default should be 3, got %d", got) + }Based on learnings.
store-client/pkg/storewatcher/watch_store.go (1)
551-566: Consider extracting default values as constants for consistency.The defaults here use magic numbers while
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.godefines them as exported constants (DefaultMaxPoolSize,DefaultMinPoolSize,DefaultMaxConnIdleTime). Consider importing and reusing those constants to ensure consistency across packages.♻️ Suggested refactor
Either import the constants from the other package or define local constants:
+const ( + defaultMaxPoolSize = 3 + defaultMinPoolSize = 1 + defaultMaxConnIdleTime = 5 * time.Minute +) + // Apply connection pool settings to prevent idle connection accumulation // These are critical for large clusters with many clients (e.g., 1000+ platform-connectors) maxPoolSize := mongoConfig.MaxPoolSize if maxPoolSize == 0 { - maxPoolSize = 3 // Default max pool size + maxPoolSize = defaultMaxPoolSize } minPoolSize := mongoConfig.MinPoolSize if minPoolSize == 0 { - minPoolSize = 1 // Default min pool size + minPoolSize = defaultMinPoolSize } maxConnIdleTime := time.Duration(mongoConfig.MaxConnIdleTimeSeconds) * time.Second if maxConnIdleTime == 0 { - maxConnIdleTime = 5 * time.Minute // Default: 5 minutes + maxConnIdleTime = defaultMaxConnIdleTime }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yamldistros/kubernetes/nvsentinel/templates/configmap-datastore.yamldistros/kubernetes/nvsentinel/templates/daemonset.yamldistros/kubernetes/nvsentinel/values.yamlnode-drainer/pkg/reconciler/reconciler_integration_test.gostore-client/pkg/adapter/legacy_adapter.gostore-client/pkg/adapter/legacy_adapter_test.gostore-client/pkg/client/mongodb_client.gostore-client/pkg/config/env_loader.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gostore-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/factory/client_factory.gostore-client/pkg/storewatcher/config.gostore-client/pkg/storewatcher/watch_store.go
🧰 Additional context used
📓 Path-based instructions (5)
**/daemonset*.yaml
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Explain DaemonSet variant selection logic in Helm chart documentation
Files:
distros/kubernetes/nvsentinel/templates/daemonset.yaml
distros/kubernetes/**/*daemonset*.yaml
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
distros/kubernetes/**/*daemonset*.yaml: Separate DaemonSets should be created for kata vs regular nodes usingnodeAffinitybased on kata.enabled label
Regular node DaemonSets should use/var/logvolume mount for file-based logs
Kata node DaemonSets should use/run/log/journaland/var/log/journalvolume mounts for systemd journal
Files:
distros/kubernetes/nvsentinel/templates/daemonset.yaml
**/*.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gonode-drainer/pkg/reconciler/reconciler_integration_test.gostore-client/pkg/storewatcher/watch_store.gostore-client/pkg/adapter/legacy_adapter_test.gostore-client/pkg/adapter/legacy_adapter.gostore-client/pkg/client/mongodb_client.gostore-client/pkg/factory/client_factory.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/storewatcher/config.gostore-client/pkg/config/env_loader.go
**/*_test.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gonode-drainer/pkg/reconciler/reconciler_integration_test.gostore-client/pkg/adapter/legacy_adapter_test.go
**/values.yaml
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/values.yaml: Document all values in Helm chartvalues.yamlwith inline comments
Include examples for non-obvious configurations in Helm chart documentation
Note truthy value requirements in Helm chart documentation where applicable
Files:
distros/kubernetes/nvsentinel/values.yaml
🧠 Learnings (4)
📚 Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.
Applied to files:
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gonode-drainer/pkg/reconciler/reconciler_integration_test.gostore-client/pkg/adapter/legacy_adapter_test.go
📚 Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.
Applied to files:
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gostore-client/pkg/storewatcher/watch_store.gostore-client/pkg/adapter/legacy_adapter_test.gostore-client/pkg/adapter/legacy_adapter.gostore-client/pkg/client/mongodb_client.gostore-client/pkg/factory/client_factory.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/storewatcher/config.gostore-client/pkg/config/env_loader.go
📚 Learning: 2025-11-15T17:25:33.481Z
Learnt from: dims
Repo: NVIDIA/NVSentinel PR: 255
File: node-drainer/pkg/config/config.go:131-159
Timestamp: 2025-11-15T17:25:33.481Z
Learning: In node-drainer's token configuration, the ClientName must be consistent between TokenConfigFromEnv() and NewTokenConfig() since it's used as a MongoDB query key for resume token lookups in the store-client watcher implementation.
Applied to files:
store-client/pkg/storewatcher/watch_store.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/storewatcher/config.go
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Use structured logging via `log/slog` in Go code
Applied to files:
store-client/pkg/client/mongodb_client.gostore-client/pkg/datastore/providers/postgresql/datastore.go
🧬 Code graph analysis (5)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go (2)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
ChangeStreamWatcher(89-108)Event(40-40)store-client/pkg/storewatcher/watch_store.go (1)
ChangeStreamWatcher(43-57)
store-client/pkg/adapter/legacy_adapter_test.go (3)
store-client/pkg/datastore/types.go (3)
DataStoreConfig(35-39)ProviderMongoDB(25-25)ConnectionConfig(42-54)store-client/pkg/adapter/legacy_adapter.go (3)
NewLegacyDatabaseConfigAdapter(48-52)ConvertDataStoreConfigToLegacy(27-29)ConvertDataStoreConfigToLegacyWithCertPath(33-38)store-client/pkg/config/env_loader.go (1)
DatabaseConfig(28-38)
store-client/pkg/client/mongodb_client.go (1)
node-drainer/pkg/config/config.go (1)
Duration(37-39)
store-client/pkg/factory/client_factory.go (1)
node-drainer/pkg/config/config.go (1)
Duration(37-39)
store-client/pkg/config/env_loader.go (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
DefaultMaxPoolSize(50-50)DefaultMinPoolSize(51-51)
🔇 Additional comments (26)
node-drainer/pkg/reconciler/reconciler_integration_test.go (1)
138-152: LGTM - Mock methods correctly implement the new DatabaseConfig interface.The mock implementations return sensible defaults that align with the PR's connection pool configuration (MaxPoolSize=3, MinPoolSize=1, MaxConnIdleTime=300s). These values match the defaults defined in
env_loader.go.store-client/pkg/config/env_loader.go (2)
356-372: Verify intent: PostgreSQL using MONGODB_ prefixed environment variables.The PostgreSQL configuration path uses
MONGODB_*prefixed environment variables for pool settings. While the comment indicates this is intentional for consistency, operators deploying with PostgreSQL may expectDATASTORE_*orPOSTGRESQL_*prefixed variables.Is this intentional, or should PostgreSQL have its own environment variable names (e.g.,
DATASTORE_MAX_POOL_SIZE) for clarity?
550-569: LGTM - Defensive fallback for uninitialized configuration.The zero-value checks provide good fallback behavior for cases where the struct might not be fully initialized (e.g., created via struct literal without these fields). This is safe because
getPositiveIntEnvVaralready rejects zero/negative values during normal initialization.store-client/pkg/client/mongodb_client.go (2)
347-351: LGTM - Connection pool settings correctly wired to MongoDB client.The pool configuration is properly extracted from the database config and passed to the MongoDB watcher. The time conversion for
MaxConnIdleTimeis correct.
750-769: Verify: NewChangeStreamWatcher missing pool configuration settings.The
mongoConfigcreated forNewChangeStreamWatcherdoesn't include the new pool settings (AppName,MaxPoolSize,MinPoolSize,MaxConnIdleTime). This means change stream watchers may not benefit from the connection pool optimizations introduced in this PR.Should these settings be included for consistency?
mongoConfig := mongoWatcher.MongoDBConfig{ URI: c.config.GetConnectionURI(), Database: c.database, Collection: c.collection, // ...existing TLS and timeout settings... CertWatcher: c.certWatcher, + // Connection pool settings + AppName: c.config.GetAppName(), + MaxPoolSize: c.config.GetMaxPoolSize(), + MinPoolSize: c.config.GetMinPoolSize(), + MaxConnIdleTime: time.Duration(c.config.GetMaxConnIdleTimeSeconds()) * time.Second, }distros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yaml (1)
138-140: LGTM - App name added for MongoDB connection identification.The
MONGODB_APP_NAMEenvironment variable is correctly set to the chart name, enabling connection identification in MongoDB logs anddb.currentOp().Note: This env var is set unconditionally, even when
datastore.providerispostgresql. This is harmless (the variable will simply be ignored) but you may want to wrap it in a conditional for cleanliness, similar to other MongoDB-specific settings.distros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yaml (1)
139-141: LGTM - Consistent with other deployment templates.The
MONGODB_APP_NAMEenvironment variable follows the same pattern as other charts in this PR.distros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yaml (1)
139-141: LGTM - Consistent with test mock and other templates.The
MONGODB_APP_NAMEvalue aligns with the mock implementation inreconciler_integration_test.gowhich returns"node-drainer".distros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yaml (1)
123-125: LGTM - Consistent app name configuration.The
MONGODB_APP_NAMEenvironment variable follows the same pattern as other charts, enabling connection identification for the event-exporter component.distros/kubernetes/nvsentinel/templates/daemonset.yaml (1)
148-150: LGTM: MONGODB_APP_NAME addition is correct.The environment variable addition enables connection identification in MongoDB logs and monitoring tools, which aligns with the PR objectives.
distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yaml (2)
141-143: LGTM: Dynamic app name assignment is correct.Using
.Chart.Nameprovides a unique identifier per chart deployment, which aligns with the PR's goal of enabling connection identification in MongoDB logs.
203-205: LGTM: Specific app name for maintenance-notifier is appropriate.The hardcoded value "maintenance-notifier" clearly identifies this sidecar container's connections, enabling differentiation from the main container in the same pod.
distros/kubernetes/nvsentinel/templates/configmap-datastore.yaml (1)
81-86: LGTM: MongoDB pool settings are well-configured.The default values align with the PR rationale (support 1 change stream + 1 query + 1 write concurrently), and the 5-minute idle timeout helps prevent idle connection accumulation in large clusters. The conditional logic correctly handles both explicit MongoDB selection and backward compatibility when no provider is specified.
store-client/pkg/factory/client_factory.go (1)
22-22: LGTM: Time import is necessary.The import is required for
time.Durationandtime.Secondused in the connection pool configuration.distros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yaml (1)
121-123: LGTM!The addition of
MONGODB_APP_NAMEis well-documented and correctly uses the chart name for connection identification. This will help with MongoDB diagnostics viadb.currentOp()and logs.store-client/pkg/storewatcher/config.go (1)
47-50: LGTM!The new pool configuration fields are well-structured with sensible defaults that align with the Helm values. The environment variable naming follows a consistent convention, and the types are appropriate (uint64 for pool sizes prevents negative values).
store-client/pkg/adapter/legacy_adapter_test.go (1)
28-69: Good test coverage for pool configuration defaults and conversion.The tests effectively validate:
- Interface compliance of the legacy adapter
- Correct default pool values (AppName "", MaxPoolSize 3, MinPoolSize 1, MaxConnIdleTimeSeconds 300)
- Conversion functions work both with and without certificate paths
Test names follow good conventions and the coverage ensures the adapter correctly exposes the new pool configuration surface.
store-client/pkg/adapter/legacy_adapter.go (1)
161-175: LGTM! Sensible defaults for legacy adapter.The hardcoded defaults align with the pool configuration constants defined elsewhere in the codebase (MaxPoolSize=3, MinPoolSize=1, MaxConnIdleTime=5 minutes). Returning an empty
AppNameis appropriate for the legacy adapter since services using the new configuration path will set their own application names.store-client/pkg/storewatcher/watch_store.go (3)
76-81: Good addition of cleanup helper.The
disconnectOnErrorhelper effectively prevents connection leaks during initialization failures. The pattern of capturingclientandctxfrom the outer scope is clean and ensures all error paths properly release resources.
382-395: LGTM! Proper disconnect handling in Close().The disconnect logic correctly:
- Checks for nil client before attempting disconnect
- Logs warnings on disconnect failure without overriding the original error from
changeStream.Close()- Propagates the disconnect error only when no prior error exists
This ensures connections are properly released during shutdown.
576-580: LGTM! AppName configuration for connection identification.Setting
AppNamewhen provided enables easier identification of connections in MongoDB'sdb.currentOp()and logs, which directly addresses the PR objective of improving visibility in large clusters.store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (5)
48-53: LGTM! Well-defined exported constants for connection pool defaults.These constants provide clear documentation of the default values and enable reuse across the codebase. The rationale in the PR (support 1 change stream + 1 query + 1 write concurrently) justifies the default of 3 for
MaxPoolSize.
70-74: LGTM! Config struct extended with pool settings.Using
time.Durationdirectly forMaxConnIdleTimeis a cleaner API than integer seconds, as it's type-safe and self-documenting.
127-132: Good cleanup helper pattern.The
disconnectOnErrorhelper is consistently implemented across both watcher files, ensuring proper resource cleanup on initialization failures.
503-516: LGTM! Proper MongoDB client disconnect on Close().The implementation correctly:
- Guards against nil client
- Logs the disconnect outcome for observability
- Preserves the original
changeStream.Close()error when both operations fail- Only promotes disconnect error to return value when no prior error exists
This ensures connections are properly released when the watcher is closed.
654-685: LGTM! Pool settings applied with proper defaults.The implementation correctly uses the defined constants as fallbacks and applies all pool settings to the MongoDB client options. Logging the
AppNameat Info level aids debugging in production environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (6)
store-client/pkg/datastore/providers/postgresql/datastore.go (1)
61-74: Silent fallback for invalid values may mask configuration errors.When
parsed <= 0, the function silently returns the default without any logging. This could mask configuration errors where an operator accidentally setsmaxOpenConns=0or a negative value, expecting it to mean "unlimited" or similar.Consider logging a warning when an explicitly provided value is rejected:
♻️ Suggested improvement
func getPoolOption(options map[string]string, key string, defaultVal int) int { if options == nil { return defaultVal } if v, ok := options[key]; ok { if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 { return parsed + } else if err == nil { + slog.Warn("Pool option value must be positive, using default", + "key", key, "provided", parsed, "default", defaultVal) } } return defaultVal }store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
48-53: Duplicate constant definitions with env_loader.go.These pool defaults are also defined in
store-client/pkg/config/env_loader.go(lines 93-96). Consider importing from a shared location to avoid drift.#!/bin/bash # Verify duplicate constant definitions rg -n "DefaultMaxPoolSize|DefaultMinPoolSize|DefaultMaxConnIdleTime" --type=go store-client/
569-624: GetCollectionClient lacks mechanism for caller to disconnect the client.The
disconnectOnErrorcleanup on error paths is good. However, on the happy path (line 623), the caller receives a*mongo.Collectionbut has no reference to the underlying*mongo.Client. This means there's no way to callDisconnect()when the collection is no longer needed, which could lead to connection leaks in long-running services.Consider returning both the collection and client, or providing a cleanup function:
♻️ Suggested approach
// Option 1: Return client along with collection func GetCollectionClient(...) (*mongo.Collection, *mongo.Client, error) // Option 2: Return a cleanup function func GetCollectionClient(...) (*mongo.Collection, func(), error) { // ... cleanup := func() { if disconnectErr := client.Disconnect(context.Background()); disconnectErr != nil { slog.Warn("Failed to disconnect client", "error", disconnectErr) } } return collection, cleanup, nil }store-client/pkg/config/env_loader.go (3)
76-80: Environment variable naming uses MONGODB_ prefix for both providers.These env vars have a
MONGODB_prefix but are used for PostgreSQL configuration as well (see line 375). Consider using provider-agnostic names likeDATASTORE_APP_NAME,DATASTORE_MAX_POOL_SIZE, etc., for consistency with other PostgreSQL env vars (DATASTORE_HOST,DATASTORE_PORT).Is the use of
MONGODB_*env vars for PostgreSQL configuration intentional for backward compatibility, or should provider-agnostic names be introduced?
93-96: Pool defaults are correctly defined but duplicated in watch_store.go.These values align with the PR objectives (MaxPoolSize=3 for concurrent change stream + query + write). However, they duplicate constants in
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go(lines 49-52).Consider extracting these to a shared constants file to prevent drift.
374-401: PostgreSQL config uses MONGODB_ environment variables for pool settings.*The comment at line 374 acknowledges this applies to PostgreSQL too, but using
EnvMongoDBAppName,EnvMongoDBMaxPoolSize, etc., for PostgreSQL configuration could confuse operators.This is the same issue noted earlier regarding env var naming. If the
MONGODB_*naming is intentional for simplicity/backward compatibility, consider adding a comment explaining why.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
store-client/pkg/config/env_loader.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/factory/client_factory.gostore-client/pkg/storewatcher/watch_store.go
🚧 Files skipped from review as they are similar to previous changes (2)
- store-client/pkg/factory/client_factory.go
- store-client/pkg/storewatcher/watch_store.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
store-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/config/env_loader.go
🧠 Learnings (3)
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Use structured logging via `log/slog` in Go code
Applied to files:
store-client/pkg/datastore/providers/postgresql/datastore.go
📚 Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.
Applied to files:
store-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/config/env_loader.go
📚 Learning: 2025-11-15T17:25:33.481Z
Learnt from: dims
Repo: NVIDIA/NVSentinel PR: 255
File: node-drainer/pkg/config/config.go:131-159
Timestamp: 2025-11-15T17:25:33.481Z
Learning: In node-drainer's token configuration, the ClientName must be consistent between TokenConfigFromEnv() and NewTokenConfig() since it's used as a MongoDB query key for resume token lookups in the store-client watcher implementation.
Applied to files:
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
🧬 Code graph analysis (3)
store-client/pkg/datastore/providers/postgresql/datastore.go (1)
store-client/pkg/datastore/types.go (2)
ConnectionConfig(42-54)DataStoreConfig(35-39)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (1)
store-client/pkg/config/env_loader.go (2)
DefaultMaxPoolSize(94-94)DefaultMinPoolSize(95-95)
store-client/pkg/config/env_loader.go (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
DefaultMaxPoolSize(50-50)DefaultMinPoolSize(51-51)
🔇 Additional comments (12)
store-client/pkg/datastore/providers/postgresql/datastore.go (2)
40-59: Validation logic looks good.The function correctly validates required connection parameters and port range. Error messages are clear and actionable.
76-104: Pool configuration and validation integration looks correct.The changes properly:
- Validate config before attempting connection
- Apply pool settings with sensible defaults matching the PR objectives (MaxPoolSize=3, MinPoolSize=1, MaxConnIdleTime=5 min)
- Close the DB on ping failure (line 91)
- Use
SetConnMaxLifetime(time.Hour)to prevent stale connectionsThe default values align with the MongoDB pool defaults mentioned in the PR objectives.
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (5)
70-74: New pool configuration fields in MongoDBConfig look good.The new fields align with the MongoDB driver's client options and support the PR objectives for configurable connection pool limits.
127-132: Good pattern for preventing connection leaks during initialization.The
disconnectOnErrorhelper ensures the MongoDB client is properly disconnected when initialization fails after connection. This addresses the PR objective of addingdisconnectOnError()to prevent connection leaks on error paths.
134-147: Consistent application of disconnectOnError across all error paths.All validation and connectivity check failures now properly disconnect the client before returning. This ensures no orphaned connections remain when
NewChangeStreamWatcherfails.Also applies to: 153-168, 198-211
504-517: Client disconnect on Close() addresses PR objectives.This change ensures
client.Disconnect()is called during shutdown, preventing connection accumulation. The implementation correctly:
- Checks for nil client
- Logs success/failure
- Preserves the original error from
changeStream.Close()if both fail
655-688: Pool configuration in constructMongoClientOptions is well-implemented.The implementation correctly:
- Applies defaults when config values are zero
- Sets
MaxPoolSize,MinPoolSize, andMaxConnIdleTimeto prevent idle connection accumulation- Conditionally sets
AppNamefor easier identification indb.currentOp()and MongoDB logs, as mentioned in PR objectivesThe comments explaining the rationale for these settings are helpful.
store-client/pkg/config/env_loader.go (5)
34-37: Interface extension is backward-compatible and well-designed.Adding new methods to the
DatabaseConfiginterface enables consuming components to access pool settings and app name in a database-agnostic way.
196-222: Helper functions improve code clarity.
getRequiredEnvandgetCollectionNamereduce duplication and provide consistent error messages.
264-291: Pool settings loading is correctly implemented.The logic properly loads pool settings from environment variables with defaults, and the returned config struct is fully populated.
479-497:getPositiveUint64EnvVarimplementation is correct.The function properly:
- Returns the default when the env var is unset
- Parses as uint64 for MongoDB driver compatibility
- Rejects zero values with a clear error message
588-610: Getter implementations provide defensive defaults.The getters return sensible defaults if the struct fields are zero. This is good defensive programming for cases where
StandardDatabaseConfigis constructed without going through the loading functions (e.g., in tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
store-client/pkg/storewatcher/watch_store.go (2)
373-403: Close() error reporting is misleading when disconnect fails.If
changeStream.Close()succeeds butclient.Disconnect()fails, you seterrto a disconnect error but then wrap it withfailed to close change stream..., which is confusing (and can misdirect operators). Consider trackingcloseErranddisconnectErrseparately and returning the right one (orerrors.Join).
312-340: MakeMarkProcessedretries ctx-interruptible (avoidtime.Sleep).
time.Sleep(w.resumeTokenUpdateInterval)won’t exit promptly on shutdown/cancel. Use aselectwithtime.After(...)(same pattern you used inopenChangeStreamWithRetry).store-client/pkg/client/mongodb_client.go (1)
17-24: Add validation for pool configuration before applying to MongoDB client.Currently,
MaxConnIdleTimeand pool sizes are applied without runtime validation. Add guards to prevent invalid configurations:
MaxConnIdleTimeSeconds ≤ 0should be skipped (use MongoDB default)MinPoolSize > MaxPoolSizeshould return a validation errorThis prevents misconfiguration from reaching the MongoDB driver and causing connection pool failures.
Proposed diff
@@ ChangeStreamRetryIntervalSeconds: dbConfig.GetTimeoutConfig().GetChangeStreamRetryIntervalSeconds(), // Connection pool settings to prevent idle connection accumulation - AppName: dbConfig.GetAppName(), - MaxPoolSize: dbConfig.GetMaxPoolSize(), - MinPoolSize: dbConfig.GetMinPoolSize(), - MaxConnIdleTime: time.Duration(dbConfig.GetMaxConnIdleTimeSeconds()) * time.Second, + AppName: dbConfig.GetAppName(), + MaxPoolSize: dbConfig.GetMaxPoolSize(), + MinPoolSize: dbConfig.GetMinPoolSize(), } + + idleSeconds := dbConfig.GetMaxConnIdleTimeSeconds() + if idleSeconds > 0 { + mongoConfig.MaxConnIdleTime = time.Duration(idleSeconds) * time.Second + } + if mongoConfig.MaxPoolSize > 0 && mongoConfig.MinPoolSize > mongoConfig.MaxPoolSize { + return nil, datastore.NewValidationError( + datastore.ProviderMongoDB, + "invalid connection pool configuration: minPoolSize cannot exceed maxPoolSize", + nil, + ).WithMetadata("minPoolSize", mongoConfig.MinPoolSize). + WithMetadata("maxPoolSize", mongoConfig.MaxPoolSize) + }
🤖 Fix all issues with AI agents
In @store-client/pkg/datastore/providers/postgresql/datastore.go:
- Around line 40-59: validatePostgreSQLConfig currently rejects conn.Port == 0
and requires conn.Username, which contradicts buildConnectionString behavior and
lib/pq defaults; update validatePostgreSQLConfig (operating on
datastore.ConnectionConfig) to only require Host and Database, make Username
optional (remove the username empty check), and only validate port range when
conn.Port > 0 (i.e., ensure port <= 65535 when set); adjust returned error
messages accordingly so optional port/username behavior is preserved.
In @store-client/pkg/storewatcher/watch_store.go:
- Around line 76-96: disconnectOnError currently calls client.Disconnect(ctx)
using the caller's ctx which may be canceled/expired; change it to create a
fresh background context with a short timeout (e.g.,
context.WithTimeout(context.Background(), 5s)) and call client.Disconnect on
that cleanup context, ensuring you cancel the timeout when done and log any
disconnect error; apply the same pattern in GetCollectionClient (the
disconnect/cleanup paths) and in the Close method if it uses a caller-supplied
ctx for shutdown.
🧹 Nitpick comments (4)
store-client/pkg/storewatcher/watch_store_test.go (1)
192-197: Tests mutatingwatcher.client=nilis brittle; prefer an explicit “skip disconnect” hook for testability.Setting internals to
nilworks, but it’s easy to miss future tests/call-sites and masks real Close() behavior. Consider adding aChangeStreamWatcherfield likedisconnect func(context.Context) error(defaulting toclient.Disconnect) or an option flag used only by tests/mtest.Also applies to: 295-299, 360-364, 432-436
store-client/pkg/adapter/legacy_adapter.go (1)
161-175: Avoid hardcoded pool defaults here; centralize defaults (or read fromdsConfig.Options) to prevent drift.If these defaults are meant to match env/chart defaults, define shared constants (e.g., in
store-client/pkg/config) or let the legacy adapter consultl.dsConfig.Options(with the same parsing rules as PostgreSQL).store-client/pkg/storewatcher/watch_store.go (1)
552-573: Validate pool settings (min/max/idle) before applying; and be careful about logging AppName at info on every init.Add checks like
MinPoolSize <= MaxPoolSize(when MaxPoolSize > 0) andMaxConnIdleTimeSeconds >= 0to prevent invalid configs. Also consider whetherslog.Infofor app name should beDebugto reduce log noise across many pods. Based on learnings, prefer structured logging vialog/slog.store-client/pkg/datastore/providers/postgresql/datastore.go (1)
61-74: Consider logging when pool options are present but invalid (today they silently fall back).If
Options[maxOpenConns]is set to a non-int/<=0, silently defaulting can hide misconfig. A one-timeslog.Warnduring init (include key/value) can save debugging time. Based on learnings, prefer structured logging vialog/slog.Also applies to: 96-105
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yamldistros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yamldistros/kubernetes/nvsentinel/templates/configmap-datastore.yamldistros/kubernetes/nvsentinel/templates/daemonset.yamldistros/kubernetes/nvsentinel/values.yamlnode-drainer/pkg/reconciler/reconciler_integration_test.gostore-client/pkg/adapter/legacy_adapter.gostore-client/pkg/adapter/legacy_adapter_test.gostore-client/pkg/client/mongodb_client.gostore-client/pkg/config/env_loader.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gostore-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/factory/client_factory.gostore-client/pkg/storewatcher/config.gostore-client/pkg/storewatcher/watch_store.gostore-client/pkg/storewatcher/watch_store_test.go
✅ Files skipped from review due to trivial changes (1)
- store-client/pkg/adapter/legacy_adapter_test.go
🚧 Files skipped from review as they are similar to previous changes (10)
- node-drainer/pkg/reconciler/reconciler_integration_test.go
- store-client/pkg/factory/client_factory.go
- distros/kubernetes/nvsentinel/templates/daemonset.yaml
- store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
- distros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yaml
- distros/kubernetes/nvsentinel/values.yaml
- distros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yaml
- store-client/pkg/storewatcher/config.go
- distros/kubernetes/nvsentinel/templates/configmap-datastore.yaml
- distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
store-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/client/mongodb_client.gostore-client/pkg/storewatcher/watch_store.gostore-client/pkg/storewatcher/watch_store_test.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/adapter/legacy_adapter.gostore-client/pkg/config/env_loader.go
**/*_test.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
store-client/pkg/storewatcher/watch_store_test.go
🧠 Learnings (4)
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Use structured logging via `log/slog` in Go code
Applied to files:
store-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/client/mongodb_client.go
📚 Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.
Applied to files:
store-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/client/mongodb_client.gostore-client/pkg/storewatcher/watch_store.gostore-client/pkg/storewatcher/watch_store_test.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/adapter/legacy_adapter.gostore-client/pkg/config/env_loader.go
📚 Learning: 2025-11-15T17:25:33.481Z
Learnt from: dims
Repo: NVIDIA/NVSentinel PR: 255
File: node-drainer/pkg/config/config.go:131-159
Timestamp: 2025-11-15T17:25:33.481Z
Learning: In node-drainer's token configuration, the ClientName must be consistent between TokenConfigFromEnv() and NewTokenConfig() since it's used as a MongoDB query key for resume token lookups in the store-client watcher implementation.
Applied to files:
store-client/pkg/storewatcher/watch_store.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
📚 Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.
Applied to files:
store-client/pkg/storewatcher/watch_store_test.go
🧬 Code graph analysis (4)
store-client/pkg/datastore/providers/postgresql/datastore.go (2)
store-client/pkg/datastore/types.go (2)
ConnectionConfig(42-54)DataStoreConfig(35-39)node-drainer/pkg/config/config.go (1)
Duration(37-39)
store-client/pkg/client/mongodb_client.go (1)
node-drainer/pkg/config/config.go (1)
Duration(37-39)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
node-drainer/pkg/config/config.go (1)
Duration(37-39)store-client/pkg/config/env_loader.go (2)
DefaultMaxPoolSize(94-94)DefaultMinPoolSize(95-95)
store-client/pkg/config/env_loader.go (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
DefaultMaxPoolSize(50-50)DefaultMinPoolSize(51-51)
🔇 Additional comments (18)
distros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yaml (1)
139-141: LGTM! Consistent with other deployments.The implementation is identical to the other deployment templates in this PR, ensuring consistency across all NVSentinel services. The
MONGODB_APP_NAMEwill correctly identify this service as "node-drainer" in MongoDB operations and logs.distros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yaml (1)
138-140: LGTM! Matches PR example.The implementation correctly adds the
MONGODB_APP_NAMEenvironment variable, and "fault-quarantine" is explicitly mentioned in the PR objectives as an example service name for MongoDB connection identification. The implementation is consistent with the other deployment templates.distros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yaml (1)
139-141: LGTM! MongoDB app name addition enhances observability.The addition of the
MONGODB_APP_NAMEenvironment variable aligns with the PR objectives to identify services in MongoDB logs anddb.currentOp(). The use of{{ .Chart.Name | quote }}is idiomatic and will correctly resolve to "fault-remediation".The Go application properly consumes the environment variable in the config struct (
store-client/pkg/storewatcher/config.go) and passes it to the MongoDB client viaSetAppName()in the watch store initialization code (store-client/pkg/storewatcher/watch_store.goandstore-client/pkg/datastore/providers/mongodb/watcher/watch_store.go). The implementation includes appropriate conditional checks and logging.store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (6)
71-74: LGTM! Pool configuration fields properly added.The new fields for connection pool management are well-structured and align with MongoDB driver requirements. The use of
uint64for pool sizes matches the MongoDB driver's API, andtime.Durationfor idle time is idiomatic Go.
128-132: Excellent pattern for preventing connection leaks on error paths.The
disconnectOnErrorhelper ensures that MongoDB connections are properly cleaned up when initialization fails. The helper is correctly placed aftermongo.Connect()succeeds and is consistently called on all validation failures.Also applies to: 135-135, 140-140, 145-145, 156-156, 166-166, 200-200, 209-209
504-517: Proper cleanup of MongoDB client on shutdown.The addition of
client.Disconnect()in theClose()method ensures that connections are properly released. The error handling correctly preserves the originalchangeStream.Close()error while logging any disconnect failures.
655-686: Well-structured connection pool configuration with sensible defaults.The pool configuration logic correctly applies defaults when values are unset and configures the MongoDB client appropriately. The conditional
AppNamesetting with logging is particularly useful for identifying connections in MongoDB'scurrentOp()and logs, which directly addresses the PR's observability goals.
48-53: The constants are not true duplicates. WhileDefaultMaxPoolSizeandDefaultMinPoolSizehave identical names and values in both files, the time-related constant differs:watch_store.godefinesDefaultMaxConnIdleTime = 5 * time.Minute(atime.Duration), whileenv_loader.godefinesDefaultMaxConnIdleTimeSeconds = 300(anintrepresenting seconds). These serve different purposes—one works with Go's duration type directly, the other handles seconds for environment variable parsing. Consolidating them would introduce unnecessary cross-package dependencies and type conversions. The duplication concern is overstated given the different types and units involved.Likely an incorrect or invalid review comment.
583-588: Error-path cleanup is correct, but success-path lifecycle concern is unfounded.The
disconnectOnErrorhelper properly prevents leaks during initialization failures and is called appropriately on all error paths.However, the concern about success-path lifecycle is incorrect.
GetCollectionClientis called once duringMongoDBClientinitialization, and the underlyingmongo.Clientis extracted (line 410:client := mongoCol.Database().Client()) and stored in theMongoDBClientstruct. The client lifecycle is properly managed via theMongoDBClient.Close()method, which explicitly callsc.client.Disconnect(ctx)at line 826. Callers do have a way to disconnect the client when done—through theClose()method on the returned client object.store-client/pkg/config/env_loader.go (9)
77-80: LGTM! Environment variable constants follow conventions.The new constants use consistent
MONGODB_prefixes and clear, descriptive names that align with existing patterns in the codebase.
93-96: Connection pool defaults align with PR objectives.The defaults (max: 3, min: 1, idle: 5 minutes) match the stated rationale of supporting "1 change stream + 1 query + 1 write concurrently" and the idle timeout helps prevent connection accumulation in large clusters.
129-132: LGTM! Private fields with appropriate types.The new fields follow good encapsulation practices by being private with public getters. The types are appropriate:
uint64for pool sizes (matching MongoDB driver) andintfor timeout seconds.
196-222: Good refactoring with helper functions.The
getRequiredEnvandgetCollectionNamehelpers reduce code duplication and improve readability. The logic correctly handles environment variable precedence and provides clear error messages.
264-280: Proper configuration loading with validation.The pool settings are loaded correctly with appropriate validation. The
AppNameis optional (which makes sense for backward compatibility), while pool sizes are validated to be positive integers.
374-390: Consistent pool configuration for PostgreSQL path.The pool settings are loaded identically to the MongoDB path, which is appropriate since connection pooling concepts are database-agnostic. The comment clarifies that these settings apply to PostgreSQL as well.
Minor note: Consider whether
MONGODB_APP_NAMEand otherMONGODB_*prefixed variables should be renamed toDATASTORE_*for true database-agnostic configuration, but this is a nice-to-have refinement and not critical for this PR.
479-497: Well-implemented uint64 environment variable parser.The
getPositiveUint64EnvVarhelper correctly validates that pool sizes are positive (non-zero), which is essential for connection pool configuration. Error messages provide clear context for debugging.
584-610: Getters provide sensible defaults for zero values.The getter implementations correctly apply defaults when fields are zero, providing defensive behavior for partially-initialized configs. The
GetAppName()appropriately returns the empty string without applying a default, since an empty app name is semantically valid (no app name specified).Optional refinement: The explicit
uint64()casts at lines 590 and 598 are unnecessary sinceDefaultMaxPoolSizeandDefaultMinPoolSizeare untyped constants that can be directly assigned touint64, but the explicit cast improves clarity.
34-37: Breaking change to public interface—ensure release notes are updated and communicate to consumers.Adding four new methods to the
DatabaseConfiginterface is a breaking change for any external implementations. The single implementation in this codebase (StandardDatabaseConfig) has been properly updated with all new getter methods. However, ensure this change is documented in release notes and communicated to external consumers who may have implemented this interface.
Merging this branch will decrease overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
🛡️ CodeQL Analysis🚨 Found 1 security alert(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (1)
504-524: Minor: Error message may be misleading when wrapping disconnect errors.When
changeStream.Close()succeeds butclient.Disconnect()fails, the error from line 512 is returned. However, it gets wrapped at line 520 with the message "failed to close change stream", which is misleading since the actual failure was during client disconnect.Consider using a more accurate message or handling the two error cases separately:
Suggested fix
- if err != nil { - return fmt.Errorf("failed to close change stream for client %s: %w", w.clientName, err) - } + if err != nil { + return err // Error already has appropriate context from lines 512 or 495 + }Alternatively, add context at line 512 to distinguish:
- err = fmt.Errorf("failed to disconnect MongoDB client for %s: %w", w.clientName, disconnectErr) + err = disconnectErr
🧹 Nitpick comments (3)
store-client/pkg/factory/client_factory.go (1)
90-98: Consider making ConnMaxLifetime configurable.The connection pool settings are now configurable via environment variables (MaxOpenConns, MaxIdleConns, ConnMaxIdleTime), but
ConnMaxLifetimeis hardcoded to 1 hour. For consistency and flexibility, consider adding this as a configurable parameter as well.♻️ Example: Add configurable ConnMaxLifetime
In
store-client/pkg/config/env_loader.go, add:EnvMongoDBMaxConnLifetimeSeconds = "MONGODB_MAX_CONN_LIFETIME_SECONDS" DefaultMaxConnLifetimeSeconds = 3600 // 1 hourThen retrieve and apply it here:
maxConnIdleTime := time.Duration(f.dbConfig.GetMaxConnIdleTimeSeconds()) * time.Second +maxConnLifetime := time.Duration(f.dbConfig.GetMaxConnLifetimeSeconds()) * time.Second db.SetMaxOpenConns(poolSizeToInt(maxPoolSize)) db.SetMaxIdleConns(poolSizeToInt(minPoolSize)) db.SetConnMaxIdleTime(maxConnIdleTime) -db.SetConnMaxLifetime(time.Hour) // Max lifetime of a connection +db.SetConnMaxLifetime(maxConnLifetime)store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
569-624: Pre-existing design concern:GetCollectionClientleaks client reference.The function returns a
*mongo.Collectionbut the underlying*mongo.Clientis not exposed to callers. This means callers have no way to disconnect the client when they're done, which could lead to connection accumulation over time ifGetCollectionClientis called multiple times.This is a pre-existing issue, but given the PR's focus on connection pool management, consider either:
- Returning both the collection and client (or a wrapper struct with a
Close()method)- Documenting that callers should reuse the returned collection rather than calling
GetCollectionClientrepeatedly
655-688: Consider using Debug level for AppName configuration log.The log at line 685 will be emitted on every MongoDB client creation. For production deployments with frequent client connections, Info level may be too verbose. Consider using Debug level instead.
Suggested change
// Set AppName if provided - this helps identify connections in MongoDB logs and currentOp if mongoConfig.AppName != "" { clientOpts.SetAppName(mongoConfig.AppName) - slog.Info("MongoDB client configured with app name", "appName", mongoConfig.AppName) + slog.Debug("MongoDB client configured with app name", "appName", mongoConfig.AppName) }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
store-client/pkg/adapter/legacy_adapter.gostore-client/pkg/config/env_loader.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gostore-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/factory/client_factory.gostore-client/pkg/storewatcher/watch_store.gostore-client/pkg/storewatcher/watch_store_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- store-client/pkg/storewatcher/watch_store_test.go
- store-client/pkg/adapter/legacy_adapter.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging vialog/slogin Go code
Wrap errors with context usingfmt.Errorf("context: %w", err)in Go code
Withinretry.RetryOnConflictblocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such assyncedoverokfor cache sync checks
Useclient-gofor Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types
Files:
store-client/pkg/factory/client_factory.gostore-client/pkg/storewatcher/watch_store.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gostore-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/config/env_loader.go
**/*_test.go
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*_test.go: Useenvtestfor testing Kubernetes controllers instead of fake clients
Usetestify/assertandtestify/requirefor assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format:TestFunctionName_Scenario_ExpectedBehavior
Files:
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
🧠 Learnings (5)
📚 Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.
Applied to files:
store-client/pkg/factory/client_factory.gostore-client/pkg/storewatcher/watch_store.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.gostore-client/pkg/datastore/providers/postgresql/datastore.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.gostore-client/pkg/config/env_loader.go
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Implement proper shutdown handling with context cancellation in Go code
Applied to files:
store-client/pkg/storewatcher/watch_store.go
📚 Learning: 2025-11-15T17:25:33.481Z
Learnt from: dims
Repo: NVIDIA/NVSentinel PR: 255
File: node-drainer/pkg/config/config.go:131-159
Timestamp: 2025-11-15T17:25:33.481Z
Learning: In node-drainer's token configuration, the ClientName must be consistent between TokenConfigFromEnv() and NewTokenConfig() since it's used as a MongoDB query key for resume token lookups in the store-client watcher implementation.
Applied to files:
store-client/pkg/storewatcher/watch_store.gostore-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
📚 Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.
Applied to files:
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Use structured logging via `log/slog` in Go code
Applied to files:
store-client/pkg/datastore/providers/postgresql/datastore.go
🧬 Code graph analysis (4)
store-client/pkg/factory/client_factory.go (1)
node-drainer/pkg/config/config.go (1)
Duration(37-39)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go (2)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
ChangeStreamWatcher(89-108)Event(40-40)store-client/pkg/storewatcher/watch_store.go (1)
ChangeStreamWatcher(43-57)
store-client/pkg/datastore/providers/postgresql/datastore.go (1)
store-client/pkg/datastore/types.go (2)
ConnectionConfig(42-54)DataStoreConfig(35-39)
store-client/pkg/config/env_loader.go (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
DefaultMaxPoolSize(50-50)DefaultMinPoolSize(51-51)
🔇 Additional comments (20)
store-client/pkg/factory/client_factory.go (1)
227-235: LGTM: Safe pool size conversion with bounds checking.The helper appropriately caps pool sizes at 1000 to prevent resource exhaustion from misconfiguration and safely converts uint64 to int.
store-client/pkg/config/env_loader.go (4)
34-37: LGTM: Well-structured pool configuration interface and constants.The new interface methods, environment variable constants, and defaults align with the PR objectives (MaxPoolSize: 3, MinPoolSize: 1, MaxConnIdleTime: 300s). The naming is clear and consistent with existing patterns.
Also applies to: 77-80, 94-96
196-222: LGTM: Clean helper functions with good error handling.The extraction of
getRequiredEnvandgetCollectionNamereduces code duplication and improves maintainability. Error messages are clear and actionable.
479-497: LGTM: Proper validation for pool size values.The function correctly validates that pool sizes are positive (non-zero) uint64 values and provides appropriate error messages. This prevents configuration issues with invalid pool sizes.
588-610: LGTM: Getters with safe default fallback logic.The getter implementations properly return configured defaults when values are zero, ensuring pool settings are always valid. This defensive approach prevents runtime issues from uninitialized configuration.
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go (2)
192-197: LGTM: Proper test cleanup to avoid double-disconnect.Setting
watcher.client = nilbefore callingClose()is the correct approach when using mtest, which manages the MongoDB client lifecycle. The inline comments clearly explain the rationale, preventing future confusion.Also applies to: 295-298, 360-363, 432-435, 1619-1622, 1769-1772, 1851-1854
1860-1888: LGTM: Comprehensive edge case coverage.The new test explicitly verifies that
Close()handles a nil client gracefully, which is an important edge case given the mtest lifecycle management. This improves test robustness and documents the expected behavior.store-client/pkg/storewatcher/watch_store.go (4)
76-81: Excellent: Prevents connection leaks during initialization errors.The
disconnectOnErrorhelper is consistently applied across all error paths inNewChangeStreamWatcher, ensuring the MongoDB client is properly disconnected when initialization fails. This directly addresses the connection accumulation issue described in the PR objectives.Also applies to: 84-96, 105-106, 115-116, 149-152, 158-159
383-396: Excellent: Proper client disconnection during shutdown.Adding
client.Disconnect()in theClose()method releases MongoDB connections during normal shutdown, complementing the error-path cleanup. The nil-check prevents panics, and the error handling preserves the original error when both operations fail.
552-570: LGTM: Pool configuration applied with proper logging.The connection pool settings (MaxPoolSize, MinPoolSize, MaxConnIdleTime) are correctly applied from the config, with AppName support for easier identification in MongoDB logs and
db.currentOp(). This aligns perfectly with the PR objectives.
462-467: LGTM: Consistent cleanup pattern in GetCollectionClient.The same
disconnectOnErrorpattern is applied inGetCollectionClient, maintaining consistency withNewChangeStreamWatcherand ensuring connection cleanup on all error paths.Also applies to: 470-481, 491-492
store-client/pkg/datastore/providers/postgresql/datastore.go (3)
40-59: LGTM: Centralized validation improves code quality.Extracting configuration validation into a dedicated function reduces duplication and makes the validation logic easier to maintain and test. The port range validation (1-65535) is correctly implemented.
61-92: LGTM: Robust pool option parsing with defensive defaults.The
getPoolOptionhelper properly handles invalid values by logging warnings and falling back to defaults. The positive-value check prevents configuration errors from causing runtime issues. This approach aligns with Go best practices for configuration management.
114-122: LGTM: PostgreSQL pool settings match MongoDB defaults.The pool configuration (defaults: maxOpenConns=3, maxIdleConns=1, maxConnIdleTime=300s) mirrors the MongoDB settings established elsewhere in this PR, providing consistency across database providers. The 1-hour
ConnMaxLifetimeis reasonable for long-running connections.store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (6)
48-53: LGTM!The connection pool defaults are well-chosen for the stated use case (1 change stream + 1 query + 1 write concurrently). Exporting these constants allows consumers to reference them if needed.
70-74: LGTM!The new configuration fields use correct types matching the MongoDB driver's expectations (
uint64for pool sizes,time.Durationfor idle time). Zero values appropriately trigger defaults inconstructMongoClientOptions.
127-132: LGTM!The
disconnectOnErrorhelper correctly prevents connection leaks during initialization failures. Using the same context for disconnect is a reasonable trade-off - if the context is cancelled, the disconnect attempt is still made with best effort.
134-147: LGTM!All configuration validation error paths correctly invoke
disconnectOnError()before returning, ensuring no connection leaks when configuration is invalid.
155-167: LGTM!Connectivity check failures correctly trigger
disconnectOnError()to release the established connection before returning the error.
198-211: LGTM!Both the resume token retrieval failure (non-
ErrNoDocumentscases) and change stream open failure paths correctly calldisconnectOnError()to prevent connection leaks.
Merging this branch changes the coverage (4 decrease, 1 increase)
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. Changed unit test files
|
🛡️ CodeQL Analysis🚨 Found 1 security alert(s) |
Summary
Issue => #645
it has been observed that in large clusters of ~1000 nodes or more , mongodb is using more memory than small connection of 100 nodes or less. The explanation is it needs to manage more connections with different components of nvsentinel(platform-connectors, fault-x modules etc). Since, issue is not getting reproduced, for now, as part of the fix , optimize and improve the connection management of different nvsentinel modules with MongoDB.
Added MaxConnIdleTime (5 minutes default) - Idle connections are now automatically closed after 5 minutes of inactivity, preventing connection accumulation that previously led to memory exhaustion in large clusters.
Added AppName to connection strings - Each service now identifies itself when connecting to MongoDB (e.g., "platform-connector", "fault-quarantine"), making it easy to identify connection sources in db.currentOp() and MongoDB logs. Prior to this change, the app-name was coming as unknown which causing confusion on where connections are coming from. This will be helpful in debugging and investigation in future if needed.
Added client.Disconnect() on shutdown - ChangeStreamWatcher.Close() now properly disconnects the MongoDB client, ensuring connections are released back to the server instead of remaining orphaned.
Added disconnectOnError() on initialization failures - If NewChangeStreamWatcher() or GetCollectionClient() fails after establishing a connection, the client is now properly disconnected to prevent connection leaks on error paths.
Added configurable connection pool limits - MaxPoolSize (default: 3 Enough for: 1 change stream + 1 query + 1 write operation concurrently) and MinPoolSize (default: 1) are now configurable via Helm values.
to fix golang-lint issues, had done some refactoring as well in the code.
Type of Change
Component(s) Affected
Testing
Checklist
Summary by CodeRabbit
New Features
Improvements
Tests
✏️ Tip: You can customize this high-level summary in your review settings.