Skip to content

Conversation

@deesharma24
Copy link
Contributor

@deesharma24 deesharma24 commented Jan 8, 2026

Summary

Issue => #645

it has been observed that in large clusters of ~1000 nodes or more , mongodb is using more memory than small connection of 100 nodes or less. The explanation is it needs to manage more connections with different components of nvsentinel(platform-connectors, fault-x modules etc). Since, issue is not getting reproduced, for now, as part of the fix , optimize and improve the connection management of different nvsentinel modules with MongoDB.

  1. Added MaxConnIdleTime (5 minutes default) - Idle connections are now automatically closed after 5 minutes of inactivity, preventing connection accumulation that previously led to memory exhaustion in large clusters.

  2. Added AppName to connection strings - Each service now identifies itself when connecting to MongoDB (e.g., "platform-connector", "fault-quarantine"), making it easy to identify connection sources in db.currentOp() and MongoDB logs. Prior to this change, the app-name was coming as unknown which causing confusion on where connections are coming from. This will be helpful in debugging and investigation in future if needed.

  3. Added client.Disconnect() on shutdown - ChangeStreamWatcher.Close() now properly disconnects the MongoDB client, ensuring connections are released back to the server instead of remaining orphaned.

  4. Added disconnectOnError() on initialization failures - If NewChangeStreamWatcher() or GetCollectionClient() fails after establishing a connection, the client is now properly disconnected to prevent connection leaks on error paths.

  5. Added configurable connection pool limits - MaxPoolSize (default: 3 Enough for: 1 change stream + 1 query + 1 write operation concurrently) and MinPoolSize (default: 1) are now configurable via Helm values.

  6. to fix golang-lint issues, had done some refactoring as well in the code.

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📚 Documentation
  • 🔧 Refactoring
  • 🔨 Build/CI

Component(s) Affected

  • Core Services
  • Documentation/CI
  • Fault Management
  • Health Monitors
  • Janitor
  • Other: ____________

Testing

  • Tests pass locally
  • Manual testing completed
  • No breaking changes (or documented)

Checklist

  • Self-review completed
  • Documentation updated (if needed)
  • Ready for review

Summary by CodeRabbit

  • New Features

    • Added MONGODB_APP_NAME across services and new MongoDB pool settings (max/min pool size, max idle time) configurable via env/values.
  • Improvements

    • MongoDB and PostgreSQL clients now honor pool settings; app-name is included in client setup and logs.
    • Improved cleanup: clients are disconnected on init/shutdown and on error to avoid leaks.
  • Tests

    • Added tests for adapter defaults and safe watcher shutdown.

✏️ Tip: You can customize this high-level summary in your review settings.

@deesharma24 deesharma24 marked this pull request as draft January 8, 2026 11:56
@deesharma24 deesharma24 marked this pull request as ready for review January 8, 2026 11:58
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 8, 2026

📝 Walkthrough

Walkthrough

Adds MongoDB app-name and connection-pool configuration exposure across Helm charts and environment loaders; wires those values into MongoDB and PostgreSQL client options; introduces proactive disconnect/cleanup on MongoDB initialization and shutdown; updates adapters, tests, and related config structs.

Changes

Cohort / File(s) Summary
Helm: deployment templates
distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yaml, distros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yaml, distros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yaml, distros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yaml, distros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yaml, distros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yaml
Add MONGODB_APP_NAME env var to containers (chart name or specific value) for MongoDB connection identification.
Helm: daemonset & configmap
distros/kubernetes/nvsentinel/templates/daemonset.yaml, distros/kubernetes/nvsentinel/templates/configmap-datastore.yaml
Add MONGODB_APP_NAME to platform-connector DaemonSet; add MONGODB_MAX_POOL_SIZE, MONGODB_MIN_POOL_SIZE, MONGODB_MAX_CONN_IDLE_TIME_SECONDS to datastore ConfigMap when provider is MongoDB or unspecified.
Helm values
distros/kubernetes/nvsentinel/values.yaml
Introduce global.mongodb pool settings: maxPoolSize, minPoolSize, maxConnIdleTimeSeconds with defaults.
Env/config loader
store-client/pkg/config/env_loader.go
Extend DatabaseConfig interface and StandardDatabaseConfig with AppName and pool fields; add env constants, defaults, parsing helpers, and getters to populate these values.
Legacy adapter & tests
store-client/pkg/adapter/legacy_adapter.go, store-client/pkg/adapter/legacy_adapter_test.go
Add getters on legacy adapter for app/pool values and tests validating defaults and conversion behavior.
MongoDB client wiring
store-client/pkg/client/mongodb_client.go
Apply AppName, MaxPoolSize, MinPoolSize, and MaxConnIdleTime to MongoDB client options (converts seconds → duration).
MongoDB watcher / storewatcher
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go, store-client/pkg/storewatcher/watch_store.go, store-client/pkg/storewatcher/config.go, store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go, store-client/pkg/storewatcher/watch_store_test.go
Add AppName and pool fields to configs; apply pool defaults to client options; add disconnectOnError cleanup helper; ensure clients are disconnected on initialization errors and on Close; adjust tests for nil-client/mtest lifecycle.
PostgreSQL pool & factory
store-client/pkg/datastore/providers/postgresql/datastore.go, store-client/pkg/factory/client_factory.go
Replace hard-coded Postgres pool values with runtime-configurable parsing and application of max open/idle connections and idle timeout; add helpers to bound/parse values.
Node reconciler test
node-drainer/pkg/reconciler/reconciler_integration_test.go
Extend mockDatabaseConfig with GetAppName, GetMaxPoolSize, GetMinPoolSize, GetMaxConnIdleTimeSeconds for tests.

Sequence Diagram(s)

sequenceDiagram
  participant Helm as Helm / Env
  participant Loader as Config Loader
  participant Watcher as ChangeStreamWatcher
  participant Client as MongoDB Client
  participant DB as MongoDB Server

  Helm->>Loader: expose MONGODB_APP_NAME, MAX_POOL, MIN_POOL, IDLE_SEC
  Loader->>Watcher: provide MongoDBConfig (AppName, MaxPool, MinPool, MaxConnIdleTime)
  Watcher->>Client: construct options (SetAppName, SetMaxPoolSize, SetMinPoolSize, SetMaxConnIdleTime)
  Client->>DB: Connect()
  DB-->>Client: OK / Error
  alt success
    Client-->>Watcher: ready
  else error
    Watcher->>Client: disconnectOnError -> Disconnect()
    Client-->>Watcher: disconnected
  end
  Note over Watcher,Client: On shutdown
  Watcher->>Client: Close -> Disconnect()
  Client-->>Watcher: disconnected (logged)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibbled charts and code beneath the vine,
I named each Mongo thread to mark its line.
Pools set small, idle seconds kept in sight,
I close the client when the startup isn't right.
Hops of glee — connections now behave just fine!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 68.75% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding MongoDB connection pool management features (MaxConnIdleTime, MaxPoolSize, MinPoolSize, AppName identification, and error-path disconnection) to optimize memory usage in large clusters.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
distros/kubernetes/nvsentinel/templates/daemonset.yaml (1)

23-29: Add nodeAffinity to separate DaemonSet deployments for kata and regular nodes, and configure appropriate volume mounts for logging.

This template currently has no affinity configuration to distinguish between kata and regular nodes. According to the coding guidelines, separate DaemonSets (or a single template with nodeAffinity) should be created using nodeAffinity based on the kata.enabled label.

Additionally, the volume mounts are incomplete:

  • Regular node DaemonSets require /var/log volume mount for file-based logs
  • Kata node DaemonSets require /run/log/journal and /var/log/journal volume mounts for systemd journal

While the values.yaml is configured to detect kata nodes, the daemonset template needs to be enhanced with conditional affinity rules and appropriate volume mounts based on the kata detection result.

🧹 Nitpick comments (7)
store-client/pkg/config/env_loader.go (1)

93-97: Consider consolidating duplicate default constants.

These default constants are also defined in store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (Lines 49-50). Having duplicate definitions could lead to drift if one is updated without the other.

Consider exporting these constants from a single location (e.g., this file) and importing them in watch_store.go to ensure consistency:

-// Connection pool defaults
-DefaultMaxPoolSize            = 3
-DefaultMinPoolSize            = 1
-DefaultMaxConnIdleTimeSeconds = 300 // 5 minutes
+// Connection pool defaults - exported for use by other packages
+const (
+    DefaultMaxPoolSize            = 3
+    DefaultMinPoolSize            = 1
+    DefaultMaxConnIdleTimeSeconds = 300 // 5 minutes
+)

Then in watch_store.go, import from this package instead of defining locally.

store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go (1)

1845-1873: Strengthen panic prevention verification.

The test verifies that Close() handles a nil client, but it doesn't explicitly assert that no panic occurs. Consider wrapping the Close call in a defer/recover block or using require.NotPanics() to make the test's intent clearer.

✨ Proposed enhancement
 	mt.Run("Close handles nil client gracefully", func(mt *mtest.T) {
 		mt.AddMockResponses(
 			mtest.CreateCursorResponse(1, "testdb.testcollection", mtest.FirstBatch),
 			mtest.CreateCursorResponse(0, "testdb.testcollection", mtest.NextBatch),
 		)
 
 		coll := mt.Client.Database("testdb").Collection("testcollection")
 		changeStream, err := coll.Watch(context.Background(), mongo.Pipeline{})
 		require.NoError(mt, err)
 
 		watcher := &ChangeStreamWatcher{
 			client:         nil, // Explicitly nil - edge case
 			changeStream:   changeStream,
 			eventChannel:   make(chan Event, 10),
 			resumeTokenCol: mt.Client.Database("tokendb").Collection("tokencollection"),
 			clientName:     "testclient",
 			done:           make(chan struct{}),
 		}
 
-		// Close should not panic with nil client
-		_ = watcher.Close(context.Background())
+		// Close should not panic with nil client
+		require.NotPanics(mt, func() {
+			_ = watcher.Close(context.Background())
+		})
 	})
store-client/pkg/factory/client_factory.go (1)

90-107: Consider making ConnMaxLifetime configurable.

The PostgreSQL connection pool configuration correctly applies settings to prevent idle connection accumulation. Default values (MaxPoolSize: 3, MinPoolSize: 1, MaxConnIdleTime: 5 minutes) align with MongoDB settings and PR objectives.

However, ConnMaxLifetime is hardcoded to 1 hour. While this is a reasonable default, consider making it configurable through the database config interface for consistency with other pool settings.

♻️ Proposed enhancement

Add a new method to the DatabaseConfig interface and implement it similarly to other pool settings:

In store-client/pkg/config/env_loader.go:

GetMaxConnLifetimeSeconds() int

Then use it here:

 	db.SetMaxOpenConns(int(maxPoolSize))
 	db.SetMaxIdleConns(int(minPoolSize))
 	db.SetConnMaxIdleTime(maxConnIdleTime)
-	db.SetConnMaxLifetime(time.Hour) // Max lifetime of a connection
+	maxConnLifetime := time.Duration(f.dbConfig.GetMaxConnLifetimeSeconds()) * time.Second
+	if maxConnLifetime == 0 {
+		maxConnLifetime = time.Hour // Default
+	}
+	db.SetConnMaxLifetime(maxConnLifetime)
distros/kubernetes/nvsentinel/values.yaml (1)

42-46: Enhance inline documentation for MongoDB pool settings.

The pool configuration values lack inline comments explaining their purpose and rationale. As per coding guidelines, all Helm values should be documented with inline comments, and non-obvious configurations should include examples.

Consider documenting:

  • The purpose of each setting (e.g., "Maximum number of connections per client")
  • The rationale for the defaults (PR objectives mention: "support 1 change stream + 1 query + 1 write concurrently")
  • When users might want to adjust these values
📝 Suggested documentation enhancement
-  # MongoDB connection pool settings
+  # MongoDB connection pool settings
+  # These defaults support 1 change stream + 1 query + 1 write operation concurrently per service
   mongodb:
+    # Maximum number of MongoDB connections per client/service
     maxPoolSize: 3
+    # Minimum number of idle connections to keep warm
     minPoolSize: 1
+    # Close idle connections after this many seconds (5 minutes)
     maxConnIdleTimeSeconds: 300

Based on coding guidelines.

store-client/pkg/datastore/providers/postgresql/datastore.go (1)

72-100: Consider logging parse errors and validating pool size relationships.

The silent fallback on parse errors (lines 80-94) is safe but could hide configuration mistakes. Additionally, there's no validation that maxIdleConns <= maxOpenConns, which is a PostgreSQL pool requirement.

Consider:

  1. Logging a warning when override parsing fails, so operators can detect typos
  2. Validating that maxIdleConns <= maxOpenConns after applying overrides
♻️ Example enhancement
 	if config.Options != nil {
 		if v, ok := config.Options["maxOpenConns"]; ok {
 			if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 {
 				maxOpenConns = parsed
+			} else {
+				slog.Warn("Invalid maxOpenConns override, using default", "value", v, "default", maxOpenConns)
 			}
 		}
 		if v, ok := config.Options["maxIdleConns"]; ok {
 			if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 {
 				maxIdleConns = parsed
+			} else {
+				slog.Warn("Invalid maxIdleConns override, using default", "value", v, "default", maxIdleConns)
 			}
 		}
 		if v, ok := config.Options["maxConnIdleTimeSeconds"]; ok {
 			if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 {
 				maxConnIdleTime = time.Duration(parsed) * time.Second
+			} else {
+				slog.Warn("Invalid maxConnIdleTimeSeconds override, using default", "value", v, "default", int(maxConnIdleTime.Seconds()))
 			}
 		}
 	}
+
+	// Validate pool configuration
+	if maxIdleConns > maxOpenConns {
+		slog.Warn("maxIdleConns exceeds maxOpenConns, capping to maxOpenConns", 
+			"maxIdleConns", maxIdleConns, "maxOpenConns", maxOpenConns)
+		maxIdleConns = maxOpenConns
+	}
 
 	db.SetMaxOpenConns(maxOpenConns)
store-client/pkg/adapter/legacy_adapter_test.go (1)

22-23: Consider using standard testing package for simple equality checks.

Based on learnings, the codebase prefers the standard testing package for simple equality checks, reserving third-party libraries like testify for complex scenarios requiring richer diagnostics.

♻️ Example using standard testing package
-	"github.com/stretchr/testify/assert"
-	"github.com/stretchr/testify/require"

Then replace assertions like:

-	assert.Equal(t, "", adapter.GetAppName(), "AppName should be empty for legacy adapter")
+	if got := adapter.GetAppName(); got != "" {
+		t.Errorf("AppName should be empty for legacy adapter, got %q", got)
+	}
-	assert.Equal(t, uint64(3), adapter.GetMaxPoolSize(), "MaxPoolSize default should be 3")
+	if got := adapter.GetMaxPoolSize(); got != 3 {
+		t.Errorf("MaxPoolSize default should be 3, got %d", got)
+	}

Based on learnings.

store-client/pkg/storewatcher/watch_store.go (1)

551-566: Consider extracting default values as constants for consistency.

The defaults here use magic numbers while store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go defines them as exported constants (DefaultMaxPoolSize, DefaultMinPoolSize, DefaultMaxConnIdleTime). Consider importing and reusing those constants to ensure consistency across packages.

♻️ Suggested refactor

Either import the constants from the other package or define local constants:

+const (
+	defaultMaxPoolSize     = 3
+	defaultMinPoolSize     = 1
+	defaultMaxConnIdleTime = 5 * time.Minute
+)
+
 // Apply connection pool settings to prevent idle connection accumulation
 // These are critical for large clusters with many clients (e.g., 1000+ platform-connectors)
 maxPoolSize := mongoConfig.MaxPoolSize
 if maxPoolSize == 0 {
-    maxPoolSize = 3 // Default max pool size
+    maxPoolSize = defaultMaxPoolSize
 }

 minPoolSize := mongoConfig.MinPoolSize
 if minPoolSize == 0 {
-    minPoolSize = 1 // Default min pool size
+    minPoolSize = defaultMinPoolSize
 }

 maxConnIdleTime := time.Duration(mongoConfig.MaxConnIdleTimeSeconds) * time.Second
 if maxConnIdleTime == 0 {
-    maxConnIdleTime = 5 * time.Minute // Default: 5 minutes
+    maxConnIdleTime = defaultMaxConnIdleTime
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9b8056 and 67611de.

📒 Files selected for processing (20)
  • distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/templates/configmap-datastore.yaml
  • distros/kubernetes/nvsentinel/templates/daemonset.yaml
  • distros/kubernetes/nvsentinel/values.yaml
  • node-drainer/pkg/reconciler/reconciler_integration_test.go
  • store-client/pkg/adapter/legacy_adapter.go
  • store-client/pkg/adapter/legacy_adapter_test.go
  • store-client/pkg/client/mongodb_client.go
  • store-client/pkg/config/env_loader.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/storewatcher/config.go
  • store-client/pkg/storewatcher/watch_store.go
🧰 Additional context used
📓 Path-based instructions (5)
**/daemonset*.yaml

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Explain DaemonSet variant selection logic in Helm chart documentation

Files:

  • distros/kubernetes/nvsentinel/templates/daemonset.yaml
distros/kubernetes/**/*daemonset*.yaml

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

distros/kubernetes/**/*daemonset*.yaml: Separate DaemonSets should be created for kata vs regular nodes using nodeAffinity based on kata.enabled label
Regular node DaemonSets should use /var/log volume mount for file-based logs
Kata node DaemonSets should use /run/log/journal and /var/log/journal volume mounts for systemd journal

Files:

  • distros/kubernetes/nvsentinel/templates/daemonset.yaml
**/*.go

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging via log/slog in Go code
Wrap errors with context using fmt.Errorf("context: %w", err) in Go code
Within retry.RetryOnConflict blocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such as synced over ok for cache sync checks
Use client-go for Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types

Files:

  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • node-drainer/pkg/reconciler/reconciler_integration_test.go
  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/adapter/legacy_adapter_test.go
  • store-client/pkg/adapter/legacy_adapter.go
  • store-client/pkg/client/mongodb_client.go
  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/storewatcher/config.go
  • store-client/pkg/config/env_loader.go
**/*_test.go

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*_test.go: Use envtest for testing Kubernetes controllers instead of fake clients
Use testify/assert and testify/require for assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format: TestFunctionName_Scenario_ExpectedBehavior

Files:

  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • node-drainer/pkg/reconciler/reconciler_integration_test.go
  • store-client/pkg/adapter/legacy_adapter_test.go
**/values.yaml

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/values.yaml: Document all values in Helm chart values.yaml with inline comments
Include examples for non-obvious configurations in Helm chart documentation
Note truthy value requirements in Helm chart documentation where applicable

Files:

  • distros/kubernetes/nvsentinel/values.yaml
🧠 Learnings (4)
📚 Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.

Applied to files:

  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • node-drainer/pkg/reconciler/reconciler_integration_test.go
  • store-client/pkg/adapter/legacy_adapter_test.go
📚 Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.

Applied to files:

  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/adapter/legacy_adapter_test.go
  • store-client/pkg/adapter/legacy_adapter.go
  • store-client/pkg/client/mongodb_client.go
  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/storewatcher/config.go
  • store-client/pkg/config/env_loader.go
📚 Learning: 2025-11-15T17:25:33.481Z
Learnt from: dims
Repo: NVIDIA/NVSentinel PR: 255
File: node-drainer/pkg/config/config.go:131-159
Timestamp: 2025-11-15T17:25:33.481Z
Learning: In node-drainer's token configuration, the ClientName must be consistent between TokenConfigFromEnv() and NewTokenConfig() since it's used as a MongoDB query key for resume token lookups in the store-client watcher implementation.

Applied to files:

  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/storewatcher/config.go
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Use structured logging via `log/slog` in Go code

Applied to files:

  • store-client/pkg/client/mongodb_client.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
🧬 Code graph analysis (5)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go (2)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
  • ChangeStreamWatcher (89-108)
  • Event (40-40)
store-client/pkg/storewatcher/watch_store.go (1)
  • ChangeStreamWatcher (43-57)
store-client/pkg/adapter/legacy_adapter_test.go (3)
store-client/pkg/datastore/types.go (3)
  • DataStoreConfig (35-39)
  • ProviderMongoDB (25-25)
  • ConnectionConfig (42-54)
store-client/pkg/adapter/legacy_adapter.go (3)
  • NewLegacyDatabaseConfigAdapter (48-52)
  • ConvertDataStoreConfigToLegacy (27-29)
  • ConvertDataStoreConfigToLegacyWithCertPath (33-38)
store-client/pkg/config/env_loader.go (1)
  • DatabaseConfig (28-38)
store-client/pkg/client/mongodb_client.go (1)
node-drainer/pkg/config/config.go (1)
  • Duration (37-39)
store-client/pkg/factory/client_factory.go (1)
node-drainer/pkg/config/config.go (1)
  • Duration (37-39)
store-client/pkg/config/env_loader.go (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
  • DefaultMaxPoolSize (50-50)
  • DefaultMinPoolSize (51-51)
🔇 Additional comments (26)
node-drainer/pkg/reconciler/reconciler_integration_test.go (1)

138-152: LGTM - Mock methods correctly implement the new DatabaseConfig interface.

The mock implementations return sensible defaults that align with the PR's connection pool configuration (MaxPoolSize=3, MinPoolSize=1, MaxConnIdleTime=300s). These values match the defaults defined in env_loader.go.

store-client/pkg/config/env_loader.go (2)

356-372: Verify intent: PostgreSQL using MONGODB_ prefixed environment variables.

The PostgreSQL configuration path uses MONGODB_* prefixed environment variables for pool settings. While the comment indicates this is intentional for consistency, operators deploying with PostgreSQL may expect DATASTORE_* or POSTGRESQL_* prefixed variables.

Is this intentional, or should PostgreSQL have its own environment variable names (e.g., DATASTORE_MAX_POOL_SIZE) for clarity?


550-569: LGTM - Defensive fallback for uninitialized configuration.

The zero-value checks provide good fallback behavior for cases where the struct might not be fully initialized (e.g., created via struct literal without these fields). This is safe because getPositiveIntEnvVar already rejects zero/negative values during normal initialization.

store-client/pkg/client/mongodb_client.go (2)

347-351: LGTM - Connection pool settings correctly wired to MongoDB client.

The pool configuration is properly extracted from the database config and passed to the MongoDB watcher. The time conversion for MaxConnIdleTime is correct.


750-769: Verify: NewChangeStreamWatcher missing pool configuration settings.

The mongoConfig created for NewChangeStreamWatcher doesn't include the new pool settings (AppName, MaxPoolSize, MinPoolSize, MaxConnIdleTime). This means change stream watchers may not benefit from the connection pool optimizations introduced in this PR.

Should these settings be included for consistency?

 mongoConfig := mongoWatcher.MongoDBConfig{
     URI:        c.config.GetConnectionURI(),
     Database:   c.database,
     Collection: c.collection,
     // ...existing TLS and timeout settings...
     CertWatcher:                      c.certWatcher,
+    // Connection pool settings
+    AppName:         c.config.GetAppName(),
+    MaxPoolSize:     c.config.GetMaxPoolSize(),
+    MinPoolSize:     c.config.GetMinPoolSize(),
+    MaxConnIdleTime: time.Duration(c.config.GetMaxConnIdleTimeSeconds()) * time.Second,
 }
distros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yaml (1)

138-140: LGTM - App name added for MongoDB connection identification.

The MONGODB_APP_NAME environment variable is correctly set to the chart name, enabling connection identification in MongoDB logs and db.currentOp().

Note: This env var is set unconditionally, even when datastore.provider is postgresql. This is harmless (the variable will simply be ignored) but you may want to wrap it in a conditional for cleanliness, similar to other MongoDB-specific settings.

distros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yaml (1)

139-141: LGTM - Consistent with other deployment templates.

The MONGODB_APP_NAME environment variable follows the same pattern as other charts in this PR.

distros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yaml (1)

139-141: LGTM - Consistent with test mock and other templates.

The MONGODB_APP_NAME value aligns with the mock implementation in reconciler_integration_test.go which returns "node-drainer".

distros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yaml (1)

123-125: LGTM - Consistent app name configuration.

The MONGODB_APP_NAME environment variable follows the same pattern as other charts, enabling connection identification for the event-exporter component.

distros/kubernetes/nvsentinel/templates/daemonset.yaml (1)

148-150: LGTM: MONGODB_APP_NAME addition is correct.

The environment variable addition enables connection identification in MongoDB logs and monitoring tools, which aligns with the PR objectives.

distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yaml (2)

141-143: LGTM: Dynamic app name assignment is correct.

Using .Chart.Name provides a unique identifier per chart deployment, which aligns with the PR's goal of enabling connection identification in MongoDB logs.


203-205: LGTM: Specific app name for maintenance-notifier is appropriate.

The hardcoded value "maintenance-notifier" clearly identifies this sidecar container's connections, enabling differentiation from the main container in the same pod.

distros/kubernetes/nvsentinel/templates/configmap-datastore.yaml (1)

81-86: LGTM: MongoDB pool settings are well-configured.

The default values align with the PR rationale (support 1 change stream + 1 query + 1 write concurrently), and the 5-minute idle timeout helps prevent idle connection accumulation in large clusters. The conditional logic correctly handles both explicit MongoDB selection and backward compatibility when no provider is specified.

store-client/pkg/factory/client_factory.go (1)

22-22: LGTM: Time import is necessary.

The import is required for time.Duration and time.Second used in the connection pool configuration.

distros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yaml (1)

121-123: LGTM!

The addition of MONGODB_APP_NAME is well-documented and correctly uses the chart name for connection identification. This will help with MongoDB diagnostics via db.currentOp() and logs.

store-client/pkg/storewatcher/config.go (1)

47-50: LGTM!

The new pool configuration fields are well-structured with sensible defaults that align with the Helm values. The environment variable naming follows a consistent convention, and the types are appropriate (uint64 for pool sizes prevents negative values).

store-client/pkg/adapter/legacy_adapter_test.go (1)

28-69: Good test coverage for pool configuration defaults and conversion.

The tests effectively validate:

  1. Interface compliance of the legacy adapter
  2. Correct default pool values (AppName "", MaxPoolSize 3, MinPoolSize 1, MaxConnIdleTimeSeconds 300)
  3. Conversion functions work both with and without certificate paths

Test names follow good conventions and the coverage ensures the adapter correctly exposes the new pool configuration surface.

store-client/pkg/adapter/legacy_adapter.go (1)

161-175: LGTM! Sensible defaults for legacy adapter.

The hardcoded defaults align with the pool configuration constants defined elsewhere in the codebase (MaxPoolSize=3, MinPoolSize=1, MaxConnIdleTime=5 minutes). Returning an empty AppName is appropriate for the legacy adapter since services using the new configuration path will set their own application names.

store-client/pkg/storewatcher/watch_store.go (3)

76-81: Good addition of cleanup helper.

The disconnectOnError helper effectively prevents connection leaks during initialization failures. The pattern of capturing client and ctx from the outer scope is clean and ensures all error paths properly release resources.


382-395: LGTM! Proper disconnect handling in Close().

The disconnect logic correctly:

  1. Checks for nil client before attempting disconnect
  2. Logs warnings on disconnect failure without overriding the original error from changeStream.Close()
  3. Propagates the disconnect error only when no prior error exists

This ensures connections are properly released during shutdown.


576-580: LGTM! AppName configuration for connection identification.

Setting AppName when provided enables easier identification of connections in MongoDB's db.currentOp() and logs, which directly addresses the PR objective of improving visibility in large clusters.

store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (5)

48-53: LGTM! Well-defined exported constants for connection pool defaults.

These constants provide clear documentation of the default values and enable reuse across the codebase. The rationale in the PR (support 1 change stream + 1 query + 1 write concurrently) justifies the default of 3 for MaxPoolSize.


70-74: LGTM! Config struct extended with pool settings.

Using time.Duration directly for MaxConnIdleTime is a cleaner API than integer seconds, as it's type-safe and self-documenting.


127-132: Good cleanup helper pattern.

The disconnectOnError helper is consistently implemented across both watcher files, ensuring proper resource cleanup on initialization failures.


503-516: LGTM! Proper MongoDB client disconnect on Close().

The implementation correctly:

  1. Guards against nil client
  2. Logs the disconnect outcome for observability
  3. Preserves the original changeStream.Close() error when both operations fail
  4. Only promotes disconnect error to return value when no prior error exists

This ensures connections are properly released when the watcher is closed.


654-685: LGTM! Pool settings applied with proper defaults.

The implementation correctly uses the defined constants as fallbacks and applies all pool settings to the MongoDB client options. Logging the AppName at Info level aids debugging in production environments.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
store-client/pkg/datastore/providers/postgresql/datastore.go (1)

61-74: Silent fallback for invalid values may mask configuration errors.

When parsed <= 0, the function silently returns the default without any logging. This could mask configuration errors where an operator accidentally sets maxOpenConns=0 or a negative value, expecting it to mean "unlimited" or similar.

Consider logging a warning when an explicitly provided value is rejected:

♻️ Suggested improvement
 func getPoolOption(options map[string]string, key string, defaultVal int) int {
 	if options == nil {
 		return defaultVal
 	}

 	if v, ok := options[key]; ok {
 		if parsed, err := strconv.Atoi(v); err == nil && parsed > 0 {
 			return parsed
+		} else if err == nil {
+			slog.Warn("Pool option value must be positive, using default",
+				"key", key, "provided", parsed, "default", defaultVal)
 		}
 	}

 	return defaultVal
 }
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)

48-53: Duplicate constant definitions with env_loader.go.

These pool defaults are also defined in store-client/pkg/config/env_loader.go (lines 93-96). Consider importing from a shared location to avoid drift.

#!/bin/bash
# Verify duplicate constant definitions
rg -n "DefaultMaxPoolSize|DefaultMinPoolSize|DefaultMaxConnIdleTime" --type=go store-client/

569-624: GetCollectionClient lacks mechanism for caller to disconnect the client.

The disconnectOnError cleanup on error paths is good. However, on the happy path (line 623), the caller receives a *mongo.Collection but has no reference to the underlying *mongo.Client. This means there's no way to call Disconnect() when the collection is no longer needed, which could lead to connection leaks in long-running services.

Consider returning both the collection and client, or providing a cleanup function:

♻️ Suggested approach
// Option 1: Return client along with collection
func GetCollectionClient(...) (*mongo.Collection, *mongo.Client, error)

// Option 2: Return a cleanup function
func GetCollectionClient(...) (*mongo.Collection, func(), error) {
    // ...
    cleanup := func() {
        if disconnectErr := client.Disconnect(context.Background()); disconnectErr != nil {
            slog.Warn("Failed to disconnect client", "error", disconnectErr)
        }
    }
    return collection, cleanup, nil
}
store-client/pkg/config/env_loader.go (3)

76-80: Environment variable naming uses MONGODB_ prefix for both providers.

These env vars have a MONGODB_ prefix but are used for PostgreSQL configuration as well (see line 375). Consider using provider-agnostic names like DATASTORE_APP_NAME, DATASTORE_MAX_POOL_SIZE, etc., for consistency with other PostgreSQL env vars (DATASTORE_HOST, DATASTORE_PORT).

Is the use of MONGODB_* env vars for PostgreSQL configuration intentional for backward compatibility, or should provider-agnostic names be introduced?


93-96: Pool defaults are correctly defined but duplicated in watch_store.go.

These values align with the PR objectives (MaxPoolSize=3 for concurrent change stream + query + write). However, they duplicate constants in store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (lines 49-52).

Consider extracting these to a shared constants file to prevent drift.


374-401: PostgreSQL config uses MONGODB_ environment variables for pool settings.*

The comment at line 374 acknowledges this applies to PostgreSQL too, but using EnvMongoDBAppName, EnvMongoDBMaxPoolSize, etc., for PostgreSQL configuration could confuse operators.

This is the same issue noted earlier regarding env var naming. If the MONGODB_* naming is intentional for simplicity/backward compatibility, consider adding a comment explaining why.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67611de and 12df618.

📒 Files selected for processing (5)
  • store-client/pkg/config/env_loader.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/storewatcher/watch_store.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/storewatcher/watch_store.go
🧰 Additional context used
📓 Path-based instructions (1)
**/*.go

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging via log/slog in Go code
Wrap errors with context using fmt.Errorf("context: %w", err) in Go code
Within retry.RetryOnConflict blocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such as synced over ok for cache sync checks
Use client-go for Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types

Files:

  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/config/env_loader.go
🧠 Learnings (3)
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Use structured logging via `log/slog` in Go code

Applied to files:

  • store-client/pkg/datastore/providers/postgresql/datastore.go
📚 Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.

Applied to files:

  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/config/env_loader.go
📚 Learning: 2025-11-15T17:25:33.481Z
Learnt from: dims
Repo: NVIDIA/NVSentinel PR: 255
File: node-drainer/pkg/config/config.go:131-159
Timestamp: 2025-11-15T17:25:33.481Z
Learning: In node-drainer's token configuration, the ClientName must be consistent between TokenConfigFromEnv() and NewTokenConfig() since it's used as a MongoDB query key for resume token lookups in the store-client watcher implementation.

Applied to files:

  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
🧬 Code graph analysis (3)
store-client/pkg/datastore/providers/postgresql/datastore.go (1)
store-client/pkg/datastore/types.go (2)
  • ConnectionConfig (42-54)
  • DataStoreConfig (35-39)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (1)
store-client/pkg/config/env_loader.go (2)
  • DefaultMaxPoolSize (94-94)
  • DefaultMinPoolSize (95-95)
store-client/pkg/config/env_loader.go (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
  • DefaultMaxPoolSize (50-50)
  • DefaultMinPoolSize (51-51)
🔇 Additional comments (12)
store-client/pkg/datastore/providers/postgresql/datastore.go (2)

40-59: Validation logic looks good.

The function correctly validates required connection parameters and port range. Error messages are clear and actionable.


76-104: Pool configuration and validation integration looks correct.

The changes properly:

  • Validate config before attempting connection
  • Apply pool settings with sensible defaults matching the PR objectives (MaxPoolSize=3, MinPoolSize=1, MaxConnIdleTime=5 min)
  • Close the DB on ping failure (line 91)
  • Use SetConnMaxLifetime(time.Hour) to prevent stale connections

The default values align with the MongoDB pool defaults mentioned in the PR objectives.

store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (5)

70-74: New pool configuration fields in MongoDBConfig look good.

The new fields align with the MongoDB driver's client options and support the PR objectives for configurable connection pool limits.


127-132: Good pattern for preventing connection leaks during initialization.

The disconnectOnError helper ensures the MongoDB client is properly disconnected when initialization fails after connection. This addresses the PR objective of adding disconnectOnError() to prevent connection leaks on error paths.


134-147: Consistent application of disconnectOnError across all error paths.

All validation and connectivity check failures now properly disconnect the client before returning. This ensures no orphaned connections remain when NewChangeStreamWatcher fails.

Also applies to: 153-168, 198-211


504-517: Client disconnect on Close() addresses PR objectives.

This change ensures client.Disconnect() is called during shutdown, preventing connection accumulation. The implementation correctly:

  • Checks for nil client
  • Logs success/failure
  • Preserves the original error from changeStream.Close() if both fail

655-688: Pool configuration in constructMongoClientOptions is well-implemented.

The implementation correctly:

  • Applies defaults when config values are zero
  • Sets MaxPoolSize, MinPoolSize, and MaxConnIdleTime to prevent idle connection accumulation
  • Conditionally sets AppName for easier identification in db.currentOp() and MongoDB logs, as mentioned in PR objectives

The comments explaining the rationale for these settings are helpful.

store-client/pkg/config/env_loader.go (5)

34-37: Interface extension is backward-compatible and well-designed.

Adding new methods to the DatabaseConfig interface enables consuming components to access pool settings and app name in a database-agnostic way.


196-222: Helper functions improve code clarity.

getRequiredEnv and getCollectionName reduce duplication and provide consistent error messages.


264-291: Pool settings loading is correctly implemented.

The logic properly loads pool settings from environment variables with defaults, and the returned config struct is fully populated.


479-497: getPositiveUint64EnvVar implementation is correct.

The function properly:

  • Returns the default when the env var is unset
  • Parses as uint64 for MongoDB driver compatibility
  • Rejects zero values with a clear error message

588-610: Getter implementations provide defensive defaults.

The getters return sensible defaults if the struct fields are zero. This is good defensive programming for cases where StandardDatabaseConfig is constructed without going through the loading functions (e.g., in tests).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
store-client/pkg/storewatcher/watch_store.go (2)

373-403: Close() error reporting is misleading when disconnect fails.

If changeStream.Close() succeeds but client.Disconnect() fails, you set err to a disconnect error but then wrap it with failed to close change stream..., which is confusing (and can misdirect operators). Consider tracking closeErr and disconnectErr separately and returning the right one (or errors.Join).


312-340: Make MarkProcessed retries ctx-interruptible (avoid time.Sleep).

time.Sleep(w.resumeTokenUpdateInterval) won’t exit promptly on shutdown/cancel. Use a select with time.After(...) (same pattern you used in openChangeStreamWithRetry).

store-client/pkg/client/mongodb_client.go (1)

17-24: Add validation for pool configuration before applying to MongoDB client.

Currently, MaxConnIdleTime and pool sizes are applied without runtime validation. Add guards to prevent invalid configurations:

  • MaxConnIdleTimeSeconds ≤ 0 should be skipped (use MongoDB default)
  • MinPoolSize > MaxPoolSize should return a validation error

This prevents misconfiguration from reaching the MongoDB driver and causing connection pool failures.

Proposed diff
@@
 		ChangeStreamRetryIntervalSeconds: dbConfig.GetTimeoutConfig().GetChangeStreamRetryIntervalSeconds(),
 		// Connection pool settings to prevent idle connection accumulation
-		AppName:         dbConfig.GetAppName(),
-		MaxPoolSize:     dbConfig.GetMaxPoolSize(),
-		MinPoolSize:     dbConfig.GetMinPoolSize(),
-		MaxConnIdleTime: time.Duration(dbConfig.GetMaxConnIdleTimeSeconds()) * time.Second,
+		AppName:     dbConfig.GetAppName(),
+		MaxPoolSize: dbConfig.GetMaxPoolSize(),
+		MinPoolSize: dbConfig.GetMinPoolSize(),
 	}
+
+	idleSeconds := dbConfig.GetMaxConnIdleTimeSeconds()
+	if idleSeconds > 0 {
+		mongoConfig.MaxConnIdleTime = time.Duration(idleSeconds) * time.Second
+	}
+	if mongoConfig.MaxPoolSize > 0 && mongoConfig.MinPoolSize > mongoConfig.MaxPoolSize {
+		return nil, datastore.NewValidationError(
+			datastore.ProviderMongoDB,
+			"invalid connection pool configuration: minPoolSize cannot exceed maxPoolSize",
+			nil,
+		).WithMetadata("minPoolSize", mongoConfig.MinPoolSize).
+			WithMetadata("maxPoolSize", mongoConfig.MaxPoolSize)
+	}
🤖 Fix all issues with AI agents
In @store-client/pkg/datastore/providers/postgresql/datastore.go:
- Around line 40-59: validatePostgreSQLConfig currently rejects conn.Port == 0
and requires conn.Username, which contradicts buildConnectionString behavior and
lib/pq defaults; update validatePostgreSQLConfig (operating on
datastore.ConnectionConfig) to only require Host and Database, make Username
optional (remove the username empty check), and only validate port range when
conn.Port > 0 (i.e., ensure port <= 65535 when set); adjust returned error
messages accordingly so optional port/username behavior is preserved.

In @store-client/pkg/storewatcher/watch_store.go:
- Around line 76-96: disconnectOnError currently calls client.Disconnect(ctx)
using the caller's ctx which may be canceled/expired; change it to create a
fresh background context with a short timeout (e.g.,
context.WithTimeout(context.Background(), 5s)) and call client.Disconnect on
that cleanup context, ensuring you cancel the timeout when done and log any
disconnect error; apply the same pattern in GetCollectionClient (the
disconnect/cleanup paths) and in the Close method if it uses a caller-supplied
ctx for shutdown.
🧹 Nitpick comments (4)
store-client/pkg/storewatcher/watch_store_test.go (1)

192-197: Tests mutating watcher.client=nil is brittle; prefer an explicit “skip disconnect” hook for testability.

Setting internals to nil works, but it’s easy to miss future tests/call-sites and masks real Close() behavior. Consider adding a ChangeStreamWatcher field like disconnect func(context.Context) error (defaulting to client.Disconnect) or an option flag used only by tests/mtest.

Also applies to: 295-299, 360-364, 432-436

store-client/pkg/adapter/legacy_adapter.go (1)

161-175: Avoid hardcoded pool defaults here; centralize defaults (or read from dsConfig.Options) to prevent drift.

If these defaults are meant to match env/chart defaults, define shared constants (e.g., in store-client/pkg/config) or let the legacy adapter consult l.dsConfig.Options (with the same parsing rules as PostgreSQL).

store-client/pkg/storewatcher/watch_store.go (1)

552-573: Validate pool settings (min/max/idle) before applying; and be careful about logging AppName at info on every init.

Add checks like MinPoolSize <= MaxPoolSize (when MaxPoolSize > 0) and MaxConnIdleTimeSeconds >= 0 to prevent invalid configs. Also consider whether slog.Info for app name should be Debug to reduce log noise across many pods. Based on learnings, prefer structured logging via log/slog.

store-client/pkg/datastore/providers/postgresql/datastore.go (1)

61-74: Consider logging when pool options are present but invalid (today they silently fall back).

If Options[maxOpenConns] is set to a non-int/<=0, silently defaulting can hide misconfig. A one-time slog.Warn during init (include key/value) can save debugging time. Based on learnings, prefer structured logging via log/slog.

Also applies to: 96-105

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12df618 and e5f35dc.

📒 Files selected for processing (21)
  • distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/templates/configmap-datastore.yaml
  • distros/kubernetes/nvsentinel/templates/daemonset.yaml
  • distros/kubernetes/nvsentinel/values.yaml
  • node-drainer/pkg/reconciler/reconciler_integration_test.go
  • store-client/pkg/adapter/legacy_adapter.go
  • store-client/pkg/adapter/legacy_adapter_test.go
  • store-client/pkg/client/mongodb_client.go
  • store-client/pkg/config/env_loader.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/storewatcher/config.go
  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/storewatcher/watch_store_test.go
✅ Files skipped from review due to trivial changes (1)
  • store-client/pkg/adapter/legacy_adapter_test.go
🚧 Files skipped from review as they are similar to previous changes (10)
  • node-drainer/pkg/reconciler/reconciler_integration_test.go
  • store-client/pkg/factory/client_factory.go
  • distros/kubernetes/nvsentinel/templates/daemonset.yaml
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • distros/kubernetes/nvsentinel/charts/event-exporter/templates/deployment.yaml
  • distros/kubernetes/nvsentinel/values.yaml
  • distros/kubernetes/nvsentinel/charts/health-events-analyzer/templates/deployment.yaml
  • store-client/pkg/storewatcher/config.go
  • distros/kubernetes/nvsentinel/templates/configmap-datastore.yaml
  • distros/kubernetes/nvsentinel/charts/csp-health-monitor/templates/deployment.yaml
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging via log/slog in Go code
Wrap errors with context using fmt.Errorf("context: %w", err) in Go code
Within retry.RetryOnConflict blocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such as synced over ok for cache sync checks
Use client-go for Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types

Files:

  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/client/mongodb_client.go
  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/storewatcher/watch_store_test.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/adapter/legacy_adapter.go
  • store-client/pkg/config/env_loader.go
**/*_test.go

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*_test.go: Use envtest for testing Kubernetes controllers instead of fake clients
Use testify/assert and testify/require for assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format: TestFunctionName_Scenario_ExpectedBehavior

Files:

  • store-client/pkg/storewatcher/watch_store_test.go
🧠 Learnings (4)
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Use structured logging via `log/slog` in Go code

Applied to files:

  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/client/mongodb_client.go
📚 Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.

Applied to files:

  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/client/mongodb_client.go
  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/storewatcher/watch_store_test.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/adapter/legacy_adapter.go
  • store-client/pkg/config/env_loader.go
📚 Learning: 2025-11-15T17:25:33.481Z
Learnt from: dims
Repo: NVIDIA/NVSentinel PR: 255
File: node-drainer/pkg/config/config.go:131-159
Timestamp: 2025-11-15T17:25:33.481Z
Learning: In node-drainer's token configuration, the ClientName must be consistent between TokenConfigFromEnv() and NewTokenConfig() since it's used as a MongoDB query key for resume token lookups in the store-client watcher implementation.

Applied to files:

  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
📚 Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.

Applied to files:

  • store-client/pkg/storewatcher/watch_store_test.go
🧬 Code graph analysis (4)
store-client/pkg/datastore/providers/postgresql/datastore.go (2)
store-client/pkg/datastore/types.go (2)
  • ConnectionConfig (42-54)
  • DataStoreConfig (35-39)
node-drainer/pkg/config/config.go (1)
  • Duration (37-39)
store-client/pkg/client/mongodb_client.go (1)
node-drainer/pkg/config/config.go (1)
  • Duration (37-39)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
node-drainer/pkg/config/config.go (1)
  • Duration (37-39)
store-client/pkg/config/env_loader.go (2)
  • DefaultMaxPoolSize (94-94)
  • DefaultMinPoolSize (95-95)
store-client/pkg/config/env_loader.go (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
  • DefaultMaxPoolSize (50-50)
  • DefaultMinPoolSize (51-51)
🔇 Additional comments (18)
distros/kubernetes/nvsentinel/charts/node-drainer/templates/deployment.yaml (1)

139-141: LGTM! Consistent with other deployments.

The implementation is identical to the other deployment templates in this PR, ensuring consistency across all NVSentinel services. The MONGODB_APP_NAME will correctly identify this service as "node-drainer" in MongoDB operations and logs.

distros/kubernetes/nvsentinel/charts/fault-quarantine/templates/deployment.yaml (1)

138-140: LGTM! Matches PR example.

The implementation correctly adds the MONGODB_APP_NAME environment variable, and "fault-quarantine" is explicitly mentioned in the PR objectives as an example service name for MongoDB connection identification. The implementation is consistent with the other deployment templates.

distros/kubernetes/nvsentinel/charts/fault-remediation/templates/deployment.yaml (1)

139-141: LGTM! MongoDB app name addition enhances observability.

The addition of the MONGODB_APP_NAME environment variable aligns with the PR objectives to identify services in MongoDB logs and db.currentOp(). The use of {{ .Chart.Name | quote }} is idiomatic and will correctly resolve to "fault-remediation".

The Go application properly consumes the environment variable in the config struct (store-client/pkg/storewatcher/config.go) and passes it to the MongoDB client via SetAppName() in the watch store initialization code (store-client/pkg/storewatcher/watch_store.go and store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go). The implementation includes appropriate conditional checks and logging.

store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (6)

71-74: LGTM! Pool configuration fields properly added.

The new fields for connection pool management are well-structured and align with MongoDB driver requirements. The use of uint64 for pool sizes matches the MongoDB driver's API, and time.Duration for idle time is idiomatic Go.


128-132: Excellent pattern for preventing connection leaks on error paths.

The disconnectOnError helper ensures that MongoDB connections are properly cleaned up when initialization fails. The helper is correctly placed after mongo.Connect() succeeds and is consistently called on all validation failures.

Also applies to: 135-135, 140-140, 145-145, 156-156, 166-166, 200-200, 209-209


504-517: Proper cleanup of MongoDB client on shutdown.

The addition of client.Disconnect() in the Close() method ensures that connections are properly released. The error handling correctly preserves the original changeStream.Close() error while logging any disconnect failures.


655-686: Well-structured connection pool configuration with sensible defaults.

The pool configuration logic correctly applies defaults when values are unset and configures the MongoDB client appropriately. The conditional AppName setting with logging is particularly useful for identifying connections in MongoDB's currentOp() and logs, which directly addresses the PR's observability goals.


48-53: The constants are not true duplicates. While DefaultMaxPoolSize and DefaultMinPoolSize have identical names and values in both files, the time-related constant differs: watch_store.go defines DefaultMaxConnIdleTime = 5 * time.Minute (a time.Duration), while env_loader.go defines DefaultMaxConnIdleTimeSeconds = 300 (an int representing seconds). These serve different purposes—one works with Go's duration type directly, the other handles seconds for environment variable parsing. Consolidating them would introduce unnecessary cross-package dependencies and type conversions. The duplication concern is overstated given the different types and units involved.

Likely an incorrect or invalid review comment.


583-588: Error-path cleanup is correct, but success-path lifecycle concern is unfounded.

The disconnectOnError helper properly prevents leaks during initialization failures and is called appropriately on all error paths.

However, the concern about success-path lifecycle is incorrect. GetCollectionClient is called once during MongoDBClient initialization, and the underlying mongo.Client is extracted (line 410: client := mongoCol.Database().Client()) and stored in the MongoDBClient struct. The client lifecycle is properly managed via the MongoDBClient.Close() method, which explicitly calls c.client.Disconnect(ctx) at line 826. Callers do have a way to disconnect the client when done—through the Close() method on the returned client object.

store-client/pkg/config/env_loader.go (9)

77-80: LGTM! Environment variable constants follow conventions.

The new constants use consistent MONGODB_ prefixes and clear, descriptive names that align with existing patterns in the codebase.


93-96: Connection pool defaults align with PR objectives.

The defaults (max: 3, min: 1, idle: 5 minutes) match the stated rationale of supporting "1 change stream + 1 query + 1 write concurrently" and the idle timeout helps prevent connection accumulation in large clusters.


129-132: LGTM! Private fields with appropriate types.

The new fields follow good encapsulation practices by being private with public getters. The types are appropriate: uint64 for pool sizes (matching MongoDB driver) and int for timeout seconds.


196-222: Good refactoring with helper functions.

The getRequiredEnv and getCollectionName helpers reduce code duplication and improve readability. The logic correctly handles environment variable precedence and provides clear error messages.


264-280: Proper configuration loading with validation.

The pool settings are loaded correctly with appropriate validation. The AppName is optional (which makes sense for backward compatibility), while pool sizes are validated to be positive integers.


374-390: Consistent pool configuration for PostgreSQL path.

The pool settings are loaded identically to the MongoDB path, which is appropriate since connection pooling concepts are database-agnostic. The comment clarifies that these settings apply to PostgreSQL as well.

Minor note: Consider whether MONGODB_APP_NAME and other MONGODB_* prefixed variables should be renamed to DATASTORE_* for true database-agnostic configuration, but this is a nice-to-have refinement and not critical for this PR.


479-497: Well-implemented uint64 environment variable parser.

The getPositiveUint64EnvVar helper correctly validates that pool sizes are positive (non-zero), which is essential for connection pool configuration. Error messages provide clear context for debugging.


584-610: Getters provide sensible defaults for zero values.

The getter implementations correctly apply defaults when fields are zero, providing defensive behavior for partially-initialized configs. The GetAppName() appropriately returns the empty string without applying a default, since an empty app name is semantically valid (no app name specified).

Optional refinement: The explicit uint64() casts at lines 590 and 598 are unnecessary since DefaultMaxPoolSize and DefaultMinPoolSize are untyped constants that can be directly assigned to uint64, but the explicit cast improves clarity.


34-37: Breaking change to public interface—ensure release notes are updated and communicate to consumers.

Adding four new methods to the DatabaseConfig interface is a breaking change for any external implementations. The single implementation in this codebase (StandardDatabaseConfig) has been properly updated with all new getter methods. However, ensure this change is documented in release notes and communicated to external consumers who may have implemented this interface.

@github-actions
Copy link

github-actions bot commented Jan 9, 2026

Merging this branch will decrease overall coverage

Impacted Packages Coverage Δ 🤖
github.com/nvidia/nvsentinel/store-client/pkg/config 0.00% (ø)
github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher 11.15% (-1.38%) 👎
github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/postgresql 4.67% (-0.47%) 👎
github.com/nvidia/nvsentinel/store-client/pkg/factory 2.64% (-0.94%) 👎
github.com/nvidia/nvsentinel/store-client/pkg/storewatcher 10.55% (-1.15%) 👎

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/nvidia/nvsentinel/store-client/pkg/config/env_loader.go 0.00% (ø) 2354 (+686) 0 2354 (+686)
github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go 12.42% (-1.89%) 3252 (+716) 404 (+41) 2848 (+675) 👎
github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/postgresql/datastore.go 4.69% (-0.72%) 1365 (+255) 64 (+4) 1301 (+251) 👎
github.com/nvidia/nvsentinel/store-client/pkg/factory/client_factory.go 2.64% (-0.94%) 681 (+178) 18 663 (+178) 👎
github.com/nvidia/nvsentinel/store-client/pkg/storewatcher/watch_store.go 12.14% (-1.63%) 2768 (+568) 336 (+33) 2432 (+535) 👎

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • github.com/nvidia/nvsentinel/store-client/pkg/storewatcher/watch_store_test.go

@github-actions
Copy link

github-actions bot commented Jan 9, 2026

🛡️ CodeQL Analysis

🚨 Found 1 security alert(s)

🔗 View details

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (1)

504-524: Minor: Error message may be misleading when wrapping disconnect errors.

When changeStream.Close() succeeds but client.Disconnect() fails, the error from line 512 is returned. However, it gets wrapped at line 520 with the message "failed to close change stream", which is misleading since the actual failure was during client disconnect.

Consider using a more accurate message or handling the two error cases separately:

Suggested fix
-	if err != nil {
-		return fmt.Errorf("failed to close change stream for client %s: %w", w.clientName, err)
-	}
+	if err != nil {
+		return err // Error already has appropriate context from lines 512 or 495
+	}

Alternatively, add context at line 512 to distinguish:

-			err = fmt.Errorf("failed to disconnect MongoDB client for %s: %w", w.clientName, disconnectErr)
+			err = disconnectErr
🧹 Nitpick comments (3)
store-client/pkg/factory/client_factory.go (1)

90-98: Consider making ConnMaxLifetime configurable.

The connection pool settings are now configurable via environment variables (MaxOpenConns, MaxIdleConns, ConnMaxIdleTime), but ConnMaxLifetime is hardcoded to 1 hour. For consistency and flexibility, consider adding this as a configurable parameter as well.

♻️ Example: Add configurable ConnMaxLifetime

In store-client/pkg/config/env_loader.go, add:

EnvMongoDBMaxConnLifetimeSeconds = "MONGODB_MAX_CONN_LIFETIME_SECONDS"
DefaultMaxConnLifetimeSeconds = 3600 // 1 hour

Then retrieve and apply it here:

 maxConnIdleTime := time.Duration(f.dbConfig.GetMaxConnIdleTimeSeconds()) * time.Second
+maxConnLifetime := time.Duration(f.dbConfig.GetMaxConnLifetimeSeconds()) * time.Second

 db.SetMaxOpenConns(poolSizeToInt(maxPoolSize))
 db.SetMaxIdleConns(poolSizeToInt(minPoolSize))
 db.SetConnMaxIdleTime(maxConnIdleTime)
-db.SetConnMaxLifetime(time.Hour) // Max lifetime of a connection
+db.SetConnMaxLifetime(maxConnLifetime)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)

569-624: Pre-existing design concern: GetCollectionClient leaks client reference.

The function returns a *mongo.Collection but the underlying *mongo.Client is not exposed to callers. This means callers have no way to disconnect the client when they're done, which could lead to connection accumulation over time if GetCollectionClient is called multiple times.

This is a pre-existing issue, but given the PR's focus on connection pool management, consider either:

  1. Returning both the collection and client (or a wrapper struct with a Close() method)
  2. Documenting that callers should reuse the returned collection rather than calling GetCollectionClient repeatedly

655-688: Consider using Debug level for AppName configuration log.

The log at line 685 will be emitted on every MongoDB client creation. For production deployments with frequent client connections, Info level may be too verbose. Consider using Debug level instead.

Suggested change
 	// Set AppName if provided - this helps identify connections in MongoDB logs and currentOp
 	if mongoConfig.AppName != "" {
 		clientOpts.SetAppName(mongoConfig.AppName)
-		slog.Info("MongoDB client configured with app name", "appName", mongoConfig.AppName)
+		slog.Debug("MongoDB client configured with app name", "appName", mongoConfig.AppName)
 	}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e5f35dc and 86211f9.

📒 Files selected for processing (8)
  • store-client/pkg/adapter/legacy_adapter.go
  • store-client/pkg/config/env_loader.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/storewatcher/watch_store_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • store-client/pkg/storewatcher/watch_store_test.go
  • store-client/pkg/adapter/legacy_adapter.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.go: Follow standard Go conventions with gofmt and golint
Use structured logging via log/slog in Go code
Wrap errors with context using fmt.Errorf("context: %w", err) in Go code
Within retry.RetryOnConflict blocks, return errors without wrapping to preserve retry behavior
Use meaningful variable names such as synced over ok for cache sync checks
Use client-go for Kubernetes API interactions in Go code
Prefer informers over direct API calls for watching Kubernetes resources
Implement proper shutdown handling with context cancellation in Go code
Package-level godoc required for all Go packages
Function comments required for all exported Go functions
Use inline comments for complex logic only in Go code
TODO comments should reference issues in Go code
Extract informer event handler setup into helper methods
Use separate informers for different Kubernetes resource types

Files:

  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/config/env_loader.go
**/*_test.go

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*_test.go: Use envtest for testing Kubernetes controllers instead of fake clients
Use testify/assert and testify/require for assertions in Go tests
Write table-driven tests when testing multiple scenarios in Go
Name Go tests descriptively using format: TestFunctionName_Scenario_ExpectedBehavior

Files:

  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
🧠 Learnings (5)
📚 Learning: 2025-12-23T05:02:22.108Z
Learnt from: tanishagoyal2
Repo: NVIDIA/NVSentinel PR: 609
File: store-client/pkg/client/postgresql_pipeline_builder.go:119-132
Timestamp: 2025-12-23T05:02:22.108Z
Learning: In the NVSentinel codebase, protobuf fields stored in MongoDB should use lowercase field names (e.g., processingstrategy, componentclass, checkname). Ensure pipeline filters and queries that access protobuf fields in the database consistently use lowercase field names in the store-client package, avoiding camelCase mappings for MongoDB reads/writes.

Applied to files:

  • store-client/pkg/factory/client_factory.go
  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • store-client/pkg/datastore/providers/postgresql/datastore.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
  • store-client/pkg/config/env_loader.go
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Implement proper shutdown handling with context cancellation in Go code

Applied to files:

  • store-client/pkg/storewatcher/watch_store.go
📚 Learning: 2025-11-15T17:25:33.481Z
Learnt from: dims
Repo: NVIDIA/NVSentinel PR: 255
File: node-drainer/pkg/config/config.go:131-159
Timestamp: 2025-11-15T17:25:33.481Z
Learning: In node-drainer's token configuration, the ClientName must be consistent between TokenConfigFromEnv() and NewTokenConfig() since it's used as a MongoDB query key for resume token lookups in the store-client watcher implementation.

Applied to files:

  • store-client/pkg/storewatcher/watch_store.go
  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go
📚 Learning: 2025-12-22T16:16:24.320Z
Learnt from: pteranodan
Repo: NVIDIA/NVSentinel PR: 607
File: client-go/nvgrpc/config_test.go:61-80
Timestamp: 2025-12-22T16:16:24.320Z
Learning: In Go tests across the repository, avoid introducing the testify dependency for simple equality/inequality checks. Use the standard testing package assertions (t.Error, t.Errorf, t.Fatal, etc.) for straightforward checks. Reserve third-party assertion libraries for complex scenarios that require richer diagnostics or expressive matchers.

Applied to files:

  • store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
📚 Learning: 2025-11-24T22:20:48.152Z
Learnt from: CR
Repo: NVIDIA/NVSentinel PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T22:20:48.152Z
Learning: Applies to **/*.go : Use structured logging via `log/slog` in Go code

Applied to files:

  • store-client/pkg/datastore/providers/postgresql/datastore.go
🧬 Code graph analysis (4)
store-client/pkg/factory/client_factory.go (1)
node-drainer/pkg/config/config.go (1)
  • Duration (37-39)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go (2)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
  • ChangeStreamWatcher (89-108)
  • Event (40-40)
store-client/pkg/storewatcher/watch_store.go (1)
  • ChangeStreamWatcher (43-57)
store-client/pkg/datastore/providers/postgresql/datastore.go (1)
store-client/pkg/datastore/types.go (2)
  • ConnectionConfig (42-54)
  • DataStoreConfig (35-39)
store-client/pkg/config/env_loader.go (1)
store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (2)
  • DefaultMaxPoolSize (50-50)
  • DefaultMinPoolSize (51-51)
🔇 Additional comments (20)
store-client/pkg/factory/client_factory.go (1)

227-235: LGTM: Safe pool size conversion with bounds checking.

The helper appropriately caps pool sizes at 1000 to prevent resource exhaustion from misconfiguration and safely converts uint64 to int.

store-client/pkg/config/env_loader.go (4)

34-37: LGTM: Well-structured pool configuration interface and constants.

The new interface methods, environment variable constants, and defaults align with the PR objectives (MaxPoolSize: 3, MinPoolSize: 1, MaxConnIdleTime: 300s). The naming is clear and consistent with existing patterns.

Also applies to: 77-80, 94-96


196-222: LGTM: Clean helper functions with good error handling.

The extraction of getRequiredEnv and getCollectionName reduces code duplication and improves maintainability. Error messages are clear and actionable.


479-497: LGTM: Proper validation for pool size values.

The function correctly validates that pool sizes are positive (non-zero) uint64 values and provides appropriate error messages. This prevents configuration issues with invalid pool sizes.


588-610: LGTM: Getters with safe default fallback logic.

The getter implementations properly return configured defaults when values are zero, ensuring pool settings are always valid. This defensive approach prevents runtime issues from uninitialized configuration.

store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go (2)

192-197: LGTM: Proper test cleanup to avoid double-disconnect.

Setting watcher.client = nil before calling Close() is the correct approach when using mtest, which manages the MongoDB client lifecycle. The inline comments clearly explain the rationale, preventing future confusion.

Also applies to: 295-298, 360-363, 432-435, 1619-1622, 1769-1772, 1851-1854


1860-1888: LGTM: Comprehensive edge case coverage.

The new test explicitly verifies that Close() handles a nil client gracefully, which is an important edge case given the mtest lifecycle management. This improves test robustness and documents the expected behavior.

store-client/pkg/storewatcher/watch_store.go (4)

76-81: Excellent: Prevents connection leaks during initialization errors.

The disconnectOnError helper is consistently applied across all error paths in NewChangeStreamWatcher, ensuring the MongoDB client is properly disconnected when initialization fails. This directly addresses the connection accumulation issue described in the PR objectives.

Also applies to: 84-96, 105-106, 115-116, 149-152, 158-159


383-396: Excellent: Proper client disconnection during shutdown.

Adding client.Disconnect() in the Close() method releases MongoDB connections during normal shutdown, complementing the error-path cleanup. The nil-check prevents panics, and the error handling preserves the original error when both operations fail.


552-570: LGTM: Pool configuration applied with proper logging.

The connection pool settings (MaxPoolSize, MinPoolSize, MaxConnIdleTime) are correctly applied from the config, with AppName support for easier identification in MongoDB logs and db.currentOp(). This aligns perfectly with the PR objectives.


462-467: LGTM: Consistent cleanup pattern in GetCollectionClient.

The same disconnectOnError pattern is applied in GetCollectionClient, maintaining consistency with NewChangeStreamWatcher and ensuring connection cleanup on all error paths.

Also applies to: 470-481, 491-492

store-client/pkg/datastore/providers/postgresql/datastore.go (3)

40-59: LGTM: Centralized validation improves code quality.

Extracting configuration validation into a dedicated function reduces duplication and makes the validation logic easier to maintain and test. The port range validation (1-65535) is correctly implemented.


61-92: LGTM: Robust pool option parsing with defensive defaults.

The getPoolOption helper properly handles invalid values by logging warnings and falling back to defaults. The positive-value check prevents configuration errors from causing runtime issues. This approach aligns with Go best practices for configuration management.


114-122: LGTM: PostgreSQL pool settings match MongoDB defaults.

The pool configuration (defaults: maxOpenConns=3, maxIdleConns=1, maxConnIdleTime=300s) mirrors the MongoDB settings established elsewhere in this PR, providing consistency across database providers. The 1-hour ConnMaxLifetime is reasonable for long-running connections.

store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go (6)

48-53: LGTM!

The connection pool defaults are well-chosen for the stated use case (1 change stream + 1 query + 1 write concurrently). Exporting these constants allows consumers to reference them if needed.


70-74: LGTM!

The new configuration fields use correct types matching the MongoDB driver's expectations (uint64 for pool sizes, time.Duration for idle time). Zero values appropriately trigger defaults in constructMongoClientOptions.


127-132: LGTM!

The disconnectOnError helper correctly prevents connection leaks during initialization failures. Using the same context for disconnect is a reasonable trade-off - if the context is cancelled, the disconnect attempt is still made with best effort.


134-147: LGTM!

All configuration validation error paths correctly invoke disconnectOnError() before returning, ensuring no connection leaks when configuration is invalid.


155-167: LGTM!

Connectivity check failures correctly trigger disconnectOnError() to release the established connection before returning the error.


198-211: LGTM!

Both the resume token retrieval failure (non-ErrNoDocuments cases) and change stream open failure paths correctly call disconnectOnError() to prevent connection leaks.

@github-actions
Copy link

github-actions bot commented Jan 9, 2026

Merging this branch changes the coverage (4 decrease, 1 increase)

Impacted Packages Coverage Δ 🤖
github.com/nvidia/nvsentinel/store-client/pkg/adapter 1.69% (+1.69%) 👍
github.com/nvidia/nvsentinel/store-client/pkg/config 0.00% (ø)
github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher 11.15% (-1.38%) 👎
github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/postgresql 4.66% (-0.49%) 👎
github.com/nvidia/nvsentinel/store-client/pkg/factory 2.64% (-0.94%) 👎
github.com/nvidia/nvsentinel/store-client/pkg/storewatcher 10.45% (-1.24%) 👎

Coverage by file

Changed files (no unit tests)

Changed File Coverage Δ Total Covered Missed 🤖
github.com/nvidia/nvsentinel/store-client/pkg/adapter/legacy_adapter.go 1.69% (+1.69%) 830 (+44) 14 (+14) 816 (+30) 👍
github.com/nvidia/nvsentinel/store-client/pkg/config/env_loader.go 0.00% (ø) 2354 (+686) 0 2354 (+686)
github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher/watch_store.go 12.42% (-1.89%) 3252 (+716) 404 (+41) 2848 (+675) 👎
github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/postgresql/datastore.go 4.46% (-0.94%) 1434 (+324) 64 (+4) 1370 (+320) 👎
github.com/nvidia/nvsentinel/store-client/pkg/factory/client_factory.go 2.64% (-0.94%) 681 (+178) 18 663 (+178) 👎
github.com/nvidia/nvsentinel/store-client/pkg/storewatcher/watch_store.go 11.99% (-1.78%) 2768 (+568) 332 (+29) 2436 (+539) 👎

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/nvidia/nvsentinel/store-client/pkg/datastore/providers/mongodb/watcher/watch_store_test.go
  • github.com/nvidia/nvsentinel/store-client/pkg/storewatcher/watch_store_test.go

@github-actions
Copy link

github-actions bot commented Jan 9, 2026

🛡️ CodeQL Analysis

🚨 Found 1 security alert(s)

🔗 View details

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant