Support expected remote cluster name in CCS sniff mode#20532
Support expected remote cluster name in CCS sniff mode#20532varunbharadwaj merged 3 commits intoopensearch-project:mainfrom
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the
📝 WalkthroughWalkthroughThe changes introduce support for validating expected cluster names in CCS Sniff mode through a new REMOTE_CLUSTER_EXPECTED_NAME setting, enforced during cluster handshakes to prevent connections to mismatched clusters. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
54d5ed5 to
dc7a35c
Compare
|
❌ Gradle check result for dc7a35c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
dc7a35c to
33c2ae3
Compare
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
33c2ae3 to
2005eca
Compare
|
❌ Gradle check result for 2005eca: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
❌ Gradle check result for 2005eca: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #20532 +/- ##
============================================
- Coverage 73.33% 73.25% -0.08%
+ Complexity 72125 72106 -19
============================================
Files 5798 5798
Lines 329654 329722 +68
Branches 47491 47518 +27
============================================
- Hits 241741 241539 -202
- Misses 68504 68846 +342
+ Partials 19409 19337 -72 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In
`@server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java`:
- Around line 1145-1183: The test is calling the protected method
strategyMustBeRebuilt on SniffConnectionStrategy; change both calls to the
public API strategy.shouldRebuildConnection(...) so the test uses the
public-facing rebuild check (replace
strategy.strategyMustBeRebuilt(noChangeSetting) and
strategy.strategyMustBeRebuilt(changedSetting) with
strategy.shouldRebuildConnection(noChangeSetting) and
strategy.shouldRebuildConnection(changedSetting) respectively) to match other
tests and access level.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
CHANGELOG.mdserver/src/main/java/org/opensearch/common/settings/ClusterSettings.javaserver/src/main/java/org/opensearch/transport/RemoteClusterAware.javaserver/src/main/java/org/opensearch/transport/SniffConnectionStrategy.javaserver/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-12T18:40:08.452Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20017
File: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java:256-256
Timestamp: 2025-12-12T18:40:08.452Z
Learning: In the OpenSearch ReactorNetty4 secure HTTP transport tests (plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java), URI limit validation has been moved from the protocol layer to the transport layer, making it protocol-agnostic. The random protocol selection in ReactorHttpClient.https(settings) is intentional to ensure all tests validate correct behavior across HTTP/1.1, HTTP/2, and HTTP/3.
Applied to files:
server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.
Applied to files:
server/src/main/java/org/opensearch/common/settings/ClusterSettings.javaserver/src/main/java/org/opensearch/transport/SniffConnectionStrategy.javaserver/src/main/java/org/opensearch/transport/RemoteClusterAware.java
🧬 Code graph analysis (2)
server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java (1)
server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java (1)
SniffConnectionStrategy(83-711)
server/src/main/java/org/opensearch/transport/RemoteClusterAware.java (1)
server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java (1)
SniffConnectionStrategy(83-711)
🔇 Additional comments (18)
CHANGELOG.md (1)
8-8: LGTM!The changelog entry is correctly placed under the "Added" section and follows the existing format with a clear description and PR link.
server/src/main/java/org/opensearch/transport/RemoteClusterAware.java (1)
153-153: LGTM!The new
REMOTE_CLUSTER_EXPECTED_NAMEsetting is correctly added to the remote cluster settings list, ensuring that changes to this setting will trigger the appropriate update callbacks for remote cluster reconfiguration.server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (1)
467-467: LGTM!The new
REMOTE_CLUSTER_EXPECTED_NAMEsetting is correctly registered inBUILT_IN_CLUSTER_SETTINGS, enabling the cluster to recognize and validate this configuration option.server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java (4)
73-73: LGTM!The import for
containsStringis correctly added to support the new assertions in the test.
170-171: LGTM!The constructor calls are correctly updated to include the new
expectedClusterNameparameter (passed asnullto preserve existing behavior).
1029-1088: LGTM!Test 1 correctly validates that the connection fails when only a wrong cluster seed (with a mismatched cluster name) is available. The test properly checks for the expected error message and verifies the node is not connected.
1090-1143: LGTM!Tests 2 and 3 comprehensively cover the fallback behavior when multiple seeds are provided (wrong seed first, correct seed second) and the successful connection scenario when the expected cluster name matches.
server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java (11)
142-157: LGTM!The new
REMOTE_CLUSTER_EXPECTED_NAMEsetting is well-defined with:
- Clear documentation explaining the purpose and behavior
- Appropriate use of
StrategyValidatorto restrict to SNIFF mode- Correct
DynamicandNodeScopeproperties for runtime configurability
170-170: LGTM!The new field properly stores the expected cluster name for use during handshake validation.
186-188: LGTM!The constructor correctly retrieves the expected cluster name setting from configuration and passes it through the constructor chain.
234-236: LGTM!Good defensive coding: empty or whitespace-only strings are normalized to
nullusingStrings.hasText(), ensuring consistent handling of "not set" vs "empty string" cases.
257-261: LGTM!The rebuild check correctly includes the expected cluster name comparison, ensuring that changes to this setting trigger strategy reconstruction.
502-524: LGTM!The predicate logic is correct:
- First checks against the explicitly configured
expectedClusterName(if set)- Falls back to checking against the first observed cluster name (existing behavior)
- The
toString()method provides clear debugging output for both scenarios
599-603: LGTM!The helper method properly normalizes both old and new values before comparison, ensuring consistent behavior regardless of whether empty strings or nulls are used.
617-626: LGTM!The
SniffModeInfoclass is properly extended:
- New constructor overload maintains backward compatibility
- Old constructor delegates to new one with
nullforexpectedClusterName- Field is correctly initialized
648-650: LGTM!The
toXContentmethod conditionally includesexpected_cluster_nameonly when it's set, keeping the API response clean when the feature is not used.
686-688: LGTM!The getter and
equals/hashCodemethods are correctly updated to include the new field, ensuring proper object comparison and hashing behavior.Also applies to: 696-708
632-636: Version constantV_3_5_0is correct. The serialization usesVersion.V_3_5_0appropriately—it is the current release version in the codebase (confirmed inbuildSrc/version.propertiesandVersion.java). Both the deserialization (lines 632-636) and serialization (lines 659-661) operations use consistent version checks, which is the proper pattern for new serialized fields in backward compatibility scenarios.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java
Show resolved
Hide resolved
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
msfroh
left a comment
There was a problem hiding this comment.
I really like this idea. Thanks, @varunbharadwaj!
|
❌ Gradle check result for 4448d93: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
839254d
into
opensearch-project:main
…oject#20532) * support expected remote cluster name in CCS sniff mode Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> * rename expected_cluster_name entry to just cluster_name Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> * skip checking remote cluster name when expected cluster name is provided Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> --------- Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
…oject#20532) * support expected remote cluster name in CCS sniff mode Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> * rename expected_cluster_name entry to just cluster_name Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> * skip checking remote cluster name when expected cluster name is provided Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> --------- Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
…oject#20532) * support expected remote cluster name in CCS sniff mode Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> * rename expected_cluster_name entry to just cluster_name Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> * skip checking remote cluster name when expected cluster name is provided Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com> --------- Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Description
CCS sniff mode today allows users to configure a set of seeds used to discover remote cluster nodes. Currently, the remote cluster name is saved after the initial handshake, and validated for subsequent node connections. This ensures new connections are made to the same remote cluster.
However, in an environment where hosts are recycled and share multiple use cases, it is possible that the configured seeds become stale and have a different OpenSearch cluster node running on it, on the same port. This is a corner case scenario and unlikely, but not impossible.
In the above scenario, combined with a coordinator node restart, it is possible for a subset of coordinator nodes to forward search traffic to this random cluster. There exists no check today to ensure all coordinator nodes connect to the expected remote cluster.
To solve for this, this PR introduces a "cluster_name" field indicating the expected cluster name. If the user configures this field, the remote cluster name is expected to match the provided value, during handshake. If validation fails, sniff mode continues to evaluate the next seed. We use cluster name and not cluster uuid, as it is already used and readily available. Cluster uuid check can be done after the cluster state is retrieved, and can be considered in the future if needed.
Testing:
_remote/infoAPI responseError observed:
java.lang.IllegalStateException: handshake with [<node info>] failed: remote cluster name [<random-cluster-name>] does not match expected remote cluster name [expected-cluster-name]"_remote/infoAPI responseBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.