Skip to content

Support expected remote cluster name in CCS sniff mode#20532

Merged
varunbharadwaj merged 3 commits intoopensearch-project:mainfrom
varunbharadwaj:vb/clusternamecheck
Feb 4, 2026
Merged

Support expected remote cluster name in CCS sniff mode#20532
varunbharadwaj merged 3 commits intoopensearch-project:mainfrom
varunbharadwaj:vb/clusternamecheck

Conversation

@varunbharadwaj
Copy link
Copy Markdown
Contributor

@varunbharadwaj varunbharadwaj commented Feb 3, 2026

Description

CCS sniff mode today allows users to configure a set of seeds used to discover remote cluster nodes. Currently, the remote cluster name is saved after the initial handshake, and validated for subsequent node connections. This ensures new connections are made to the same remote cluster.

However, in an environment where hosts are recycled and share multiple use cases, it is possible that the configured seeds become stale and have a different OpenSearch cluster node running on it, on the same port. This is a corner case scenario and unlikely, but not impossible.

In the above scenario, combined with a coordinator node restart, it is possible for a subset of coordinator nodes to forward search traffic to this random cluster. There exists no check today to ensure all coordinator nodes connect to the expected remote cluster.

To solve for this, this PR introduces a "cluster_name" field indicating the expected cluster name. If the user configures this field, the remote cluster name is expected to match the provided value, during handshake. If validation fails, sniff mode continues to evaluate the next seed. We use cluster name and not cluster uuid, as it is already used and readily available. Cluster uuid check can be done after the cluster state is retrieved, and can be considered in the future if needed.

Testing:

  1. Verified existing behavior - combination of good and stale seeds resulted in forwarding traffic to stale seed / invalid cluster.
  2. Test cluster name validation
  3. Validate cluster name in _remote/info API response

Error observed:
java.lang.IllegalStateException: handshake with [<node info>] failed: remote cluster name [<random-cluster-name>] does not match expected remote cluster name [expected-cluster-name]"

_remote/info API response

"cluster_alias": {
    "connected": false,
    "mode": "sniff",
    "seeds": [
      "10.19.8.59:25668"
    ],
    "num_nodes_connected": 0,
    "max_connections_per_cluster": 30,
    "cluster_name": "expected-cluster-name",
    "initial_connect_timeout": "30s",
    "skip_unavailable": false
},

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 3, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

  • 🔍 Trigger a full review
📝 Walkthrough

Walkthrough

The changes introduce support for validating expected cluster names in CCS Sniff mode through a new REMOTE_CLUSTER_EXPECTED_NAME setting, enforced during cluster handshakes to prevent connections to mismatched clusters.

Changes

Cohort / File(s) Summary
Configuration & Settings
CHANGELOG.md, server/src/main/java/org/opensearch/common/settings/ClusterSettings.java, server/src/main/java/org/opensearch/transport/RemoteClusterAware.java
Added REMOTE_CLUSTER_EXPECTED_NAME setting to built-in cluster settings registry and remote cluster update listeners to enable cluster-wide validation configuration.
Core Implementation
server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java
Introduced REMOTE_CLUSTER_EXPECTED_NAME setting and expectedClusterName field; updated constructors, mode info serialization (with versioning), and remote cluster name predicate logic to enforce expected cluster name validation during sniff handshakes.
Test Coverage
server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java
Added comprehensive testExpectedClusterNameValidation() test covering seed/node scenarios with mismatched cluster names, fallback behavior, and dynamic strategy rebuilding; updated existing test invocations to pass expectedClusterName parameter.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested labels

Search:Remote Search

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately reflects the main feature: supporting expected remote cluster name validation in CCS sniff mode, which is the core change across all modified files.
Description check ✅ Passed PR description provides comprehensive detail of the problem, solution, and testing performed, matching the template structure.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@varunbharadwaj varunbharadwaj changed the title Support expected remote cluster name in CCS sniff mode [WIP] Support expected remote cluster name in CCS sniff mode Feb 3, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 3, 2026

❌ Gradle check result for dc7a35c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@varunbharadwaj varunbharadwaj changed the title [WIP] Support expected remote cluster name in CCS sniff mode Support expected remote cluster name in CCS sniff mode Feb 3, 2026
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 3, 2026

❌ Gradle check result for 2005eca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 3, 2026

❌ Gradle check result for 2005eca: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 3, 2026

✅ Gradle check result for 2005eca: SUCCESS

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.25%. Comparing base (93ae8db) to head (4448d93).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
.../opensearch/transport/SniffConnectionStrategy.java 75.00% 3 Missing and 5 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20532      +/-   ##
============================================
- Coverage     73.33%   73.25%   -0.08%     
+ Complexity    72125    72106      -19     
============================================
  Files          5798     5798              
  Lines        329654   329722      +68     
  Branches      47491    47518      +27     
============================================
- Hits         241741   241539     -202     
- Misses        68504    68846     +342     
+ Partials      19409    19337      -72     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@varunbharadwaj varunbharadwaj marked this pull request as ready for review February 3, 2026 20:04
@varunbharadwaj varunbharadwaj requested review from a team and peternied as code owners February 3, 2026 20:04
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java`:
- Around line 1145-1183: The test is calling the protected method
strategyMustBeRebuilt on SniffConnectionStrategy; change both calls to the
public API strategy.shouldRebuildConnection(...) so the test uses the
public-facing rebuild check (replace
strategy.strategyMustBeRebuilt(noChangeSetting) and
strategy.strategyMustBeRebuilt(changedSetting) with
strategy.shouldRebuildConnection(noChangeSetting) and
strategy.shouldRebuildConnection(changedSetting) respectively) to match other
tests and access level.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c27f8a and 2005eca.

📒 Files selected for processing (5)
  • CHANGELOG.md
  • server/src/main/java/org/opensearch/common/settings/ClusterSettings.java
  • server/src/main/java/org/opensearch/transport/RemoteClusterAware.java
  • server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java
  • server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-12-12T18:40:08.452Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20017
File: plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java:256-256
Timestamp: 2025-12-12T18:40:08.452Z
Learning: In the OpenSearch ReactorNetty4 secure HTTP transport tests (plugins/transport-reactor-netty4/src/test/java/org/opensearch/http/reactor/netty4/ssl/SecureReactorNetty4HttpServerTransportTests.java), URI limit validation has been moved from the protocol layer to the transport layer, making it protocol-agnostic. The random protocol selection in ReactorHttpClient.https(settings) is intentional to ensure all tests validate correct behavior across HTTP/1.1, HTTP/2, and HTTP/3.

Applied to files:

  • server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java
📚 Learning: 2026-01-13T17:40:27.167Z
Learnt from: reta
Repo: opensearch-project/OpenSearch PR: 20411
File: server/src/main/java/org/opensearch/index/codec/CodecService.java:112-133
Timestamp: 2026-01-13T17:40:27.167Z
Learning: Avoid capturing or evaluating a supplier (e.g., this::defaultCodec) upfront when passing it to a registry during object construction. If registries may replace defaults during iteration (as in EnginePlugin.getAdditionalCodecs), pass the supplier itself and only resolve it at use time. This ensures dynamic behavior is preserved during initialization and prevents premature binding of defaults in codecs/registry setup. This pattern should apply to similar initialization paths in Java server code where registries may mutate defaults during construction.

Applied to files:

  • server/src/main/java/org/opensearch/common/settings/ClusterSettings.java
  • server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java
  • server/src/main/java/org/opensearch/transport/RemoteClusterAware.java
🧬 Code graph analysis (2)
server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java (1)
server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java (1)
  • SniffConnectionStrategy (83-711)
server/src/main/java/org/opensearch/transport/RemoteClusterAware.java (1)
server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java (1)
  • SniffConnectionStrategy (83-711)
🔇 Additional comments (18)
CHANGELOG.md (1)

8-8: LGTM!

The changelog entry is correctly placed under the "Added" section and follows the existing format with a clear description and PR link.

server/src/main/java/org/opensearch/transport/RemoteClusterAware.java (1)

153-153: LGTM!

The new REMOTE_CLUSTER_EXPECTED_NAME setting is correctly added to the remote cluster settings list, ensuring that changes to this setting will trigger the appropriate update callbacks for remote cluster reconfiguration.

server/src/main/java/org/opensearch/common/settings/ClusterSettings.java (1)

467-467: LGTM!

The new REMOTE_CLUSTER_EXPECTED_NAME setting is correctly registered in BUILT_IN_CLUSTER_SETTINGS, enabling the cluster to recognize and validate this configuration option.

server/src/test/java/org/opensearch/transport/SniffConnectionStrategyTests.java (4)

73-73: LGTM!

The import for containsString is correctly added to support the new assertions in the test.


170-171: LGTM!

The constructor calls are correctly updated to include the new expectedClusterName parameter (passed as null to preserve existing behavior).


1029-1088: LGTM!

Test 1 correctly validates that the connection fails when only a wrong cluster seed (with a mismatched cluster name) is available. The test properly checks for the expected error message and verifies the node is not connected.


1090-1143: LGTM!

Tests 2 and 3 comprehensively cover the fallback behavior when multiple seeds are provided (wrong seed first, correct seed second) and the successful connection scenario when the expected cluster name matches.

server/src/main/java/org/opensearch/transport/SniffConnectionStrategy.java (11)

142-157: LGTM!

The new REMOTE_CLUSTER_EXPECTED_NAME setting is well-defined with:

  • Clear documentation explaining the purpose and behavior
  • Appropriate use of StrategyValidator to restrict to SNIFF mode
  • Correct Dynamic and NodeScope properties for runtime configurability

170-170: LGTM!

The new field properly stores the expected cluster name for use during handshake validation.


186-188: LGTM!

The constructor correctly retrieves the expected cluster name setting from configuration and passes it through the constructor chain.


234-236: LGTM!

Good defensive coding: empty or whitespace-only strings are normalized to null using Strings.hasText(), ensuring consistent handling of "not set" vs "empty string" cases.


257-261: LGTM!

The rebuild check correctly includes the expected cluster name comparison, ensuring that changes to this setting trigger strategy reconstruction.


502-524: LGTM!

The predicate logic is correct:

  1. First checks against the explicitly configured expectedClusterName (if set)
  2. Falls back to checking against the first observed cluster name (existing behavior)
  3. The toString() method provides clear debugging output for both scenarios

599-603: LGTM!

The helper method properly normalizes both old and new values before comparison, ensuring consistent behavior regardless of whether empty strings or nulls are used.


617-626: LGTM!

The SniffModeInfo class is properly extended:

  • New constructor overload maintains backward compatibility
  • Old constructor delegates to new one with null for expectedClusterName
  • Field is correctly initialized

648-650: LGTM!

The toXContent method conditionally includes expected_cluster_name only when it's set, keeping the API response clean when the feature is not used.


686-688: LGTM!

The getter and equals/hashCode methods are correctly updated to include the new field, ensuring proper object comparison and hashing behavior.

Also applies to: 696-708


632-636: Version constant V_3_5_0 is correct. The serialization uses Version.V_3_5_0 appropriately—it is the current release version in the codebase (confirmed in buildSrc/version.properties and Version.java). Both the deserialization (lines 632-636) and serialization (lines 659-661) operations use consistent version checks, which is the proper pattern for new serialized fields in backward compatibility scenarios.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Copy link
Copy Markdown
Contributor

@msfroh msfroh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this idea. Thanks, @varunbharadwaj!

Copy link
Copy Markdown

@G1GC G1GC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 4, 2026

❌ Gradle check result for 4448d93: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Feb 4, 2026

✅ Gradle check result for 4448d93: SUCCESS

@varunbharadwaj varunbharadwaj merged commit 839254d into opensearch-project:main Feb 4, 2026
37 of 40 checks passed
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
…oject#20532)

* support expected remote cluster name in CCS sniff mode

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

* rename expected_cluster_name entry to just cluster_name

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

* skip checking remote cluster name when expected cluster name is provided

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

---------

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
tanyabti pushed a commit to tanyabti/OpenSearch that referenced this pull request Feb 24, 2026
…oject#20532)

* support expected remote cluster name in CCS sniff mode

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

* rename expected_cluster_name entry to just cluster_name

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

* skip checking remote cluster name when expected cluster name is provided

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

---------

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
sakrah pushed a commit to sakrah/OpenSearch that referenced this pull request Mar 5, 2026
…oject#20532)

* support expected remote cluster name in CCS sniff mode

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

* rename expected_cluster_name entry to just cluster_name

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

* skip checking remote cluster name when expected cluster name is provided

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>

---------

Signed-off-by: Varun Bharadwaj <varunbharadwaj1995@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants