Optimize TransportNodesAction to selectively avoid sending list of Di…#14531
Optimize TransportNodesAction to selectively avoid sending list of Di…#14531Pranshu-S wants to merge 5 commits intoopensearch-project:mainfrom
Conversation
|
❌ Gradle check result for 9e66ed2: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
|
||
| void start() { | ||
| final DiscoveryNode[] nodes = request.concreteNodes(); | ||
| logger.info(request.getClass().getSimpleName()); |
There was a problem hiding this comment.
This would lead to heavy logging. Please remove this.
There was a problem hiding this comment.
Sure Vikas, I intended to work on the logs for testing. Will remove them prior to raising the PR.
| if (request.concreteNodes() == null) { | ||
| resolveRequest(request, clusterService.state()); | ||
| assert request.concreteNodes() != null; | ||
| if (optimizedTransportActionNames.contains(request.getClass().getSimpleName())) { |
There was a problem hiding this comment.
Can you add some tests for this? Also, have you executed all existing tests and made sure that they are passing after your change. If yes, then please confirm this in the PR description.
There was a problem hiding this comment.
Please make sure that node left and node added scenarios are covered properly.
There was a problem hiding this comment.
Yes, I have modified the draft PR approach. Will make sure I add all the necessary tests covering different scenarios.
9e66ed2 to
4f6b6ed
Compare
|
❌ Gradle check result for 4f6b6ed: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
…s, NodesInfo and ClusterStats call Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
4f6b6ed to
d2ef37d
Compare
|
❌ Gradle check result for d2ef37d: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
|
❌ Gradle check result for 469cd6a: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
|
❌ Gradle check result for 03c35df: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
|
❌ Gradle check result for b7b8754: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
Signed-off-by: Pranshu Shukla <pranshushukla06@gmail.com>
|
❌ Gradle check result for 31f030c: FAILURE Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change? |
|
Closing this PR in virtue of clone PR - #14749 |
Description
In the current implementation, every transport action extending TransportNodesAction includes all discovery nodes in the transport request sent to each node in the cluster. This approach leads to performance bottlenecks in large clusters due to redundant data transmission. Specifically:
This PR proposes a solution to address these bottlenecks by selectively skipping the resolution of concrete nodes at the transport layer. This optimization will significantly reduce redundant data transmission and improve overall performance in large clusters.
To do this, flag was introduced on the BaseNodeRequest Class
populateDiscoveryNodesInTransportRequest. Requests in this flag overriden toFalsewill skip adding the concrete nodes data into the request. Currently it is only implemented in NodeStats, ClusterStats and NodeInfo Call from the rest layer.The perf gains are significant (tested in a large 1k node domain)
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.