Add support for local cluster alias to SearchRequest#36997
Merged
javanna merged 9 commits intoelastic:masterfrom Dec 28, 2018
Merged
Add support for local cluster alias to SearchRequest#36997javanna merged 9 commits intoelastic:masterfrom
javanna merged 9 commits intoelastic:masterfrom
Conversation
With the upcoming cross-cluster search alternate execution mode, the CCS node will be able to split a CCS request into multiple search requests, one per remote cluster involved. In order to do that, the CCS node has to be able to signal to each remote cluster that such sub-requests are part of a CCS request. Each cluster does not know about the other clusters involved, and does not know either what alias it is given in the CCS node, hence the CCS coordinating node needs to be able to provide the alias as part of the search request so that it is used as index prefix in the returned search hits. The cluster alias is a notion that's already supported in the search shards iterator and search shard target, but it is currently used in CCS as both index prefix and connection lookup key when fanning out to all the shards. With CCS alternate execution mode the provided cluster alias needs to be used only as index prefix, as shards are local to each cluster hence no cluster alias should be used for connection lookups. Such distinction is introduced through a new class called CCSInfo that replaces the previous cluster alias optional string and allows to set the cluster alias as well as a boolean flag that signals the current CCS execution mode. The cluster alias can be set to the SearchRequest at the transport layer only, and its getter/setter methods are package private. Relates to elastic#32125
Collaborator
|
Pinging @elastic/es-search |
11 tasks
Simplify changes by making connection lookup aware of the search request cluster alias, so there is no need to distinguish between index prefix and connection alias, which means no serialization changes required besides adding support for cluster alias to the search request.
jimczi
approved these changes
Dec 28, 2018
Contributor
jimczi
left a comment
There was a problem hiding this comment.
I left a minor comment regarding naming, LGTM otherwise
| public static final int DEFAULT_PRE_FILTER_SHARD_SIZE = 128; | ||
| public static final int DEFAULT_BATCHED_REDUCE_SIZE = 512; | ||
|
|
||
| private final String clusterAlias; |
Contributor
There was a problem hiding this comment.
Maybe rename to localClusterAlias ?
javanna
added a commit
to javanna/elasticsearch
that referenced
this pull request
Dec 28, 2018
With elastic#36997 we added the ability to provide a cluster alias with a SearchRequest. The next step is to disable the final reduction whenever a cluster alias is provided with the SearchRequest. A cluster alias will be provided when executing a cross-cluster search request with alternate execution mode, where each cluster does its own reduction locally. In order for the CCS node to be able to later perform an additional reduction of the results, we need to make sure that all the needed info stays available. This means that terms aggregations can be reduced but not pruned, and pipeline aggs should not be executed. The final reduction will happen later in the CCS coordinating node. Relates to elastic#36997 & elastic#32125
javanna
added a commit
that referenced
this pull request
Dec 28, 2018
With #36997 we added the ability to provide a cluster alias with a SearchRequest. The next step is to disable the final reduction whenever a cluster alias is provided with the SearchRequest. A cluster alias will be provided when executing a cross-cluster search request with alternate execution mode, where each cluster does its own reduction locally. In order for the CCS node to be able to later perform an additional reduction of the results, we need to make sure that all the needed info stays available. This means that terms aggregations can be reduced but not pruned, and pipeline aggs should not be executed. The final reduction will happen later in the CCS coordinating node. Relates to #36997 & #32125
javanna
added a commit
to javanna/elasticsearch
that referenced
this pull request
Dec 28, 2018
With elastic#36997 and elastic#37000 we added the ability to provide a cluster alias with a SearchRequest and to perform non-final reduction when a cluster alias is set to the incoming search request. With CCS soon supporting local reduction on each remote cluster, we also need to prevent shard_size of terms aggs to be adjusted on the data nodes. We may end up querying a single shard per cluster, but later we will have to reduce results coming from multiple cluster hence we still have to collect more buckets than needed to guarantee some level of precision. This is done by adding the ability to override the shard_size on the coordinating node as part of the rewrite phase. This will be done only when searching against multiple clusters, meaning when using CCS with alternate execution mode. Setting the shard_size explicitly on the coord node means that it will not be overridden later on the data nodes. The shard_size will still be set on the data nodes like before in all other cases (local search, or CCS with ordinary execution mode). Relates to elastic#32125
javanna
added a commit
to javanna/elasticsearch
that referenced
this pull request
Jan 3, 2019
With elastic#36997 we added support for providing a local cluster alias with a `SearchRequest`. We intended to make sure that when provided as part of a search request, the cluster alias would never be used for connection lookups. Yet due to a bug we would still end up looking up the connection from the remote ones. This commit adds a test to make sure that whenever we set the cluster alias to the `SearchRequest` (which can only be done at transport), such alias is used as index prefix in the returned hits. No errors are thrown despite no remote clusters are configured indicating that such alias is never used for connection look-ups. Also, we add explicit support for the empty cluster alias when printing out index names through `RemoteClusterAware#buildRemoteIndexName`. In fact we don't want to print out `:index` when the cluster alias is set to empty string, but rather `index`. Yet, the semantic of empty string is different compared to `null` as it will still disable final reduction. This will be used in CCS when searching against remote clusters as well as the local one, the local one will have empty prefix yet it will need to disable final reduction so that its results will be properly merged with the ones coming from the remote clusters.
javanna
added a commit
that referenced
this pull request
Jan 4, 2019
With #36997 we added support for providing a local cluster alias with a `SearchRequest`. We intended to make sure that when provided as part of a search request, the cluster alias would never be used for connection lookups. Yet due to a bug we would still end up looking up the connection from the remote ones. This commit adds a test to make sure that whenever we set the cluster alias to the `SearchRequest` (which can only be done at transport), such alias is used as index prefix in the returned hits. No errors are thrown despite no remote clusters are configured indicating that such alias is never used for connection look-ups. Also, we add explicit support for the empty cluster alias when printing out index names through `RemoteClusterAware#buildRemoteIndexName`. In fact we don't want to print out `:index` when the cluster alias is set to empty string, but rather `index`. Yet, the semantic of empty string is different compared to `null` as it will still disable final reduction. This will be used in CCS when searching against remote clusters as well as the local one, the local one will have empty prefix yet it will need to disable final reduction so that its results will be properly merged with the ones coming from the remote clusters.
javanna
added a commit
to javanna/elasticsearch
that referenced
this pull request
Jan 7, 2019
javanna
added a commit
that referenced
this pull request
Jan 7, 2019
javanna
added a commit
that referenced
this pull request
Jan 7, 2019
With the upcoming cross-cluster search alternate execution mode, the CCS node will be able to split a CCS request into multiple search requests, one per remote cluster involved. In order to do that, the CCS node has to be able to signal to each remote cluster that such sub-requests are part of a CCS request. Each cluster does not know about the other clusters involved, and does not know either what alias it is given in the CCS node, hence the CCS coordinating node needs to be able to provide the alias as part of the search request so that it is used as index prefix in the returned search hits. The cluster alias is a notion that's already supported in the search shards iterator and search shard target, but it is currently used in CCS as both index prefix and connection lookup key when fanning out to all the shards. With CCS alternate execution mode the provided cluster alias needs to be used only as index prefix, as shards are local to each cluster hence no cluster alias should be used for connection lookups. The local cluster alias can be set to the SearchRequest at the transport layer only, and its constructor/getter methods are package private. Relates to #32125
javanna
added a commit
that referenced
this pull request
Jan 7, 2019
With #36997 we added the ability to provide a cluster alias with a SearchRequest. The next step is to disable the final reduction whenever a cluster alias is provided with the SearchRequest. A cluster alias will be provided when executing a cross-cluster search request with alternate execution mode, where each cluster does its own reduction locally. In order for the CCS node to be able to later perform an additional reduction of the results, we need to make sure that all the needed info stays available. This means that terms aggregations can be reduced but not pruned, and pipeline aggs should not be executed. The final reduction will happen later in the CCS coordinating node. Relates to #36997 & #32125
javanna
added a commit
that referenced
this pull request
Jan 7, 2019
With #36997 we added support for providing a local cluster alias with a `SearchRequest`. We intended to make sure that when provided as part of a search request, the cluster alias would never be used for connection lookups. Yet due to a bug we would still end up looking up the connection from the remote ones. This commit adds a test to make sure that whenever we set the cluster alias to the `SearchRequest` (which can only be done at transport), such alias is used as index prefix in the returned hits. No errors are thrown despite no remote clusters are configured indicating that such alias is never used for connection look-ups. Also, we add explicit support for the empty cluster alias when printing out index names through `RemoteClusterAware#buildRemoteIndexName`. In fact we don't want to print out `:index` when the cluster alias is set to empty string, but rather `index`. Yet, the semantic of empty string is different compared to `null` as it will still disable final reduction. This will be used in CCS when searching against remote clusters as well as the local one, the local one will have empty prefix yet it will need to disable final reduction so that its results will be properly merged with the ones coming from the remote clusters.
javanna
added a commit
that referenced
this pull request
Jan 7, 2019
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
With the upcoming cross-cluster search alternate execution mode, the CCS node will be able to split a CCS request into multiple search requests, one per remote cluster involved. In order to do that, the CCS node has to be able to signal to each remote cluster that such sub-requests are part of a CCS request. Each cluster does not know about the other clusters involved, and does not know either what alias it is given in the CCS node, hence the CCS coordinating node needs to be able to provide the alias as part of the search request so that it is used as index prefix in the returned search hits.
The cluster alias is a notion that's already supported in the search shards iterator and search shard target, but it is currently used in CCS as both index prefix and connection lookup key when fanning out to all the shards. With CCS alternate execution mode the provided cluster alias needs to be used only as index prefix, as shards are local to each cluster hence no cluster alias should be used for connection lookups.
The local cluster alias can be set to the SearchRequest at the transport layer only. A new package private constructor is added that takes the cluster alias as an argument, as well as a package private getter method.
Relates to #32125