Introduce ConcurrentQueryProfiler to profile query using concurrent segment search path and support concurrency during rewrite and create weight#10352
Conversation
ad5028a to
d0a5071
Compare
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
Compatibility status:Checks if related components are compatible with change 47a5802 Incompatible componentsSkipped componentsCompatible componentsCompatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/reporting.git] |
Gradle Check (Jenkins) Run Completed with:
|
This is a known issue #10006 |
server/src/main/java/org/opensearch/search/profile/query/QueryProfiler.java
Show resolved
Hide resolved
server/src/main/java/org/opensearch/search/profile/query/ConcurrentQueryProfiler.java
Show resolved
Hide resolved
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
|
@ticheng-aws I think the APIs are much cleaner now with separate profilers, I have only minor comment to cleanup the AbstractQueryProfileTree::stopAndAddRewriteTime a bit (if it makes sense) but LGTM otherwise, thank you! |
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
|
@ticheng-aws could you please rebase? thank you |
|
@sohami mind please rechecking this pull request? thank you. |
…current segment search (opensearch-project#10352) Signed-off-by: Ticheng Lin <ticheng@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
Gradle Check (Jenkins) Run Completed with:
|
|
server/src/main/java/org/opensearch/search/profile/query/ConcurrentQueryProfileBreakdown.java
Show resolved
Hide resolved
Gradle Check (Jenkins) Run Completed with:
|
Signed-off-by: Ticheng Lin <ticheng@amazon.com>
Gradle Check (Jenkins) Run Completed with:
|
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-10352-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 200ad5d28a577877be530ecab507601898025c5c
# Push it to GitHub
git push --set-upstream origin backport/backport-10352-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.xThen, create a pull request where the |
|
@ticheng-aws sadly backport to 2.x failed, could you please send a manual pull request? thank you |
@ticheng-aws We will need to backport all the previous PRs for concurrent profiler as well. I think during 2.11 we didn't do it. But lets do it now. Then it will go thru for this one as well. |
Oh wait , is it because we run into NPEs? |
You are right. We need to backport #9248, #10111, #10547, and the current one #10352. |
Description
The
rewrite_timefield contains the cumulative total rewrite time for a query. In the concurrent search case, the issue is “rewrite” happened on "before" (i.e. non concurrent) and "during" the concurrent path. There is only 1 rewrite timer shared across all slices, which lead the racing condition happened in the multiple threadings case.To resolve the issue, we need to have a distinct rewrite timer for each query, then compute the sum of the non-overlapping time across all slices and add the rewrite time from the non-concurrent path.
A similar race condition happened on building the profile tree stack during the create weight step. To address this issue, I added
threadToProfileTreemap into the profiler. This way, each thread has its own profile tree, ensuring that the stack won't access tree nodes from other threads. In the end, we simply combine all the trees into theprofileResultslist.Related Issues
Resolves #9787
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.