Skip to content

search_after query optimization with shard/segment level short cutting#7453

Merged
dblock merged 22 commits intoopensearch-project:mainfrom
gashutos:main-search-after
May 23, 2023
Merged

search_after query optimization with shard/segment level short cutting#7453
dblock merged 22 commits intoopensearch-project:mainfrom
gashutos:main-search-after

Conversation

@gashutos
Copy link
Copy Markdown
Contributor

@gashutos gashutos commented May 5, 2023

Description

search_after queries are significantly slower on asc/desc order sort. Like for http_logs OSB workload, vanilla ASC order sort queries takes like 11 ms while same query just with search_after parameter takes 1703 ms.

eg. below query takes 1703 ms, but once we remove search_after, it just takes 11 ms.

GET logs-*/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    }
  ],
  "search_after": ["1998-06-10"]
}

Reason

The reason is, in time series kind of workloads, there will be segments or shards, which doesnt contain any result hits comparing to given search_after time parameter. There is a bug in Lucene, where if entire segment doesnt contain any competitive hit given search_after parameter, it is not able to skip any hit, since it is not able to update its competitive score.
Because TopFieldCollector.java#250, the queuefull variable will never able to set true because we didnt collect any competitive hit. I will file seperate issue on lucene and planning to fix it on lucene side as well.
But from OpenSearch perspective, it makes sense to not to issue any call to Segment read to Lucene in cases we determine if its min/max value for primary sort field is not within range of given search_after paramter. Similar logic stands true at shard level as well.

Changes in this CR

  1. Shard level canMatch short cutting if MinMax for primary sort field is out of range for search_after
  2. Segment level canMatch short cutting if MinMax for primary sort field is out of range for search_after
  3. Moving shouldReverseSegmentReadOrder function to ContextIndexSearcher since now we are passing DefaultSearchContext object to ContextIndexSearcher.

Performance gains

I am observing, we are gaining 10x to 60x depending on dataset with this change for ASC order sorting and around 3x for DESC order sorting. We will end up saving lot of unnecessary computation with short cutting shard/segments in canMatch phase itself. Again, the optimization multiplier will depend on data set , but this change is for sure optimize something if not equal performance, but definitely wont regress. I verified on nyc_taxy and other type of workloads.

This work load is used (http_logs)
The operation was performed on single index size of 12.9 G with 4 shards with 136 segments on it.

Sort Type hits = 5
ASC 1703
ASC (with this change) 23
Desc 163
Desc (with this change) 51

Testing

  1. Added unit tests for canMatchSearchAfter.

Related Issues

6596

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented May 5, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented May 5, 2023

Gradle Check (Jenkins) Run Completed with:

@gashutos gashutos force-pushed the main-search-after branch from 1a93d43 to 20b8dcc Compare May 6, 2023 11:46
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented May 6, 2023

Gradle Check (Jenkins) Run Completed with:

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 6, 2023

Codecov Report

Merging #7453 (9a5abc5) into main (959910b) will decrease coverage by 0.06%.
The diff coverage is 70.37%.

❗ Current head 9a5abc5 differs from pull request most recent head 1ad20ee. Consider uploading reports for the commit 1ad20ee to get more accurate results

@@             Coverage Diff              @@
##               main    #7453      +/-   ##
============================================
- Coverage     70.68%   70.62%   -0.06%     
- Complexity    56095    59846    +3751     
============================================
  Files          4680     4897     +217     
  Lines        266076   286967   +20891     
  Branches      39065    41366    +2301     
============================================
+ Hits         188068   202668   +14600     
- Misses        62101    67561    +5460     
- Partials      15907    16738     +831     
Impacted Files Coverage Δ
...va/org/opensearch/search/DefaultSearchContext.java 78.92% <ø> (+2.03%) ⬆️
...ensearch/search/internal/ContextIndexSearcher.java 72.50% <45.00%> (-3.56%) ⬇️
...main/java/org/opensearch/search/SearchService.java 71.86% <70.58%> (-0.03%) ⬇️
...a/org/opensearch/search/sort/FieldSortBuilder.java 67.21% <100.00%> (+0.10%) ⬆️
...ain/java/org/opensearch/search/sort/MinAndMax.java 100.00% <100.00%> (ø)

... and 758 files with indirect coverage changes

@gashutos gashutos force-pushed the main-search-after branch from 20b8dcc to f510edc Compare May 6, 2023 14:57
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented May 6, 2023

Gradle Check (Jenkins) Run Completed with:

@gashutos
Copy link
Copy Markdown
Contributor Author

gashutos commented May 8, 2023

@nknize @reta - if you guys can take a look at this small change. This is benefiting a lot to search_after queries (not just timeseries based), but optimizing timeseries based queries by multifold

@gashutos gashutos force-pushed the main-search-after branch from f510edc to 1e2c79a Compare May 8, 2023 09:28
@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: null ❌
  • URL: null
  • CommitID: 4b162f7
    Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
    Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@gashutos
Copy link
Copy Markdown
Contributor Author

Opened an issue to fix test
#7679 7679

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Copy Markdown
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Copy Markdown
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me (I tried to understand what's going on 😅 ), let's merge @nknize yell if you see something off.

Copy link
Copy Markdown
Member

@dblock dblock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me (I tried to understand what's going on 😅 ), let's merge @nknize yell if you see something off.

@dblock dblock merged commit c659d04 into opensearch-project:main May 23, 2023
@dblock dblock added the backport 2.x Backport to 2.x branch label May 23, 2023
@opensearch-trigger-bot
Copy link
Copy Markdown
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-7453-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c659d04956e300740a7626f952bb7958492ed843
# Push it to GitHub
git push --set-upstream origin backport/backport-7453-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-7453-to-2.x.

gashutos added a commit to gashutos/OpenSearch that referenced this pull request May 24, 2023
opensearch-project#7453)

* search_after query optimization with shard/segment level short cutting

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting logical && operator

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressig review comments

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting NPE

Signed-off-by: gashutos <gashutos@amazon.com>

* Fixing failed integ

Signed-off-by: gashutos <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Refactoring a bit to aggressive null checks

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Adding unit tests for MinAndMax methods

Signed-off-by: gashutos <gashutos@amazon.com>

---------

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>
Signed-off-by: gashutos <gashutos@amazon.com>
Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
andrross pushed a commit that referenced this pull request May 24, 2023
#7453) (#7727)

* search_after query optimization with shard/segment level short cutting



* Correcting logical && operator



* Addressig review comments



* Correcting NPE



* Fixing failed integ



* Addressing review comments and fixing some more integ



* Addressing review comments and fixing some more integ



* Refactoring a bit to aggressive null checks



* Adding unit tests for MinAndMax methods



---------

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>
Signed-off-by: gashutos <gashutos@amazon.com>
Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
@nknize
Copy link
Copy Markdown
Contributor

nknize commented May 24, 2023

There is a bug in Lucene, where if entire segment doesnt contain any competitive hit given search_after parameter, it is not able to skip any hit, since it is not able to update its competitive score.

I'm just getting around to this... @gashutos can you give an example query from the http_logs benchmark above? Did you _profile the query and inspect the rewritten lucene query? Can you post the rewritten query as well?

@gashutos
Copy link
Copy Markdown
Contributor Author

gashutos commented May 24, 2023

There is a bug in Lucene, where if entire segment doesnt contain any competitive hit given search_after parameter, it is not able to skip any hit, since it is not able to update its competitive score.

I'm just getting around to this... @gashutos can you give an example query from the http_logs benchmark above? Did you _profile the query and inspect the rewritten lucene query? Can you post the rewritten query as well?

Hey @nknize ,

Below is the exact http_logs query for which data is present only in few indices & segmetns.

GET logs-*/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    }
  ],
  "search_after": ["1998-06-10"]
}

For profiling purpose, I am querying only on one shard as mentioned below. (No data after date 1999-06-10 for asc order)

GET logs-241998/_search?preference=_shards:3&size=1000
{
  "profile": true,
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    }
  ],
    "search_after": ["1999-06-10"]
}

below is the output,

{
  "took": 3935,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "profile": {
    "shards": [
      {
        "id": "[AckqPzYSS_Ghj9OAUWUbqQ][logs-241998][3]",
        "inbound_network_time_in_millis": 0,
        "outbound_network_time_in_millis": 0,
        "searches": [
          {
            "query": [
              {
                "type": "ConstantScoreQuery",
                "description": "ConstantScore(*:*)",
                "time_in_nanos": 3872295057,
                "breakdown": {
                  "set_min_competitive_score_count": 0,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 0,
                  "next_doc": 3870989179,
                  "match": 0,
                  "next_doc_count": 36298576,
                  "score_count": 0,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 2576,
                  "advance_count": 1,
                  "score": 0,
                  "build_scorer_count": 2,
                  "create_weight": 310572,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 992730
                },
                "children": [
                  {
                    "type": "MatchAllDocsQuery",
                    "description": "*:*",
                    "time_in_nanos": 1251142659,
                    "breakdown": {
                      "set_min_competitive_score_count": 0,
                      "match_count": 0,
                      "shallow_advance_count": 0,
                      "set_min_competitive_score": 0,
                      "next_doc": 1251110233,
                      "match": 0,
                      "next_doc_count": 36298576,
                      "score_count": 0,
                      "compute_max_score_count": 0,
                      "compute_max_score": 0,
                      "advance": 1099,
                      "advance_count": 1,
                      "score": 0,
                      "build_scorer_count": 2,
                      "create_weight": 4644,
                      "shallow_advance": 0,
                      "create_weight_count": 1,
                      "build_scorer": 26683
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 12734,
            "collector": [
              {
                "name": "PagingFieldCollector",
                "reason": "search_top_hits",
                "time_in_nanos": 2039165015
              }
            ]
          }
        ],
        "aggregations": []
      }
    ]
  }
}

The above query takes 963 ms without changes in above PR. With changes in above PR, it takes just 5 ms (short circuited by shard level minMac guard itself)

@gashutos
Copy link
Copy Markdown
Contributor Author

gashutos commented May 24, 2023

if I strictly speak about vanilla http_logs improvements, asc_sort_after_timestamp (the query mentioedn above) used to take 750 ms while with above changes, it takes only 50 ms.

GET logs-*/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    }
  ],
  "search_after": ["1998-06-10"]
}

This is the query, same is true for desc order too.

dblock pushed a commit to dblock/OpenSearch that referenced this pull request May 25, 2023
opensearch-project#7453)

* search_after query optimization with shard/segment level short cutting

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting logical && operator

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressig review comments

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting NPE

Signed-off-by: gashutos <gashutos@amazon.com>

* Fixing failed integ

Signed-off-by: gashutos <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Refactoring a bit to aggressive null checks

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Adding unit tests for MinAndMax methods

Signed-off-by: gashutos <gashutos@amazon.com>

---------

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>
Signed-off-by: gashutos <gashutos@amazon.com>
Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
suranjay pushed a commit to suranjay/OpenSearch that referenced this pull request May 29, 2023
opensearch-project#7453)

* search_after query optimization with shard/segment level short cutting

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting logical && operator

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressig review comments

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting NPE

Signed-off-by: gashutos <gashutos@amazon.com>

* Fixing failed integ

Signed-off-by: gashutos <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Refactoring a bit to aggressive null checks

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Adding unit tests for MinAndMax methods

Signed-off-by: gashutos <gashutos@amazon.com>

---------

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>
Signed-off-by: gashutos <gashutos@amazon.com>
Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
stephen-crawford pushed a commit to stephen-crawford/OpenSearch that referenced this pull request May 31, 2023
opensearch-project#7453)

* search_after query optimization with shard/segment level short cutting

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting logical && operator

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressig review comments

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting NPE

Signed-off-by: gashutos <gashutos@amazon.com>

* Fixing failed integ

Signed-off-by: gashutos <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Refactoring a bit to aggressive null checks

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Adding unit tests for MinAndMax methods

Signed-off-by: gashutos <gashutos@amazon.com>

---------

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>
Signed-off-by: gashutos <gashutos@amazon.com>
Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
@dblock
Copy link
Copy Markdown
Member

dblock commented Jun 30, 2023

Looks like this may have introduced a regression between 2.7 and 2.8, #8212.

shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
opensearch-project#7453)

* search_after query optimization with shard/segment level short cutting

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting logical && operator

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressig review comments

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Correcting NPE

Signed-off-by: gashutos <gashutos@amazon.com>

* Fixing failed integ

Signed-off-by: gashutos <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Addressing review comments and fixing some more integ

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Refactoring a bit to aggressive null checks

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>

* Adding unit tests for MinAndMax methods

Signed-off-by: gashutos <gashutos@amazon.com>

---------

Signed-off-by: Chaitanya Gohel <gashutos@amazon.com>
Signed-off-by: gashutos <gashutos@amazon.com>
Signed-off-by: Chaitanya Gohel <104654647+gashutos@users.noreply.github.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.x Backport to 2.x branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants