Add performance improvement blog #2522
nateynateynate merged 22 commits into opensearch-project:main from
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Queries for ascending and descending sort-after-timestamp saw a significant performance improvement of up to 70x overall. The optimizations introduced (such as [#6424](https://github.com/opensearch-project/OpenSearch/pull/6424) and [#8167](https://github.com/opensearch-project/OpenSearch/issues/8167)) extend across various numeric types, including but not limited to `int`, `short`, `float`, `double`, `date`, and others.
* Other popular queries such as `search_after` saw about a 60x reduction in latency, attributed to improvements in optimally skipping segments during search (see [#7453](https://github.com/opensearch-project/OpenSearch/pull/7453)). `search_after` queries are the recommended alternative to scroll queries for a better search experience.
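The pagination pattern behind `search_after` can be sketched as below: each page is sorted on a timestamp plus a unique tiebreaker, and the `sort` values of the last hit on one page become the `search_after` values of the next request. This is a minimal sketch, not client code: the `search` callable (request body in, parsed JSON response out) and the field names `timestamp` and `id` are hypothetical assumptions; only the request-body keys (`size`, `query`, `sort`, `search_after`) and the per-hit `sort` array follow the query DSL.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: takes a request body, returns the parsed JSON response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_page_request(page_size: int, after: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build a search body sorted on a timestamp plus a tiebreaker (field names are hypothetical)."""
    body: Dict[str, Any] = {
        "size": page_size,
        "query": {"match_all": {}},
        # A unique tiebreaker keeps page boundaries stable when timestamps collide.
        "sort": [{"timestamp": "asc"}, {"id": "asc"}],
    }
    if after is not None:
        body["search_after"] = after
    return body


def paginate(search: SearchFn, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every hit, one page at a time, without a scroll context."""
    after: Optional[List[Any]] = None
    while True:
        hits = search(build_page_request(page_size, after))["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # Each hit carries its sort values; the last one seeds the next page.
        after = hits[-1]["sort"]
        if len(hits) < page_size:
            return
```

Unlike a scroll, nothing is held open between requests; each page is an independent, sorted query, which is also what lets the engine skip segments that cannot contain hits past the `search_after` point.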
Can we add a line item after:
- Support for the `match_only_text` field, which optimizes storage and indexing/search latency for text queries, is in progress (#11039).
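For readers unfamiliar with the field type mentioned in this comment, a mapping that uses it might look like the following sketch. It assumes a release that ships `match_only_text` (the comment describes it as in progress) and that, as we understand the design, the field trades positions, term frequencies, and norms for smaller storage. The helper name and the `message` field are hypothetical.

```python
import json
from typing import Any, Dict


def match_only_text_index_body(field: str) -> Dict[str, Any]:
    """Hypothetical helper: create-index body mapping one field as match_only_text."""
    return {
        "mappings": {
            "properties": {
                # Assumed design: no positions/frequencies/norms indexed, to save space.
                field: {"type": "match_only_text"}
            }
        }
    }


if __name__ == "__main__":
    # Body for a create-index request (PUT /<index>).
    print(json.dumps(match_only_text_index_body("message"), indent=2))
```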
* Hourly aggregations and multi-term aggregations also improved, by 5% to 35%, owing to the time-series improvements discussed previously.
* `date_histograms` and `date_histogram_agg` queries showed either comparable or slightly decreased performance (a decrease ranging from 5% to around 20%) in multi-node environments. These issues are being actively addressed as part of ongoing project efforts (see [#11083](https://github.com/opensearch-project/OpenSearch/pull/11083)).
Also a line item under Time Series:
- For date histogram aggregations, upcoming changes aim to improve performance by using SIMD to round down dates to the nearest interval (such as year, quarter, month, week, or day) (#11194).
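The rounding step this comment refers to is what turns each timestamp into a histogram bucket key. A scalar sketch of the idea, assuming epoch-millisecond timestamps in UTC: fixed-length intervals (hour, day) are a plain floor by modulo, while calendar intervals such as month need calendar arithmetic. The SIMD part of the proposal is not reproduced here, and the function names are illustrative, not OpenSearch's.

```python
from datetime import datetime, timezone
from typing import Iterable, List

MS_PER_HOUR = 3_600_000
MS_PER_DAY = 24 * MS_PER_HOUR


def round_down_fixed(ts_ms: int, interval_ms: int) -> int:
    """Floor an epoch-millis timestamp to the start of a fixed-length interval (UTC)."""
    # Python's % is non-negative for a positive divisor, so pre-1970 values floor correctly too.
    return ts_ms - (ts_ms % interval_ms)


def round_down_month(ts_ms: int) -> int:
    """Floor to the first instant of the calendar month (UTC); months vary in length."""
    dt = datetime.fromtimestamp(ts_ms // 1000, tz=timezone.utc)
    start = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return int(start.timestamp()) * 1000


def bucket_keys(timestamps_ms: Iterable[int], interval_ms: int) -> List[int]:
    # Scalar loop over a batch; a SIMD version would apply the same floor to many lanes at once.
    return [round_down_fixed(t, interval_ms) for t in timestamps_ms]
```

Weeks are a special case even though they have a fixed length: the Unix epoch fell on a Thursday, so a bare modulo by seven days would align buckets to Thursdays rather than to the start of a calendar week.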
OpenSearch is a community-driven, open source search and analytics suite used by developers to ingest, search, visualize, and analyze data. [Introduced in January 2021](https://aws.amazon.com/blogs/opensource/stepping-up-for-a-truly-open-source-elasticsearch/), the OpenSearch Project originated as an open source fork of Elasticsearch 7.10.2. OpenSearch 1.0 was released for production use in [July 2021](https://opensearch.org/blog/opensearch-general-availability-announcement/) and is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0) (ALv2), with the complete codebase [published to GitHub](https://github.com/opensearch-project). The project has consistently focused on improving the performance of its core open source engine for high-volume indexing and low-latency search operations. OpenSearch aims to provide the best experience for every user by driving down latency and improving efficiency.
In this blog post, we'll share a comprehensive view of the strategic performance enhancements and features that OpenSearch has delivered to date. We'll also provide a forward look at the [planned roadmap](https://github.com/orgs/opensearch-project/projects/153/views/1) of open source improvements. We'll compare the core engine performance of the latest OpenSearch version (OpenSearch 2.11) with that of Elasticsearch 7.10.2, the state just before the OpenSearch fork. We'll highlight continuous advancements in the OpenSearch core engine, ongoing feature enhancements centered on the popular log analytics and search use cases, and planned improvements for which we are seeking community collaboration.
Suggesting the following rewrite of this paragraph, which also credits the community for the advancements:
"In this blog post, we'll share a comprehensive view of the strategic performance enhancements and features that OpenSearch has delivered to date. Additionally, we'll provide a forward look at the planned roadmap of improvements in open source. We'll compare the core engine performance of the latest OpenSearch version (OpenSearch 2.11), with a specific focus on its advancements, to the state just before the OpenSearch fork. For this purpose, we have chosen Elasticsearch 7.10.2 as the baseline from which OpenSearch was forked, allowing us to measure all changes delivered after the fork (OpenSearch 1.0 to 2.11). These advancements were realized through collaborative efforts with the community, and OpenSearch is actively seeking to increase community engagement, specifically in improving performance."
categories:
- technical-posts
- community
meta_keywords:
Please add the following meta:
meta_keywords: OpenSearch performance improvements, OpenSearch roadmap, high volume indexing, low latency search
meta_description: Learn more about the OpenSearch Project roadmap and how the project improved the performance of its core open source engine to drive down latency and improve efficiency.
natebower left a comment
@kolchfa-aws @getsaurabh02 Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!
_authors/sisurab.markdown
Outdated
---
**Saurabh** is a Senior Software Engineer working on OpenSearch at Amazon Web Services. He is passionate about solving problems in the large-scale distributed systems. He is an active contributor to OpenSearch.
**Saurabh Singh** is an Engineering Lead working on OpenSearch at Amazon Web Services, where he leads the core search performance area. He is passionate about solving problems in large-scale distributed systems. He is an active OpenSearch contributor.
Is the last sentence necessary here?
**Setup**: OpenSearch 2.11.0 single node (r5.2xlarge) with 64 GB RAM and 32 GB heap. Index settings: 1 shard and 0 replicas.
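To help with replication, the setup line above can be expressed as configuration. This is a sketch under stated assumptions: the index settings use the standard `number_of_shards`/`number_of_replicas` keys for a create-index request, and the heap is assumed to have been set through `config/jvm.options` (or `OPENSEARCH_JAVA_OPTS`); the post itself does not say how the heap was configured, and the constant names are hypothetical.

```python
import json
from typing import Any, Dict, List

# Values from the setup line: 1 shard, 0 replicas (index name is up to the reader).
BENCHMARK_INDEX_SETTINGS: Dict[str, Any] = {
    "settings": {
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 0,
        }
    }
}

# Assumed: 32 GB heap on the 64 GB host, with min and max equal so the heap is never resized.
JVM_HEAP_OPTIONS: List[str] = ["-Xms32g", "-Xmx32g"]

if __name__ == "__main__":
    # Body for a create-index request (PUT /<index>).
    print(json.dumps(BENCHMARK_INDEX_SETTINGS, indent=2))
    # Lines for config/jvm.options (assumed location of the heap settings).
    print("\n".join(JVM_HEAP_OPTIONS))
```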
**`nyc_taxis` workload results:** The following table illustrates a benchmark comparison of the `nyc_taxis` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of `took` time latency measurements for each (p90) and the observed percentage improvements.
| **`nyc_taxis` workload results**: The following table provides a benchmark comparison of the `nyc_taxis` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of `took` time latency measurements for each (p90) and the observed percentage improvements. |
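The two quantities in these tables, a p90 latency and a percentage improvement, can be computed as in the sketch below. It is illustrative only: it uses the nearest-rank percentile definition for simplicity, which may differ from the interpolation used by the benchmarking tool that produced the published numbers, and it takes "improvement" to mean the relative reduction in p90 `took` time versus the baseline (positive means faster).

```python
from typing import Sequence


def p90(samples_ms: Sequence[float]) -> float:
    """90th percentile by the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    # 1-based rank = ceil(0.9 * n), computed in integers to avoid float rounding.
    rank = (90 * n + 99) // 100
    return ordered[rank - 1]


def improvement_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percent latency reduction of candidate versus baseline (negative means a regression)."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return 100.0 * (baseline_ms - candidate_ms) / baseline_ms
```

For example, a p90 that drops from 200 ms with concurrent search disabled to 50 ms with it enabled is a 75% improvement.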
</tr>
</table>
**`http_logs` workload results:** The following table illustrates a benchmark comparison of the `http_logs` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of latency measurements for each (p90) and the observed percentage improvements.
| **`http_logs` workload results**: The following table provides a benchmark comparison of the `http_logs` workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of latency measurements for each (p90) and the observed percentage improvements. |
* * *
*We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandles, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik towards writing this blog. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.*
*We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandless, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik in writing this blog post. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.*
Confirm that "McCandles" shouldn't be "McCandless".
* Aggregate queries on workloads such as `nyc_taxis` showed an improvement ranging from 50% to 70% over the default configuration.
* Range queries in log analytics use cases demonstrated an improvement of around 65%.
* Aggregation queries with hourly data aggregations, such as those for the `hourly_agg` operation on the `http_logs` workload, demonstrated a performance boost of up to 50%.
"such as those for the hourly_agg operation on the http_logs workload" (hourly_agg is not a workload)?
@getsaurabh02 Could you confirm that we can make this change?
Looks good to me. Or "such as those for the http_logs workload".
Yes, hourly_agg is not a workload.
Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
@pajuric The blog is ready to publish. Thanks!
@nateynateynate @dtaivpp @krisfreedain: This blog is ready to push live. If possible, can we get this out by 12 PM PST today, please?
nateynateynate left a comment
Looks good to me! Great job!
## Appendix: Detailed execution and results
If you're interested in the details of the performance benchmarks we used, the methodologies behind their execution, or the comprehensive results, keep reading. For OpenSearch users interested in establishing benchmarks and replicating these runs, we've provided comprehensive setup details alongside each result. This section compares the core engine performance of the latest OpenSearch version (OpenSearch 2.11) with that of Elasticsearch 7.10.2, the state just before the OpenSearch fork, and includes a midpoint performance measurement on OpenSearch 2.3.
Can we add a line at the end of this paragraph:
"Also, we've identified items in the performance roadmap that require active enhancements due to observed regressions in specific areas."
Oh, looks like it's merged! 👍
Description
Adds the OpenSearch performance improvements blog
Issues Resolved
Closes #2477
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.