optimize doc-level monitor workflow for index patterns#1097
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1097 +/- ##
============================================
+ Coverage 67.72% 67.77% +0.05%
Complexity 105 105
============================================
Files 160 160
Lines 10343 10363 +20
Branches 1522 1521 -1
============================================
+ Hits 7005 7024 +19
- Misses 2672 2673 +1
Partials 666 666
|
| monitorCtx.clusterService!!, | ||
| monitorCtx.indexNameExpressionResolver!! | ||
| ) | ||
| val updatedIndexName = indexName.replace("*", "_") |
There was a problem hiding this comment.
why do we do this indexName.replace("*", "_")?
There was a problem hiding this comment.
this is because query string query fields with * have a different meaning & is also not allowed in some cases.
| monitorCtx.indexNameExpressionResolver!! | ||
| ) | ||
| if (concreteIndices.isEmpty()) { | ||
| throw IndexNotFoundException(docLevelMonitorInput.indices[0]) |
There was a problem hiding this comment.
- can we add error log with monitor Id and indices list which were not found
- in exception why are we logging only 0th index element of
indicesvariable? can we log all elements in array/list
There was a problem hiding this comment.
changed it now. it was copied from eariler logic.
| // TODO: If dryrun, we should make it so we limit the search as this could still potentially give us lots of data | ||
| if (isTempMonitor) { | ||
| indexLastRunContext[shard] = max(-1, (indexUpdatedRunContext[shard] as String).toInt() - 10) | ||
| docLevelMonitorInput.indices.forEach { indexName -> |
There was a problem hiding this comment.
why did we change this list being looped from computed concrete indices list to the monitor indices list?
There was a problem hiding this comment.
this is because we want to expand 1 index pattern & process all its concrete indices separately. Previous logic was we expand all index patterns & process all the concrete indices one by one.
|
To create query index mappings for an index pattern would be a very big optimization but it works only under the assumption that an index pattern test* resolves to indices that do not have fields with same names but different data types. this fails in the following scenario: when we store the field mapping in query index for |
|
we can add this optimization for the scenarios where we can identify all indices covered in index pattern are guaranteed to have the same mappings i.e. data streams, rolling indices with index templates. should we store a field in index mapping called |
Since we are batching all mappings can we check if the above mentioned field mapping mismatch is happening for any fields and then ingest one for the generic test* and one mapping for the specific index. Similarly while fetching mappings look for both index mappings with test* and test1. If latter is found prefer it. if not found, use the former. |
hi @eirsep , that is the exact fix i'm working on. |
|
hi @eirsep , just addressed the comment with the test case https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R503. |
There was a problem hiding this comment.
the java doc description for this method seems like it should describe adjustMaxFieldLimitForQueryIndex
There was a problem hiding this comment.
can we update the code comments for this method and below method
There was a problem hiding this comment.
NIT; since you are not explicitly creating index with different mappings
can you plz fetch second index's mapping and assert that the field type is actually different for test_field so as to validate the test scenario. that would make the test more readable. right now it's a bit hard to understand.
There was a problem hiding this comment.
specifying explicit mappings now.
There was a problem hiding this comment.
can we plz add asserts on the expected mappings and content of the query index. that would be the essence of these tests and help us visualize the changes made in this PR
There was a problem hiding this comment.
asserts added on actual queries stored in query index.
There was a problem hiding this comment.
this is a very critical piece fo code. can we update java docs description of this method incorporating the new change?
There was a problem hiding this comment.
updated the documentation for the method.
There was a problem hiding this comment.
does the below flatten logic handle both the nested and non-nested type objects correctly?
what would be the difference after flattening into string in the following 2 cases
"nested_field": {
"type": "nested",
"properties": {
"test1": {
"type": "keyword"
}
}
and
"nested_field": {
"properties": {
"test1": {
"type": "keyword"
}
}
There was a problem hiding this comment.
nested fields dont work with query string queries. https://stackoverflow.com/questions/69857071/how-can-i-use-query-string-to-match-both-nested-and-non-nested-fields-at-the-sam
but if we pair them with an index containing an object field, it will work. https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R636
There was a problem hiding this comment.
why do we need field name twice?
can you add some code comments around what is the Triple type object being returned to make it more readable?
There was a problem hiding this comment.
this is used intentionally because leafNodeProcessor is used at another place where the 2nd param is actually used to pass modified field name.
Signed-off-by: Subhobrata Dey <sbcd90@gmail.com>
|
The backport to To backport manually, run these commands in your terminal: # Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/alerting/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/alerting/backport-2.x
# Create a new branch
git switch --create backport-1097-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7f0c7c7e77a9213ad5e976c2f1573321bc26b919
# Push it to GitHub
git push --set-upstream origin backport-1097-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/alerting/backport-2.xThen, create a pull request where the |
…oject#1097) Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com>
* log error messages and clean up monitor when indexing doc level queries or metadata creation fails (#900) * log errors and clean up monitor when indexing doc level queries or metadata creation fails * refactor delete monitor action to re-use delete methods Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * optimize doc-level monitor workflow for index patterns (#1097) Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * optimize doc-level monitor execution workflow for datastreams (#1302) * optimize doc-level monitor execution for datastreams Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> * add more tests to address comments Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> * add integTest for multiple datastreams inside a single index pattern * add integTest for multiple datastreams inside a single index pattern Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> --------- Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Bulk index findings and sequentially invoke auto-correlations (#1355) * Bulk index findings and sequentially invoke auto-correlations Signed-off-by: Megha Goyal <goyamegh@amazon.com> * Bulk index findings in batches of 10000 and make it configurable Signed-off-by: Megha Goyal <goyamegh@amazon.com> * Addressing review comments Signed-off-by: Megha Goyal <goyamegh@amazon.com> * Add integ tests to test bulk index findings Signed-off-by: Megha Goyal <goyamegh@amazon.com> * Fix ktlint formatting Signed-off-by: Megha Goyal <goyamegh@amazon.com> --------- Signed-off-by: Megha Goyal <goyamegh@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Add jvm aware setting and max num docs settings for batching docs for percolate queries (#1435) * add jvm aware and max docs settings for batching docs for percolate queries Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix stats logging Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * add queryfieldnames field in findings mapping Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> --------- Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * optimize to fetch only fields relevant to doc level queries in doc level monitor instead of entire _source for each doc (#1441) * optimize to fetch only fields relevant to doc level queries in doc level monitor Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix test for settings check Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix ktlint Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> --------- Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * optimize sequence number calculation and reduce search requests in doc level monitor execution (#1445) * optimize sequence number calculation and reduce search requests by n where n is number of shards being queried in the executino Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix tests Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * optimize check indices and execute to query only write index of aliases and datastreams during monitor creation Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * fix test Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * add javadoc Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> * add tests to verify seq_no calculation Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> --------- Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix tests Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix BWC tests Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * clean up doc level queries on dry run (#1430) Signed-off-by: Joanne Wang <jowg@amazon.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix import Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix tests Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix BWC version Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Fix another test Signed-off-by: Chase Engelbrecht <engechas@amazon.com> * Revert order of operations change Signed-off-by: Chase Engelbrecht <engechas@amazon.com> --------- Signed-off-by: Subhobrata Dey <sbcd90@gmail.com> Signed-off-by: Chase Engelbrecht <engechas@amazon.com> Signed-off-by: Megha Goyal <goyamegh@amazon.com> Signed-off-by: Surya Sashank Nistala <snistala@amazon.com> Signed-off-by: Joanne Wang <jowg@amazon.com> Co-authored-by: Surya Sashank Nistala <snistala@amazon.com> Co-authored-by: Subhobrata Dey <sbcd90@gmail.com> Co-authored-by: Megha Goyal <56077967+goyamegh@users.noreply.github.com> Co-authored-by: Joanne Wang <jowg@amazon.com>
Description of changes:
This pr addresses following performance problems in
doc-level monitor execution workflow.doc-level monitoris monitoring anindex pattern& a new index is introduced which matches the pattern, thedoc-level monitorduplicates all the field mappings & queries for that index. This is reproducible using integ test https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R503Now, say, we have
1000queries & each concrete index behind the index-pattern has 1000 field mappings. Also, lets assume1 concrete indexis generated everyday. We also know thedefault number of field mappingsan index can have is1000& today if the no. of field mappings go over1000in the query index, we rollover the query index.This would mean, we create a new rollover query index everyday & keep on ingesting 1000 queries in it everyday. In 30 days, we create
30 indices(which means by default1 primary & 1 replica shardsper index) which contains30000duplicate queries.This causes the
data nodesto get full resulting in cluster crash.opensearch-project/security-analytics#509
https://forum.opensearch.org/t/security-analytics-error/14639/11
We do not notice this problem for small no. of queries but duplication of queries piles up over a period of time.
This pr addresses this issue by continously updating
1 set of queriesfor all theconcrete indicesbelonging to anindex-pattern. this provides huge storage optimization.concrete index, we make anupdate mappingapi call in no particular order. So, say, indextest1has fieldsf1 & f2&test2has fieldf4& both of them match patterntest*, if we make firstupdate mappingcall fortest1, then query index getsf1 & f2but in the nextupdate mappingcall fortest2we completely overwrite it withf4. This is reproducible using integ test https://github.com/opensearch-project/alerting/pull/1097/files#diff-9b08d53e8cff3d739beb6ba304e4eaf466f08412762c3cd55886a52b722a77f1R551This pr addresses this issue by first collecting field mappings of all concrete indices belonging to an index-pattern together, & then making a
single call to update mappings api.concrete index, we make anupdate mappingapi call in no particular order. So, if there are100concrete indices behind anindex-patternwe make100 update mapping apicalls.This pr optimizes the time complexity by making a
single call to update mappings apifor eachindex pattern.CheckList:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.