
[BUG] A sufficiently small interval value on a histogram can crash the node #14558

@icercel

Description


Describe the bug

Given a long field on your index with extreme min and max values, requesting a histogram aggregation on that field with a sufficiently small interval value crashes the node instance with an OOM error.

Related component

Search:Aggregations

To Reproduce

  1. Use the default docker-compose provided on the OpenSearch site (it uses :latest, which at the time of writing is 2.15.0)

  2. Add 2 documents

curl -k -XPUT -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_doc/1' \
  -H 'Content-Type: application/json' \
  -d '{"some_value": 1}'

curl -k -XPUT -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_doc/2' \
  -H 'Content-Type: application/json' \
  -d '{"some_value": 1234567890}'
  3. Attempt a histogram with a sufficiently large interval:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
  'https://localhost:9200/sample-index/_search' \
  -H 'Content-Type: application/json' \
  -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 300000000 }}}}'
  4. OpenSearch correctly (I think) returns the buckets:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "test": {
      "buckets": [
        {
          "key": 0,
          "doc_count": 1
        },
        {
          "key": 300000000,
          "doc_count": 0
        },
        {
          "key": 600000000,
          "doc_count": 0
        },
        {
          "key": 900000000,
          "doc_count": 0
        },
        {
          "key": 1200000000,
          "doc_count": 1
        }
      ]
    }
  }
}
  5. Change the interval value to 1000:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD"  \
 'https://localhost:9200/sample-index/_search' \
 -H 'Content-Type: application/json' \
 -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 1000 }}}}'
  6. OpenSearch correctly responds with:
{
  "error": {
    "root_cause": [],
    "type": "search_phase_execution_exception",
    "reason": "",
    "phase": "fetch",
    "grouped": true,
    "failed_shards": [],
    "caused_by": {
      "type": "too_many_buckets_exception",
      "reason": "Trying to create too many buckets. Must be less than or equal to: [65535] but was [1234568]. This limit can be set by changing the [search.max_buckets] cluster level setting.",
      "max_buckets": 65535
    }
  },
  "status": 503
}
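For context, the bucket count in that error message is just the value range divided by the interval, plus one. A quick sketch of the arithmetic (not OpenSearch's actual code path, just the math):

```shell
# Rough bucket-count estimate: floor((max - min) / interval) + 1.
# min=1 rounds down to bucket key 0 with interval 1000, so the min key is 0 here.
max=1234567890
interval=1000
buckets=$(( max / interval + 1 ))
echo "$buckets"   # 1234568, matching the too_many_buckets_exception message
```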
  7. Change the interval to 100:
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD"  \
 'https://localhost:9200/sample-index/_search' \
 -H 'Content-Type: application/json' \
 -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 100 }}}}'
  8. OpenSearch responds with something like curl: (56) OpenSSL SSL_read: error:0A000126:SSL routines::unexpected eof while reading, errno 0, because opensearch-node1 just died:
opensearch-node1         | [2024-06-26T12:12:51,906][INFO ][o.o.m.j.JvmGcMonitorService] [opensearch-node1] [gc][1318] overhead, spent [366ms] collecting in the last [1.1s]
opensearch-node1         | java.lang.OutOfMemoryError: Java heap space
opensearch-node1         | Dumping heap to data/java_pid1.hprof ...
opensearch-node1         | Unable to create data/java_pid1.hprof: File exists
opensearch-node1         | [2024-06-26T12:12:52,440][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [opensearch-node1] fatal error in thread [opensearch[opensearch-node1][search][T#24]], exiting
opensearch-node1         | java.lang.OutOfMemoryError: Java heap space
opensearch-node1         | 	at java.base/java.util.Arrays.copyOf(Arrays.java:3482) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.grow(ArrayList.java:237) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.grow(ArrayList.java:244) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList.add(ArrayList.java:515) ~[?:?]
opensearch-node1         | 	at java.base/java.util.ArrayList$ListItr.add(ArrayList.java:1150) ~[?:?]
opensearch-node1         | 	at org.opensearch.search.aggregations.bucket.histogram.InternalHistogram.addEmptyBuckets(InternalHistogram.java:416) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.bucket.histogram.InternalHistogram.reduce(InternalHistogram.java:436) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:290) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.search.aggregations.InternalAggregations.topLevelReduce(InternalAggregations.java:225) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:557) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:528) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.QueryPhaseResultConsumer.reduce(QueryPhaseResultConsumer.java:153) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:136) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:122) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-node1         | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
opensearch-node1         | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
opensearch-node1         | 	at java.base/java.lang.Thread.runWith(Thread.java:1596) ~[?:?]
opensearch-node1         | 	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-node1         | fatal error in thread [opensearch[opensearch-node1][search][T#24]], exiting
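The crash is consistent with the same bucket arithmetic: with interval 100, InternalHistogram.addEmptyBuckets has to materialize roughly 12.3 million bucket objects during the reduce phase, apparently before the search.max_buckets check applies on this code path. A rough sketch of the scale (the ~50 bytes per in-heap bucket is an assumption for illustration, not a measured figure):

```shell
# Same arithmetic with interval 100: the reduce phase would need to
# materialize this many buckets (almost all empty) to fill the range.
max=1234567890
interval=100
buckets=$(( max / interval + 1 ))
echo "$buckets"   # 12345679 buckets

# At an assumed ~50 bytes per bucket object, that is on the order of:
echo "$(( buckets * 50 / 1024 / 1024 )) MiB"  # ~588 MiB, more than a 512m heap
```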

Expected behavior

I would have expected (or liked, if possible) to get the same too_many_buckets_exception instead of a node crash.

Additional Details

Plugins
n/a

Screenshots
n/a

Host/Environment (please complete the following information):

  • OS: Ubuntu
  • Version 22.04.4

Additional context

  • the OpenSearch version is 2.15.0; no changes were made to docker-compose.yml

Workarounds

  • adding "min_doc_count": 1 prevents the crash (and it returns 2 buckets, key: 0 and key: 1234567800); this means clients have to reconstruct the rest of the empty buckets themselves (not always possible in my particular case, sadly)
  • changing the heap from 512m to 1024m, for example, prevents the crash for "interval": 100, but it still crashes for "interval": 10
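For reference, the min_doc_count workaround mentioned above can be expressed like this (same index and credentials as the reproduction steps; it suppresses the empty buckets rather than fixing the underlying reduce-phase allocation, so it only helps if your client can live without them):

```shell
curl -k -XGET -u "admin:$OPENSEARCH_INITIAL_ADMIN_PASSWORD" \
 'https://localhost:9200/sample-index/_search' \
 -H 'Content-Type: application/json' \
 -d '{"size":0, "aggs": { "test": { "histogram": { "field": "some_value", "interval": 100, "min_doc_count": 1 }}}}'
```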

Metadata

Labels

Search:Aggregations, bug (Something isn't working), enhancement (Enhancement or improvement to existing feature or request), v2.16.0 (Issues and PRs related to version 2.16.0)

Status

✅ Done