Skip to content

QL: Reduce number of nested bool queries #96236

@costin

Description

@costin

Description

Certain OR expressions are translated into a number of nested bool queries which, unless balanced, end up being too deep and rejected by ES/Lucene.

Reproduced on 8.7.1 thanks to @e-parth-pathak:

  1. Create an Index and ingest data:
PUT /library/_bulk?refresh
{"index":{"_id":"Leviathan Wakes"}}
{"name":"Leviathan Wakes","author":"James S.A. Corey","release_date":"2011-06-02","page_count":561}
{"index":{"_id":"Hyperion"}}
{"name":"Hyperion","author":"Dan Simmons","release_date":"1989-05-26","page_count":482}
{"index":{"_id":"Dune"}}
{"name":"Dune","author":"Frank Herbert","release_date":"1965-06-01","page_count":604}
{"index":{"_id":"Persuation"}}
{"name":"Persuation","author":"Robert Cialdini","release_date":"2023-06-01","page_count":304}
{"index":{"_id":"Sapiens"}}
{"name":"Sapiens","author":"Yuval Noah Harari","release_date":"1969-06-01","page_count":204}
{"index":{"_id":"Hooked"}}
{"name":"Hooked","author":"Nir Eyal","release_date":"2022-06-01","page_count":408}
  1. Run a query:
POST /_sql/translate
{
  "query": "SELECT * FROM library WHERE name like '%nir%' OR name like '%yuval%' OR name like '%dan%' OR name like '%frank%' OR name like '%robert%' OR name like '%leviathan%'"
}
  1. The response that I got:
{
  "size": 1000,
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "should": [
              {
                "bool": {
                  "should": [
                    {
                      "bool": {
                        "should": [
                          {
                            "bool": {
                              "should": [
                                {
                                  "wildcard": {
                                    "name.keyword": {
                                      "wildcard": "*nir*",
                                      "boost": 1
                                    }
                                  }
                                },
                                {
                                  "wildcard": {
                                    "name.keyword": {
                                      "wildcard": "*yuval*",
                                      "boost": 1
                                    }
                                  }
                                }
                              ],
                              "boost": 1
                            }
                          },
                          {
                            "wildcard": {
                              "name.keyword": {
                                "wildcard": "*dan*",
                                "boost": 1
                              }
                            }
                          }
                        ],
                        "boost": 1
                      }
                    },
                    {
                      "wildcard": {
                        "name.keyword": {
                          "wildcard": "*frank*",
                          "boost": 1
                        }
                      }
                    }
                  ],
                  "boost": 1
                }
              },
              {
                "wildcard": {
                  "name.keyword": {
                    "wildcard": "*robert*",
                    "boost": 1
                  }
                }
              }
            ],
            "boost": 1
          }
        },
        {
          "wildcard": {
            "name.keyword": {
              "wildcard": "*leviathan*",
              "boost": 1
            }
          }
        }
      ],
      "boost": 1
    }
  },
  "_source": false,
  "fields": [
    {
      "field": "author"
    },
    {
      "field": "name"
    },
    {
      "field": "page_count"
    },
    {
      "field": "release_date",
      "format": "strict_date_optional_time_nanos"
    }
  ],
  "sort": [
    {
      "_doc": {
        "order": "asc"
      }
    }
  ],
  "track_total_hits": -1
}

Note that while the lowest nested bool query uses two entries, the rest do not and instead rely on another bool wrapper.
Either all entries should be put into a should clause OR, all the OR entries need to be balanced and be wrapped on the same level.

P.S. This has been addressed in the past by #30267 however it still seems to occur when dealing with like comparisons.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions