[Feature Request] Aggregation `include`/`exclude` should support faster filtering on prefixes

### Is your feature request related to a problem? Please describe

I've been working with an ecommerce user of OpenSearch, where they've implemented something like Lucene's hierarchical facets by tagging products with `category:label` pairs (like `color:red`). These are collected using `terms` aggregations.

To avoid collecting facet labels that are irrelevant to the current search, they have some mechanism to identify the relevant facet categories for the query. They've been filtering these using a regular expression in the `include` parameter for their `terms` aggregation, like `"include": "(color|size|brand|material|machine\\ washable|sleeve\\ length|...):.+"`, in order to filter the `category:label` pairs based on a specific set of categories. Overall, prefix filtering on terms aggregations seems like a fairly reasonable thing to want to do.

Unfortunately, this really slows down search requests, as the `IncludeExclude` class tries to step through all possible `category:label` values (in the global ordinals) that match the expression. The commit from 2015 that added the current automaton-based behavior (which was a significant speedup from what came before) mentions this problem in the [commit message](https://github.com/opensearch-project/OpenSearch/commit/aecd9ac51527fa6ac57788c23527ec4e0dc32636):
> Today we check every regular expression eagerly against every possible term.
This can be very slow if you have lots of unique terms, and even the bottleneck
if your query is selective. 

Indeed, there's a `TODO` in place calling out that prefix matching should be a special case: https://github.com/opensearch-project/OpenSearch/blob/cf2c31fffe844f78f17cf1c2a780198b9b6258d4/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/IncludeExclude.java#L344
https://github.com/msfroh/OpenSearch/blob/cf2c31fffe844f78f17cf1c2a780198b9b6258d4/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/IncludeExclude.java#L344

### Describe the solution you'd like

I would like to specialize the `IncludeExclude.AutomatonBackedOrdinalsFilter` to handle prefixes better.

Ideally, I would like to make it fully transparent to the user -- essentially, I'd like to address the `TODO` that I listed above, where we simply handle prefixes as a special case of regexp `include`/`exclude`. 

In order to do that, I need to learn more about Lucene's automaton matching, which is something that I would like to wrap my head around anyway. (I learned a little bit as a result of https://github.com/opensearch-project/OpenSearch/pull/13461, but that only scratched the surface. I want to know more.)

### Related component

Search:Aggregations

### Describe alternatives you've considered

If the Lucene automaton rabbit hole turns out to be too dark and scary, another option could be to add `include_prefixes` and `exclude_prefixes` parameters that we can parse from the `IncludeExclude` class. 

Once we have that, we could implement logic similar to https://github.com/opensearch-project/OpenSearch/blob/cf2c31fffe844f78f17cf1c2a780198b9b6258d4/server/src/main/java/org/opensearch/search/aggregations/bucket/terms/IncludeExclude.java#L340-L350

The new logic would be something like:

```java
static class PrefixBackedOrdinalsFilter extends OrdinalsFilter {

  private final SortedSet<BytesRef> include, exclude;

  private PrefixBackedOrdinalsFilter(SortedSet<BytesRef> include, SortedSet<BytesRef> exclude) {
    this.include = include;
    this.exclude = exclude;
  }

  private static BytesRef nextBytesRef(BytesRef bytesRef) {
    BytesRef next = BytesRef.deepCopyOf(bytesRef);
    int pos = next.bytes.length - 1;
    while (pos >= 0 && next.bytes[pos] == 255) {
      next.bytes[pos] = 0;
      pos--;
    }
    if (pos >= 0) {
      next.bytes[pos]++;
    } else {
      // Every byte in our prefix had value 255. We must match all subsequent ordinals.
      return null;
    }
    return next;
  }

  @Override
  public LongBitSet acceptedGlobalOrdinals(SortedSetDocValues globalOrdinals) throws IOException {
    LongBitSet accept = new LongBitSet(globalOrdinals.getValueCount());
    for (BytesRef inc : includes) {
      int startOrd = globalOrdinals.lookupTerm(inc);
      if (startOrd < 0) {
        startOrd = -1 - startOrd;
      }
      if (startOrd >= accept.length()) {
        continue;
      }
      BytesRef next = nextBytesRef(inc);
      if (next == null) {
        accept.set(startOrd, accept.length());
      } else {
        endOrd = globalOrdinals.lookupTerm(next);
        if (endOrd < 0) {
          endOrd = -1 - endOrd;
        }
        if (startOrd < endOrd) {
          accept.set(startOrd, endOrd);
        }
      }
    }
    for (BytesRef exc : excludes) {
      // Same logic as above, but we use `accept.clear(...)` to unset bits.
    }
  }
}
```

### Additional context

_No response_

	public LongBitSet acceptedGlobalOrdinals(SortedSetDocValues globalOrdinals) throws IOException {
	LongBitSet acceptedGlobalOrdinals = new LongBitSet(globalOrdinals.getValueCount());
	TermsEnum globalTermsEnum;
	Terms globalTerms = new DocValuesTerms(globalOrdinals);
	// TODO: specialize based on compiled.type: for ALL and prefixes (sinkState >= 0 ) we can avoid i/o and just set bits.
	globalTermsEnum = compiled.getTermsEnum(globalTerms);
	for (BytesRef term = globalTermsEnum.next(); term != null; term = globalTermsEnum.next()) {
	acceptedGlobalOrdinals.set(globalTermsEnum.ord());
	}
	return acceptedGlobalOrdinals;
	}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request] Aggregation `include`/`exclude` should support faster filtering on prefixes #14368

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Related component

Describe alternatives you've considered

Additional context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[Feature Request] Aggregation include/exclude should support faster filtering on prefixes #14368

Description

Is your feature request related to a problem? Please describe

Describe the solution you'd like

Related component

Describe alternatives you've considered

Additional context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

[Feature Request] Aggregation `include`/`exclude` should support faster filtering on prefixes #14368