[Feature Request] Introduce a New Field Mapper for HyperLogLog++ Sketches



### Is your feature request related to a problem? Please describe.

OpenSearch currently lacks a native field type for storing and aggregating pre-computed **HyperLogLog++ (HLL++) sketches**. The existing `cardinality` aggregation works well for computing unique counts over raw data, but it cannot be applied to data that has already been aggregated: a final numeric count discards the information about which values were seen, so counts from different buckets cannot be combined into a correct total.

This presents two major problems:

1.  **Inefficient Data Ingestion:** Users with data pipelines (Spark, Flink, etc.) that can pre-compute HLL++ sketches to reduce data volume have no efficient way to use them in OpenSearch. The only option is to use a `binary` field, which prevents any server-side aggregation and forces all merge logic to the client side.
2.  **Blocker for Multi-Tier Rollups:** This is the most critical issue. The inability to re-aggregate unique counts is the primary reason that multi-tier rollups are not safely supported. Users cannot create a rollup with a 1-minute unique user count and then re-aggregate that into a correct 1-hour unique user count.
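To make the second problem concrete, here is a small illustration using exact sets and made-up user IDs (the bucket labels and names are hypothetical). Summing per-minute unique counts overstates the longer-window figure whenever a user appears in more than one minute.

```python
# Hypothetical per-minute activity; the user IDs are invented for illustration.
minute_buckets = {
    "14:00": {"alice", "bob", "carol"},
    "14:01": {"bob", "carol", "dave"},
    "14:02": {"alice", "dave", "erin"},
}

# What a rollup that stores only the final count keeps: one number per bucket.
per_minute_counts = {minute: len(users) for minute, users in minute_buckets.items()}

# Re-aggregating the stored numbers counts returning users more than once.
summed = sum(per_minute_counts.values())

# The correct figure requires the union of the underlying sets.
true_unique = len(set().union(*minute_buckets.values()))

print("sum of per-minute counts:", summed)
print("true unique users:", true_unique)
```

The stored counts sum to 9, while only 5 distinct users exist. A mergeable sketch plays the role of the sets here, at a fixed and much smaller size.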



### Describe the solution you'd like

I propose the creation of a new field type, tentatively named `hll_sketch`. This feature would consist of two main components:

1.  **An `HLLSketchFieldMapper`:** A new field mapper that accepts an HLL++ sketch (e.g., as a Base64-encoded string), stores it efficiently as a binary doc value, and makes it available for aggregation.
2.  **A `merge_hll_sketches` Aggregation:** A new metric aggregation that operates on `hll_sketch` fields. It would collect the sketches from the matching documents, merge them into a single sketch, and return the final cardinality estimate.
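The merge step itself is simple. Below is a minimal sketch of it, assuming a plain HyperLogLog with dense registers rather than full HLL++ (no sparse encoding or empirical bias correction). It is not OpenSearch's internal implementation. The precision `P`, the SHA-256 based hash, and all function names are assumptions made for illustration. It shows that merging two sketches by taking the register-wise maximum yields exactly the sketch that would have been built from the combined raw data.

```python
import hashlib
import math

P = 12          # assumed precision: 2**12 = 4096 registers
M = 1 << P

def new_sketch():
    return [0] * M

def add(sketch, value):
    # 64-bit hash taken from SHA-256; chosen only for portability.
    h = int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")
    index = h >> (64 - P)                     # first P bits select a register
    rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
    sketch[index] = max(sketch[index], rank)

def merge(a, b):
    # Register-wise maximum: the whole merge operation.
    return [max(x, y) for x, y in zip(a, b)]

def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)        # linear counting for small sets
    return raw

# Two overlapping populations of hypothetical users.
a, b, direct = new_sketch(), new_sketch(), new_sketch()
for i in range(0, 6000):
    add(a, f"user-{i}")
for i in range(4000, 10000):
    add(b, f"user-{i}")
for i in range(0, 10000):
    add(direct, f"user-{i}")

merged = merge(a, b)
print(merged == direct)   # True: merging loses nothing relative to raw data
print(round(estimate(a)), round(estimate(b)), round(estimate(merged)))
```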
---

### How This Unlocks Multi-Tier Rollups

This new field type is the **foundational building block** for enabling safe, accurate, multi-tier rollups for cardinality metrics.

The workflow would be as follows:

1.  **Initial Rollup (Tier 1):** An initial rollup job would run on the raw data. Instead of calculating the final `cardinality`, it would generate and store the raw HLL++ sketch in a field mapped as `hll_sketch`.
2.  **Subsequent Rollup (Tier 2):** A second rollup job could then safely target the Tier 1 rollup index. It would use the new `merge_hll_sketches` aggregation on the `hll_sketch` field to accurately combine the sketches from the first tier into a new, higher-level sketch or final count.

This avoids the problem of re-aggregating final counts by merging the underlying sketches instead, which makes tiered data retention strategies a native capability.
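The reason tiering is safe can be shown with a toy example. Assuming sketches are merged by register-wise maximum (as in HLL-family sketches), the merge is associative, commutative, and idempotent, so merging in tiers gives the same registers as merging everything at once. The 8-register sketches below are made up for illustration and are far smaller than real ones.

```python
from functools import reduce

def merge(a, b):
    """Register-wise maximum, the merge step for HLL-family sketches."""
    return [max(x, y) for x, y in zip(a, b)]

# Made-up 8-register sketches, one per 1-minute bucket (Tier 1 output).
minute_sketches = [
    [0, 3, 1, 0, 2, 0, 0, 1],
    [1, 0, 1, 4, 0, 0, 2, 1],
    [0, 1, 0, 0, 2, 5, 0, 0],
    [2, 0, 0, 1, 0, 0, 0, 3],
]

# Tier 2: merge pairs of minutes first, then merge the intermediate results.
tier2 = [
    merge(minute_sketches[0], minute_sketches[1]),
    merge(minute_sketches[2], minute_sketches[3]),
]
tiered = reduce(merge, tier2)

# Direct: merge all minute-level sketches in a single pass.
direct = reduce(merge, minute_sketches)

print(tiered == direct)
print(direct)
```

Both paths produce `[2, 3, 1, 4, 2, 5, 2, 3]`. Re-merging a sketch that was already included also changes nothing, so a retried or overlapping rollup job would not inflate the result.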



### Example Workflow

**1. Define the Mapping**
```json
PUT my-analytics-index
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "user_id_sketch": {
        "type": "hll_sketch"
      }
    }
  }
}
```

**2. Ingest a Pre-Computed Sketch**
```json
POST my-analytics-index/_doc
{
  "timestamp": "2025-09-30T14:30:00Z",
  "user_id_sketch": "AAEGEAgaA...base64-encoded-hll-sketch...AgA="
}
```
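**3. Query the Merged Cardinality (hypothetical)**

A client-side sketch of how the two requests might be assembled. Everything about the sketch payload is an assumption: the issue does not define the wire format, so one byte per register is used purely for illustration, and the aggregation names `per_hour` and `unique_users` are placeholders. The `merge_hll_sketches` body follows the usual `{"field": ...}` shape of existing metric aggregations, which the final design may or may not keep.

```python
import base64
import json

# Assumed wire format: one byte per register, Base64-encoded (illustrative only).
registers = bytes([0, 3, 1, 0, 2, 0, 0, 1])
encoded = base64.b64encode(registers).decode("ascii")

# Body for: POST my-analytics-index/_doc
ingest_doc = {
    "timestamp": "2025-09-30T14:30:00Z",
    "user_id_sketch": encoded,
}

# Hypothetical body for: POST my-analytics-index/_search
# Buckets documents by hour, then merges the sketches within each bucket.
search_body = {
    "size": 0,
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "calendar_interval": "hour"},
            "aggs": {
                "unique_users": {
                    "merge_hll_sketches": {"field": "user_id_sketch"}
                }
            },
        }
    },
}

print(json.dumps(ingest_doc, indent=2))
print(json.dumps(search_body, indent=2))
```

The response format is left open here, since the issue does not specify whether the aggregation returns only a count or also the merged sketch.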

### Related component

Search:Aggregations

### Describe alternatives you've considered

_No response_

### Additional context

https://github.com/opensearch-project/index-management/issues/1493
https://github.com/opensearch-project/index-management/issues/1490
