[RFC] Schema on Reads #1133

### Problem Statement

By default, OpenSearch supports ‘schema on write’, i.e. the structure of the data is defined at ingest time so that it is available for querying immediately. However, as OpenSearch use cases have evolved, a need for greater flexibility has emerged. End users may not know the structure of the data up front, or may want to query on additional attributes after ingest. This is where ‘schema on read’ is useful: the fields used in a query can be defined at query time. It can also improve ingest rate by avoiding the cost of indexing fields that will not be queried right away.

### Requirements

1. Ability to define fields that are evaluated at query time.
2. No changes should be made to the underlying schema. This avoids the need to re-index existing data.
3. These user-defined fields should support all the operations that a regular field supports in a query.

### Existing Solution

#### Scripting

Scripting is supported in several constructs of the _search request body. In each of them the basic mechanism is the same: the script is evaluated at query time, derives one or more values from the indexed fields, and acts on the derived values.

* In query and filter context, the derived value can be used to filter out documents. 
* In aggregations, results can be aggregated on the derived value. 
* The derived values can be exposed as custom fields by including them in script_fields.
* Results can also be sorted on the derived value. 
* Using script_score, the derived value can be used to score the filtered documents.

### Shortcomings of existing solution

Scripting satisfies most of the requirements listed above, but adding scripts to the request makes it bulky, hard to read, and difficult to manage. Even though scripts can be stored and referenced in the query, this does not help readability.

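The following example illustrates this: nearly the same date arithmetic is repeated in the filter, the aggregation, the sort, and script_fields.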
```
GET index_1/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": """
            return ChronoUnit.YEARS.between(doc['dob'].value, doc['create_time'].value) > 18;
          """
        }
      }
    }
  },
  "aggs": {
    "day-aggregations": {
      "histogram": {
        "interval": 10,
        "script": {
          "source": "ChronoUnit.YEARS.between(doc['dob'].value, doc['create_time'].value);"
        }
      }
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "source": "ChronoUnit.DAYS.between(doc['dob'].value, doc['create_time'].value);"
      },
      "order": "desc"
    }
  },
  "_source": true,
  "script_fields": {
    "age": {
      "script": "ChronoUnit.YEARS.between(doc['dob'].value, doc['create_time'].value);"
    }
  },
  "size": 10
}
```
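
Storing the script does not change this much. The sketch below reuses the index and field names from the request above; the script id `age_in_years` is a made-up name for illustration. The duplicated source goes away, but every place that uses the value still needs a script construct rather than a plain field reference.

```
# Sketch only: "age_in_years" is a hypothetical script id.
PUT _scripts/age_in_years
{
  "script": {
    "lang": "painless",
    "source": "ChronoUnit.YEARS.between(doc['dob'].value, doc['create_time'].value)"
  }
}

# The search still relies on script constructs, now with an id instead of inline source.
GET index_1/_search
{
  "sort": {
    "_script": {
      "type": "number",
      "script": { "id": "age_in_years" },
      "order": "desc"
    }
  },
  "script_fields": {
    "age": {
      "script": { "id": "age_in_years" }
    }
  }
}
```
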

### Proposed Solution

Regular OpenSearch queries revolve around fields in the schema, whereas scripting changes the query syntax considerably.
The proposed solution aims to make schema on read easy to use while keeping all the benefits of scripting.
The proposal is to define fields in the mapping that are evaluated at query time and behave like regular fields.
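
As a rough sketch of what this could look like, the snippet below defines the age calculation once in the mapping and then uses `age` like any other field. The syntax is purely hypothetical: the `derived` section, its `type` and `script` keys, and the way the field is referenced in the query are assumptions made for illustration, not something this RFC specifies. The sort here also uses the year-level value rather than the day-level difference from the script-based request.

```
# Hypothetical syntax, for illustration only: the "derived" section and its keys are assumptions.
PUT index_1/_mapping
{
  "derived": {
    "age": {
      "type": "long",
      "script": {
        "source": "ChronoUnit.YEARS.between(doc['dob'].value, doc['create_time'].value)"
      }
    }
  }
}

# The script-based request, rewritten against the hypothetical "age" field.
GET index_1/_search
{
  "query": {
    "bool": {
      "filter": {
        "range": { "age": { "gt": 18 } }
      }
    }
  },
  "aggs": {
    "age-histogram": {
      "histogram": { "field": "age", "interval": 10 }
    }
  },
  "sort": [
    { "age": "desc" }
  ],
  "fields": ["age"],
  "size": 10
}
```

Compared with the script-based request, the date arithmetic appears once and the query body uses only ordinary field constructs.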

