[Feature Request] Support for object type in Derived Fields #13143

@rishabhmaurya

Is your feature request related to a problem? Please describe

Issue: #12281

Derived fields in OpenSearch currently support primitive types such as keyword, long, double, geo_point, ip, date, and boolean. However, users sometimes need to derive a field of type object from a source field that contains JSON data. This proposal extends derived fields to support nested JSON structures so that users can query subfields within a derived JSON field.

Current Scenario
Consider the following index mapping definition:

{
  "mappings": {
    "properties": {
      "product_name": { "type": "keyword" },
      "product_json": { "type": "text", "index": false }
    },
    "derived": {
      "derived_product_json": {
        "type": "keyword",
        "script": {
          "source": "emit(params._source[\"product_json\"])"
        }
      }
    }
  }
}

In this example, product_json contains a JSON object, and derived_product_json derives a field of type keyword representing the entire JSON object. However, querying specific subfields within derived_product_json is currently not possible since subfields are not defined within the derived field context.
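The limitation can be illustrated with a minimal Python sketch. It is not OpenSearch code: `derive_keyword` is a hypothetical stand-in for the Painless script, the sample values are illustrative, and a keyword match is modeled as plain string equality.

```python
import json

# Hypothetical stand-in for the derived-field script: it emits the whole
# source value as one keyword, i.e. a single opaque string.
def derive_keyword(source: dict) -> str:
    value = source["product_json"]
    # Serialize objects so the result is always one string, as a keyword would be.
    return value if isinstance(value, str) else json.dumps(value, sort_keys=True)

# Illustrative sample document (assumed values).
doc = {
    "product_name": "canyon ultimate road bike",
    "product_json": {"brand": "canyon", "model": "ultimate", "price": 4500},
}

keyword_value = derive_keyword(doc)

# An exact comparison only succeeds against the full serialized string.
matches_whole = keyword_value == json.dumps(doc["product_json"], sort_keys=True)

# Comparing against a single subfield value fails, because the keyword
# carries no notion of "brand" as a separate field.
matches_brand = keyword_value == "canyon"
```

In this model the only way to reach `brand` is to parse the string again on the client side, which is the gap the proposal is meant to close.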

Describe the solution you'd like

Introduce a new field type called json or object within the derived fields context to support nested JSON structures. This enhancement will enable users to query subfields of derived JSON fields, providing greater flexibility in data querying and analysis.

Proposed Mapping

{
  "mappings": {
    "properties": {
      "product_name": { "type": "keyword" },
      "product_json": { "type": "text", "index": false }
    },
    "derived": {
      "derived_product_json": {
        "type": "json",
        "script": {
          "source": "emit(params._source[\"product_json\"])"
        }
      }
    }
  }
}

Example Document
Consider a document with the following structure:

{
  "product_name": "canyon ultimate road bike",
  "product_json": {
    "brand": "canyon",
    "model": "ultimate",
    "price": 4500
  }
}

Querying Subfields
With the proposed enhancement, users can query subfields within derived_product_json. For instance:

{
  "query": {
    "bool": {
      "must": [
        { "match": { "derived_product_json.brand": "canyon" } },
        { "match": { "derived_product_json.model": "ultimate" } }
      ]
    }
  },
  "fields": ["derived_product_json.brand", "derived_product_json.model"]
}

This query retrieves documents where the derived_product_json field contains the specified brand and model values, and returns those two subfields in the response.
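One way to picture the intended semantics is the Python sketch below. It is only an assumption about how subfield resolution could behave, not the OpenSearch implementation: `resolve_subfield` and `matches_all` are hypothetical helpers, the emitted value is parsed as JSON when it arrives as a string, and `match` is simplified to exact equality with no text analysis.

```python
import json
from typing import Any, Optional

def resolve_subfield(emitted: Any, path: str) -> Optional[Any]:
    """Walk a dotted path inside the value emitted by the derived field."""
    # Accept either a JSON string or an already-parsed object.
    obj = json.loads(emitted) if isinstance(emitted, str) else emitted
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None  # subfield is absent from this document
        obj = obj[key]
    return obj

def matches_all(source: dict, derived_field: str, source_field: str, must: dict) -> bool:
    """Model a bool/must query: every subfield clause has to match."""
    emitted = source[source_field]
    prefix = derived_field + "."
    for full_path, expected in must.items():
        if not full_path.startswith(prefix):
            return False  # clause does not target this derived field
        if resolve_subfield(emitted, full_path[len(prefix):]) != expected:
            return False
    return True

doc = {
    "product_name": "canyon ultimate road bike",
    "product_json": {"brand": "canyon", "model": "ultimate", "price": 4500},
}

hit = matches_all(
    doc,
    derived_field="derived_product_json",
    source_field="product_json",
    must={
        "derived_product_json.brand": "canyon",
        "derived_product_json.model": "ultimate",
    },
)
```

Because the parsing happens at query time in this model, each matching document pays the cost of decoding the JSON, so an actual implementation would likely need to consider caching or reuse of the parsed value across clauses.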

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

Status

✅ Done
