Commit 7423a45

tanik98, kolchfa-aws, and natebower authored and committed
Add documentation for derived source in source field metadata (opensearch-project#10674)
* Add documentation for derived source in source field metadata
  Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>
* Doc review
  Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
* Update the derived source documentation and add settings in index settings page
  Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>
* Apply suggestions from code review
  Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
* Apply suggestions from code review
  Signed-off-by: Nathan Bower <nbower@amazon.com>

---------

Signed-off-by: Tanik Pansuriya <panbhai@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Signed-off-by: Nathan Bower <nbower@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
Co-authored-by: Nathan Bower <nbower@amazon.com>
1 parent de5d8c8 commit 7423a45

2 files changed

Lines changed: 75 additions & 1 deletion

_field-types/metadata-fields/source.md

Lines changed: 71 additions & 1 deletion
@@ -25,7 +25,7 @@ PUT sample-index1
2525
```
2626
{% include copy-curl.html %}
2727

28-
Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document.
28+
Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document. To support these features without explicitly storing the `_source` field, you can use [derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source), which reconstructs the source when it is needed.
2929
{: .warning}
3030

3131
## Including or excluding fields
@@ -52,3 +52,73 @@ PUT logs
5252
{% include copy-curl.html %}
5353

5454
These fields are not stored in the `_source`, but you can still search them because the data remains indexed.
55+
56+
## Derived source
57+
58+
OpenSearch stores each ingested document in the `_source` field and also indexes individual fields for search. The `_source` field can consume significant storage space. To reduce storage use, you can configure OpenSearch to skip storing the `_source` field and instead reconstruct it dynamically when needed, for example, during `search`, `get`, `mget`, `reindex`, or `update` operations.
59+
60+
To enable derived source, configure the `derived_source` index-level setting:
61+
62+
63+
```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  }
}
```
75+
{% include copy-curl.html %}
76+
77+
While skipping the `_source` field can significantly reduce storage requirements, dynamically deriving the source is generally slower than reading a stored `_source`. To avoid this overhead during search queries, do not request the `_source` field when it is not needed. You can also limit the overhead by setting the `size` parameter, which controls the number of documents returned and therefore the number of sources that must be reconstructed.
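
As a rough sketch of this advice, the following search request returns hits without their `_source` by setting `"_source": false` in the request body and caps the number of hits using `size`. The `match_all` query and the `size` value of `5` are illustrative assumptions, not values taken from this change:

```json
GET sample-index1/_search
{
  "size": 5,
  "_source": false,
  "query": {
    "match_all": {}
  }
}
```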
78+
79+
Real-time reads using the [Get Document API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/get-documents/) or [Multi-get Documents API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/multi-get/) are served from the translog until a [`refresh`]({{site.url}}{{site.baseurl}}/api-reference/index-apis/refresh/) occurs. With derived source enabled, these reads can be slower because the document must first be ingested temporarily before the source can be reconstructed. To avoid this additional latency, set the index-level `derived_source.translog.enabled` setting to `false`, which disables generating a derived source during translog reads:
80+
81+
```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "translog": {
          "enabled": false
        }
      }
    }
  }
}
```
95+
96+
If you disable this setting, the `_source` content returned for a document may differ depending on whether the document is still in the translog or has already been written to a segment.
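
Because `index.derived_source.translog.enabled` is listed among the dynamic index settings, it should also be possible to change it on an existing index through the update settings API. The following is a minimal sketch under that assumption, reusing the `sample-index1` index from the earlier examples:

```json
PUT sample-index1/_settings
{
  "index": {
    "derived_source": {
      "translog": {
        "enabled": false
      }
    }
  }
}
```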
97+
98+
### Supported fields and parameters
99+
100+
Derived source uses [`doc_values`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/doc-values/) and [`stored_fields`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/store/) to reconstruct the document at query time. Because of the implementation of `doc_values`, the dynamically generated `_source` may differ in format or precision from the original ingested document.
101+
102+
Derived source supports the following field types without requiring any changes to field mappings (with some [limitations](#limitations)):
103+
104+
- [`boolean`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/boolean/)
105+
- [`byte`, `double`, `float`, `half_float`, `integer`, `long`, `short`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/)
106+
- [`date`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date/)
107+
- [`date_nanos`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/date-nanos/)
108+
- [`geo_point`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/geo-point/)
109+
- [`ip`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/ip/)
110+
- [`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/)
111+
- [`unsigned_long`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/unsigned-long/)
112+
- [`scaled_float`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/numeric/)
113+
- [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/)
114+
- [`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/)
115+
116+
For a [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field with derived source enabled, the field value is stored as a stored field by default. You do not need to set the `store` mapping parameter to `true`.
117+
{: .note}
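
To illustrate, the following sketch creates an index with derived source enabled and maps only field types from the preceding list, with no `store` or other extra mapping parameters. The index name `sample-index2` and all field names are hypothetical and chosen only for this example:

```json
PUT sample-index2
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  },
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "status_code": { "type": "integer" },
      "client_ip": { "type": "ip" },
      "service": { "type": "keyword" },
      "message": { "type": "text" }
    }
  }
}
```

A document retrieved from such an index carries a `_source` reconstructed from doc values and stored fields, so its values may differ in format or precision from what was originally ingested.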
118+
119+
### Limitations
120+
121+
Derived source does not support the following fields:
122+
123+
- Fields that define the [`copy_to`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/copy-to/) parameter.
124+
- [`keyword`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/keyword/) and [`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/) fields that define either the [`ignore_above`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/ignore-above/) or the [`normalizer`]({{site.url}}{{site.baseurl}}/analyzers/normalizers/) parameter.
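
For illustration, a mapping such as the following would fall under these limitations because the `keyword` field defines `ignore_above` and the `title` field uses `copy_to`. The index name `sample-index3` and the field names are hypothetical, and how OpenSearch responds to such a mapping when derived source is enabled is not described in this change:

```json
PUT sample-index3
{
  "mappings": {
    "properties": {
      "tag": { "type": "keyword", "ignore_above": 256 },
      "title": { "type": "text", "copy_to": "all_text" },
      "all_text": { "type": "text" }
    }
  }
}
```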

_install-and-configure/configuring-opensearch/index-settings.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -166,6 +166,8 @@ For `zstd`, `zstd_no_dict`, `qat_lz4`, and `qat_deflate`, you can specify the co
166166

167167
- `index.append_only.enabled` (Boolean): Set to `true` to prevent any updates to documents in the index. Default is `false`.
168168

169+
- `index.derived_source.enabled` (Boolean): Set to `true` to dynamically generate the source without explicitly storing the `_source` field, which can optimize storage. Default is `false`. For more information, see [Derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source).
170+
169171
### Updating a static index setting
170172

171173
You can update a static index setting only on a closed index. The following example demonstrates updating the index codec setting.
@@ -269,6 +271,8 @@ OpenSearch supports the following dynamic index-level index settings:
269271

270272
- `index.routing.allocation.total_primary_shards_per_node` (Integer): The maximum number of primary shards from a single index that can be allocated to a single node. This setting is applicable only for remote-backed clusters. Default is `-1` (unlimited). Helps control per-index primary shard distribution across nodes by limiting the number of primary shards per node. Use with caution because primary shards from this index may remain unallocated if nodes reach their configured limits.
271273

274+
- `index.derived_source.translog.enabled` (Boolean): Controls how documents are read from the translog for an index with derived source enabled. Defaults to the `index.derived_source.enabled` value. For more information, see [Derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source).
275+
272276
### Updating a dynamic index setting
273277

274278
You can update a dynamic index setting at any time through the API. For example, to update the refresh interval, use the following request:
