_field-types/metadata-fields/source.md: 71 additions & 1 deletion
@@ -25,7 +25,7 @@ PUT sample-index1
```
{% include copy-curl.html %}

-Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document.
+Disabling the `_source` field can impact the availability of certain features, such as the `update`, `update_by_query`, and `reindex` APIs, as well as the ability to debug queries or aggregations using the original indexed document. To keep these features available without storing the `_source` field explicitly, you can use [derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source), which reconstructs the source on demand while preserving the storage savings.
{: .warning}

## Including or excluding fields
@@ -52,3 +52,73 @@ PUT logs
{% include copy-curl.html %}

These fields are not stored in the `_source`, but you can still search them because the data remains indexed.

## Derived source

OpenSearch stores each ingested document in the `_source` field and also indexes individual fields for search. The `_source` field can consume significant storage space. To reduce storage use, you can configure OpenSearch to skip storing the `_source` field and instead reconstruct it dynamically when needed, for example, during `search`, `get`, `mget`, `reindex`, or `update` operations.

To enable derived source, configure the `derived_source` index-level setting:

```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "enabled": true
      }
    }
  }
}
```
{% include copy-curl.html %}

While skipping the `_source` field can significantly reduce storage requirements, dynamically deriving the source is generally slower than reading a stored `_source`. To avoid this overhead during search queries, do not request the `_source` field when it's not needed. For example, set the `size` parameter, which controls the number of documents returned, to `0` when you need only hit counts or aggregation results.
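
As a rough illustration of the point above, the following request sketches a search that returns only the total hit count, so no document source needs to be reconstructed. It assumes the `sample-index1` index from the earlier example, and the `match_all` query is only a placeholder for your own query:

```json
GET sample-index1/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  }
}
```

If you need the matching document IDs but not their contents, setting `"_source": false` in the request body is another standard search option that should avoid retrieving the source.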

For real-time reads using the [Get Document API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/get-documents/) or [Multi-get Documents API]({{site.url}}{{site.baseurl}}/api-reference/document-apis/multi-get/), which are served from the translog until a [`refresh`]({{site.url}}{{site.baseurl}}/api-reference/index-apis/refresh/) occurs, performance can be slower when derived source is enabled because the document must first be ingested temporarily before its source can be reconstructed. To avoid this additional latency, set the index-level `derived_source.translog.enabled` setting to `false`, which disables generating a derived source during translog reads:

```json
PUT sample-index1
{
  "settings": {
    "index": {
      "derived_source": {
        "translog": {
          "enabled": false
        }
      }
    }
  }
}
```

When this setting is `false`, the `_source` content returned for a document may differ depending on whether the document is still in the translog or has been written to a segment.
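
One way to observe this difference, sketched under the assumption that translog derivation is disabled as shown above, is to index a document, read it immediately (served from the translog), refresh the index, and read it again (served from a segment). The document ID `1` and the `created_at` field are hypothetical and used only for illustration:

```json
PUT sample-index1/_doc/1
{
  "created_at": "2025-01-01T00:00:00Z"
}

GET sample-index1/_doc/1

POST sample-index1/_refresh

GET sample-index1/_doc/1
```

Any difference between the two `GET` responses would reflect how the source is reconstructed after the refresh, not a change to the indexed data.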

### Supported fields and parameters

Derived source uses [`doc_values`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/doc-values/) and [`stored_fields`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/store/) to reconstruct the document at query time. Because of the implementation of `doc_values`, the dynamically generated `_source` may differ in format or precision from the original ingested document.

Derived source supports the following field types without requiring any changes to field mappings (with some [limitations](#limitations)):
For a [`text`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/text/) field with derived source enabled, the field value is stored as a stored field by default. You do not need to set the `store` mapping parameter to `true`.
{: .note}

### Limitations

Derived source does not support the following fields:

- [`keyword`]({{site.url}}{{site.baseurl}}/opensearch/supported-field-types/keyword/) and [`wildcard`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/wildcard/) fields that define either the [`ignore_above`]({{site.url}}{{site.baseurl}}/field-types/mapping-parameters/ignore-above/) or [`normalizer`]({{site.url}}{{site.baseurl}}/analyzers/normalizers/) parameters.

_install-and-configure/configuring-opensearch/index-settings.md: 4 additions & 0 deletions
@@ -166,6 +166,8 @@ For `zstd`, `zstd_no_dict`, `qat_lz4`, and `qat_deflate`, you can specify the co
- `index.append_only.enabled` (Boolean): Set to `true` to prevent any updates to documents in the index. Default is `false`.

- `index.derived_source.enabled` (Boolean): Set to `true` to dynamically generate the source without explicitly storing the `_source` field, which can optimize storage. Default is `false`. For more information, see [Derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source).

### Updating a static index setting

You can update a static index setting only on a closed index. The following example demonstrates updating the index codec setting.
@@ -269,6 +271,8 @@ OpenSearch supports the following dynamic index-level index settings:
- `index.routing.allocation.total_primary_shards_per_node` (Integer): The maximum number of primary shards from a single index that can be allocated to a single node. This setting is applicable only for remote-backed clusters. Default is `-1` (unlimited). Helps control per-index primary shard distribution across nodes by limiting the number of primary shards per node. Use with caution because primary shards from this index may remain unallocated if nodes reach their configured limits.

- `index.derived_source.translog.enabled` (Boolean): Controls whether the source is derived when documents are read from the translog for an index with derived source enabled. Default is the value of `index.derived_source.enabled`. For more information, see [Derived source]({{site.url}}{{site.baseurl}}/field-types/metadata-fields/source/#derived-source).

### Updating a dynamic index setting

You can update a dynamic index setting at any time through the API. For example, to update the refresh interval, use the following request: