[ML] Copy more settings when creating DF analytics destination index by edsavage · Pull Request #91546 · elastic/elasticsearch

edsavage · 2022-11-14T11:59:02Z

Currently, when a data frame analytics job is created, only two settings are copied from the source index to the auto-created destination index: index.number_of_shards and index.number_of_replicas.

To cater for slightly more complex source indices, this PR also copies or merges additional settings from the source indices into the destination index: index.analysis, index.similarity and index.mapping.

For the index.mapping settings, when multiple source indices are involved, the settings are merged in the same way as index.number_of_shards and index.number_of_replicas, i.e. by taking the maximum value of each setting across all source indices.
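As a rough illustration of the "take the maximum" rule, here is a minimal Python sketch. It is not the code from this PR (which is Java); the function name `merge_max`, the flat dictionaries, and the assumption that every value is already an integer are all hypothetical simplifications.

```python
from typing import Dict, Iterable, List


def merge_max(sources: List[Dict[str, int]], prefixes: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: keep the largest value seen for each matching key.

    Assumes flat settings maps with integer values; real index settings
    may be strings or nested objects.
    """
    wanted = tuple(prefixes)
    merged: Dict[str, int] = {}
    for settings in sources:
        for key, value in settings.items():
            # Only keys under one of the requested prefixes are carried over.
            if key.startswith(wanted):
                merged[key] = max(merged.get(key, value), value)
    return merged


# Two made-up source indices.
source_a = {"index.number_of_shards": 1, "index.mapping.total_fields.limit": 1000}
source_b = {
    "index.number_of_shards": 3,
    "index.mapping.total_fields.limit": 2000,
    "index.refresh_interval": 30,  # not under a copied prefix, so it is dropped
}

merged = merge_max(
    [source_a, source_b],
    ["index.number_of_shards", "index.number_of_replicas", "index.mapping."],
)
```

With these inputs the destination would get 3 shards and a field limit of 2000, while the unrelated setting is ignored.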

For index.similarity, when merging multiple source indices, the similarity objects must be identical; otherwise an exception is thrown.
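The "identical or fail" rule for similarity could be sketched as follows. This is an illustrative Python approximation only: `merge_similarity` is a hypothetical name, and skipping indices that define no similarity at all is an assumption of this sketch, not something the description states.

```python
from typing import Any, Dict, List, Optional


def merge_similarity(sources: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return the shared similarity object or raise.

    Each element of `sources` is one index's settings with an optional
    "similarity" entry. Indices without that entry are skipped (assumption).
    """
    merged: Optional[Dict[str, Any]] = None
    for settings in sources:
        similarity = settings.get("similarity")
        if similarity is None:
            continue
        if merged is None:
            merged = similarity
        elif merged != similarity:
            # Any difference between source indices is treated as an error.
            raise ValueError("index.similarity differs between source indices")
    return merged


bm25 = {"my_similarity": {"type": "BM25", "b": 0.5}}
same = merge_similarity([{"similarity": bm25}, {"similarity": dict(bm25)}])
```

Passing two indices whose similarity objects differ in any field would raise `ValueError` in this sketch.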

index.analysis consists of the sub-objects index.analysis.filter and index.analysis.analyzer, each of which may contain multiple named filter or analyzer objects. The merge procedure throws an exception if identically named objects differ in content; otherwise all filter and analyzer objects are copied over to the destination index.
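The analysis merge described above amounts to a union of named objects with a conflict check. A minimal Python sketch under stated assumptions: `merge_analysis` is a hypothetical name, and each source is modelled as a plain nested dictionary of the form `{"filter": {...}, "analyzer": {...}}` rather than a real settings object.

```python
from typing import Any, Dict, List


def merge_analysis(sources: List[Dict[str, Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: union named filters/analyzers, failing on conflicts."""
    merged: Dict[str, Dict[str, Any]] = {}
    for analysis in sources:
        for group, objects in analysis.items():  # e.g. "filter" or "analyzer"
            target = merged.setdefault(group, {})
            for name, definition in objects.items():
                # Same name with different content is an error.
                if name in target and target[name] != definition:
                    raise ValueError(f"conflicting definitions for {group}.{name}")
                target[name] = definition
    return merged


index_a = {
    "filter": {"my_stop": {"type": "stop", "stopwords": ["a", "the"]}},
    "analyzer": {"my_analyzer": {"tokenizer": "standard", "filter": ["my_stop"]}},
}
index_b = {
    "filter": {
        "my_stop": {"type": "stop", "stopwords": ["a", "the"]},  # identical, so allowed
        "my_lower": {"type": "lowercase"},
    },
}

analysis = merge_analysis([index_a, index_b])
```

Here the destination ends up with both filters and the single analyzer; changing the stopword list in only one of the two indices would raise `ValueError` instead.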

Fixes #89795

elasticsearchmachine · 2022-11-14T11:59:26Z

Pinging @elastic/ml-core (Team:ML)

elasticsearchmachine · 2022-11-14T11:59:26Z

Hi @edsavage, I've created a changelog YAML for you.

dimitris-athanasiou

LGTM

* main: (163 commits)
  [DOCS] Edits frequent items aggregation (elastic#91564)
  Handle providers of optional services in ubermodule classloader (elastic#91217)
  Add `exportDockerImages` lifecycle task for exporting docker tarballs (elastic#91571)
  Fix CSV dependency report output file location in DRA CI job
  Fix variable placeholder for Strings.format calls (elastic#91531)
  Fix output dir creation in ConcatFileTask (elastic#91568)
  Fix declaration of dependencies in DRA snapshots CI job (elastic#91569)
  Upgrade Gradle Enterprise plugin to 3.11.4 (elastic#91435)
  Ingest DateProcessor (small) speedup, optimize collections code in DateFormatter.forPattern (elastic#91521)
  Fix inter project handling of generateDependenciesReport (elastic#91555)
  [Synthetics] Add synthetics-* read to fleet-server (elastic#91391)
  [ML] Copy more settings when creating DF analytics destination index (elastic#91546)
  Reduce CartesianCentroidIT flakiness (elastic#91553)
  Propagate last node to reinitialized routing tables (elastic#91549)
  Forecast write load during rollovers (elastic#91425)
  [DOCS] Warn about potential overhead of named queries (elastic#91512)
  Datastream unavailable exception metadata (elastic#91461)
  Generate docker images and dependency report in DRA ci job (elastic#91545)
  Support cartesian_bounds aggregation on point and shape (elastic#91298)
  Add support for EQL samples queries (elastic#91312)
  ...

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/RollupShardIndexer.java

edsavage added the >bug, :ml (Machine learning) and v8.6.0 labels Nov 14, 2022

elasticsearchmachine added the Team:ML (Meta label for the ML team) label Nov 14, 2022

edsavage added 3 commits November 14, 2022 11:59

Update docs/changelog/91546.yaml

97c91cb

Tend to formatting

0c2de07

Merge branch 'transforms_merge_settings' of github.com:edsavage/elast…

d56c168

…icsearch into transforms_merge_settings

dimitris-athanasiou approved these changes Nov 14, 2022

edsavage merged commit fc5c1f1 into elastic:main Nov 14, 2022

edsavage deleted the transforms_merge_settings branch November 14, 2022 15:27

dimitris-athanasiou mentioned this pull request Dec 20, 2022

[ML] Allow user settings for data frame analytics destination index #68514 (Open)

edsavage merged 4 commits into elastic:main from edsavage:transforms_merge_settings
