Data tiers allow indices to be allocated to dedicated tiers of nodes. Such nodes typically have different characteristics, either physically (storage type, RAM:storage ratio) or from a usage standpoint (for example, the hot tier is expected to respond fast).
Using data tiers is optional: a node with the data role is assigned all data tiers. However, if a cluster uses separate data tiers, it is desirable to be explicit about where a specific index belongs.
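As a minimal sketch of what being explicit looks like, the helper below builds the body of an update-index-settings request that sets an ordered tier preference. The helper name `tier_preference_settings` and the index name in the comment are assumptions made for illustration; the setting key and the tier names are the ones Elasticsearch uses.

```python
import json

# Built-in data tier names; a preference is an ordered, comma-separated list.
KNOWN_TIERS = ("data_content", "data_hot", "data_warm", "data_cold", "data_frozen")

SETTING = "index.routing.allocation.include._tier_preference"


def tier_preference_settings(*tiers: str) -> dict:
    """Build a settings body that prefers the given tiers, in order.

    Hypothetical helper: it only builds the JSON body and sends nothing.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    unknown = [t for t in tiers if t not in KNOWN_TIERS]
    if unknown:
        raise ValueError(f"unknown tier(s): {unknown}")
    return {SETTING: ",".join(tiers)}


if __name__ == "__main__":
    # Body for a request such as PUT /my-index/_settings (index name is illustrative).
    print(json.dumps(tier_preference_settings("data_warm", "data_hot"), indent=2))
```

With this preference the index would be allocated to warm nodes when any exist, and fall back to hot nodes otherwise.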
Today we allow index.routing.allocation.include._tier_preference to be unspecified for an index. This prevents Elasticsearch and its clients from relying on which tier an index/shard is located on, which affects the following:
- Autoscaling does not know which data tier to scale up.
- The _tier query will not know the tier of an index/shard.
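For context on the second point, here is a rough sketch of a search body that filters on the `_tier` metadata field with a `terms` query. The helper name `tier_query` is an assumption for illustration. Because the field is derived from the index's tier preference, an index without one gives such a filter nothing to match on.

```python
import json


def tier_query(*tiers: str) -> dict:
    """Build a search body that restricts results to indices on the given tiers.

    Hypothetical helper: it only builds the JSON body and sends nothing.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    return {"query": {"terms": {"_tier": list(tiers)}}}


if __name__ == "__main__":
    # Body for a search request across several indices.
    print(json.dumps(tier_query("data_hot", "data_warm"), indent=2))
```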
Furthermore, making the tier preference mandatory allows us to rely on it for future developments, such as balancing of shards, UI, monitoring and more. There is no known good use case for a tier-less index; allowing it only adds complexity for ourselves and users, and such an index can be considered bad data.
The proposal here is to work towards making index.routing.allocation.include._tier_preference mandatory for all indices, in the following steps:
- The migrate_to_data_tiers API should apply the default data tier preference to any index that would otherwise end up with no tier preference, and set the cluster setting mentioned in the first work item to ensure new indices are assigned a tier preference.
In a future release (possibly 9.0) we should close the loop and:
- Remove the flag from cluster settings.
- Enforce that the tier preference cannot be set to null (we could consider doing this in 8.0 too).
- Evaluate at what point we need/want to drop the migrate_to_data_tiers API from the code (8.x? 9.x?)
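The migration step can be pictured with a small sketch: given index settings in the flat form returned by a get-settings call with `flat_settings=true`, plan an update for every index that has no tier preference. This is not the actual migrate_to_data_tiers implementation. The function name `plan_default_tier_updates` and the `data_content` default are assumptions, since the proposal does not name the default value.

```python
SETTING = "index.routing.allocation.include._tier_preference"

# Assumption for illustration only: the proposal does not state the default.
ASSUMED_DEFAULT_TIER = "data_content"


def plan_default_tier_updates(all_settings: dict, default: str = ASSUMED_DEFAULT_TIER) -> dict:
    """Return {index_name: settings_body} for every tier-less index.

    `all_settings` maps index name -> {"settings": {flat_key: value, ...}}.
    Indices that already have a tier preference are left untouched.
    """
    updates = {}
    for index, body in all_settings.items():
        current = body.get("settings", {}).get(SETTING)
        if not current:  # missing, null, or empty string all count as tier-less
            updates[index] = {SETTING: default}
    return updates


if __name__ == "__main__":
    sample = {
        "logs-2021": {"settings": {SETTING: "data_warm,data_hot"}},
        "legacy": {"settings": {"index.number_of_replicas": "1"}},
        "nulled": {"settings": {SETTING: None}},
    }
    # Only "legacy" and "nulled" need an update.
    print(plan_default_tier_updates(sample))
```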