[RFC] Offline Background Tasks #12361

@linuxpi

[Detailed Design Proposal] #13554

Introduction

An OpenSearch process running with the data role is responsible for executing various background tasks in addition to indexing and search. Some of these are:

  • Segment Merges
  • Force Merges
  • Re-indexing
  • Remote Garbage Collection
  • Shard Split/Shrink
  • Snapshots etc.

These tasks are a crucial part of an OpenSearch cluster. For example, segment merges keep indices in an optimal state. As an index grows and data is constantly added or updated, its segments need to be merged periodically to maintain efficient search performance and a minimal storage footprint.

This is even more important for indices where data ingestion is sparse over time, leading to a high number of small segments. Segment merges combine these segments into larger ones, ensuring better overall index performance.

Similarly, each background task has its own importance.

Is your feature request related to a problem? Please describe

While they are a crucial part of OpenSearch, these tasks consume resources, taking a toll on the process that is supposed to deliver predictable and consistent indexing and search throughput. For example, segment merging is an important, frequent, and heavy operation that demands a good chunk of the available resources. Force merging down to a smaller number of segments takes an even heavier toll.

In addition, the resources configured on the node might not be sufficient to perform these operations alongside incoming traffic within the expected timeframe, which leads to timeouts or failures and eventually delays the background operations.
Furthermore, any failures or bugs in these background operations can interfere with core operations.

Describe the solution you'd like

Give users the ability to segregate such operations onto separate, dedicated node(s). This helps them scale indexing and search performance predictably, without core operations having to compete for resources with background tasks. Similarly, background tasks won't be impacted by any surge in core operations traffic.

With the introduction of Remote Store, offloading background operations makes even more sense, as the data lives in the Remote Store and can be accessed efficiently from a separate, dedicated node.

The proposal is to introduce a separate fleet of nodes (the Offline Fleet) to execute all background tasks. This ensures full segregation from core operations and allows users to scale this fleet independently, based on the pending background tasks.

[Diagram: architecture_queue]

To begin with, we can target segment merges or force merges and give Remote Store clusters the ability to separate out merges. Later, we can extend this to other background tasks and even consider how to extend the functionality to non-Remote Store clusters.

Here is a high-level view of what the flow looks like with an Offline Fleet for a cluster.

[Diagram: HighLevelSequenceDiagram]
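To make the flow concrete, here is a minimal Python sketch of how a data node might hand a merge over to an Offline Fleet node through a queue. Everything named here (`MergeTask`, `TaskQueue`, `RemoteStore`, `run_offline_node`) is hypothetical and exists only for illustration; the RFC does not define these components, and the merge itself is reduced to summing segment sizes.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MergeTask:
    """A request to merge some segments of one shard (hypothetical shape)."""
    shard_id: str
    segments: list


@dataclass
class RemoteStore:
    """Stand-in for the Remote Store: shard id -> {segment name: size}."""
    shards: dict = field(default_factory=dict)


class TaskQueue:
    """Hypothetical FIFO queue between data nodes and the Offline Fleet."""

    def __init__(self):
        self._pending = deque()

    def submit(self, task):
        self._pending.append(task)

    def claim(self):
        # Return the oldest pending task, or None when the queue is empty.
        return self._pending.popleft() if self._pending else None

    def __len__(self):
        return len(self._pending)


def run_offline_node(queue, store):
    """Claim one task, merge its segments, and return the new segment name."""
    task = queue.claim()
    if task is None:
        return None
    shard = store.shards[task.shard_id]
    # "Download" the source segments and merge them. Sizes are simply summed
    # here; a real merge can produce a smaller segment.
    merged_size = sum(shard.pop(name) for name in task.segments)
    merged_name = "merged_" + "_".join(task.segments)
    # "Upload" the merged segment; the data node would download it afterwards.
    shard[merged_name] = merged_size
    return merged_name
```

A data node would call `queue.submit(MergeTask("shard-0", ["s1", "s2"]))`, and any idle Offline Fleet node would pick the task up with `run_offline_node(queue, store)`. Scaling the fleet then amounts to running more consumers against the same queue.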

The Added Cost

Not all users would want to spin up separate nodes for background operations, so however we choose to implement this, we would ensure the status quo is maintained.

There is obviously an added cost for the Offline Fleet, which depends directly on the number of nodes provisioned in it.

Apart from that, with the Offline Fleet there would be 2 additional downloads. Consider segment merges:

  1. The Offline Fleet node has to download the segments to be merged. Today no download is needed, since the segments are already present on local disk.
  2. Once the merged segments are uploaded to the Remote Store, the data node hosting the corresponding shard has to download those merged segments.
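The two extra downloads can be put into numbers with a small Python sketch. The function name and the sizes in the usage note are illustrative assumptions, not measurements from any cluster.

```python
def extra_transfer_bytes(source_segment_sizes, merged_segment_size):
    """Bytes downloaded in addition to today's local merge, for one merge."""
    # Download 1: the Offline Fleet node fetches every source segment.
    download_to_offline_node = sum(source_segment_sizes)
    # Download 2: the data node fetches the merged segment from the Remote Store.
    download_to_data_node = merged_segment_size
    return download_to_offline_node + download_to_data_node
```

For example, merging three segments of 100, 200, and 50 bytes into one 330-byte segment gives `extra_transfer_bytes([100, 200, 50], 330) == 680`, so the added network cost is roughly the input size plus the output size of each merge.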

In the future, we could also support a hybrid model where lightweight tasks run locally on data nodes while others are offloaded to the Offline Fleet.

As we progress, I plan to add more details on the individual components involved and how they interact with each other and with existing components.
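One way such a hybrid model could decide where a task runs is a simple size threshold, sketched below in Python. The `Placement` enum, the `choose_placement` function, and the 512 MiB default are all assumptions made up for this sketch; the RFC does not specify a policy.

```python
from enum import Enum


class Placement(Enum):
    LOCAL = "local"      # run on the data node, as happens today
    OFFLINE = "offline"  # hand over to the Offline Fleet


def choose_placement(estimated_bytes, offline_fleet_available,
                     local_limit_bytes=512 * 1024 * 1024):
    """Pick where a background task runs (hypothetical policy)."""
    # Without an Offline Fleet the status quo applies: everything stays local.
    if not offline_fleet_available:
        return Placement.LOCAL
    # Lightweight tasks are not worth the extra downloads, so keep them local.
    if estimated_bytes <= local_limit_bytes:
        return Placement.LOCAL
    return Placement.OFFLINE
```

With these defaults, a 1 MiB merge stays on the data node while a 2 GiB force merge is offloaded, and a cluster without an Offline Fleet behaves exactly as it does today.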

Related component

Storage

Describe alternatives you've considered

Apart from the approach mentioned above, another option would be to isolate resources on the data node itself between core (indexing/search) operations and other ad hoc operations like merges and snapshots. This would cause less friction for user adoption, as users don't have to provision a separate fleet. But it has some caveats that make it less appealing:

  • We wouldn't be able to scale resources for merges independently without affecting core operations.
  • Reserving resources for ad hoc operations on the data node might not be optimal, as not every node will have merges to perform all the time. Pooling the merge operations from all nodes onto dedicated nodes would instead give better utilization of the dedicated resources.
  • Complete isolation of resources on the same node is not trivial to solve.
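The utilization argument in the second caveat can be illustrated with a small Python sketch. It compares the capacity needed when every node reserves for its own peak merge demand against the capacity needed when demand is pooled. The function names and the demand traces in the usage note are invented for illustration only.

```python
def reserved_capacity(demand_per_node):
    """Total capacity if each node reserves enough for its own peak."""
    return sum(max(trace) for trace in demand_per_node)


def pooled_capacity(demand_per_node):
    """Capacity if all demand is pooled: the peak of the summed demand."""
    # zip(*...) walks the traces one time step at a time.
    return max(sum(step) for step in zip(*demand_per_node))
```

Take three nodes that each need 4 units of merge capacity, but at different times: `[[4, 0, 0], [0, 4, 0], [0, 0, 4]]`. Per-node reservation needs 12 units, while a pooled fleet needs only 4. When peaks coincide the two numbers are equal, so pooling never needs more capacity than per-node reservation under this model.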

Additional context

No response

Labels

  • RFC: Issues requesting major changes
  • Roadmap:Cost/Performance/Scale: Project-wide roadmap label
  • Storage: Issues and PRs relating to data and metadata storage
  • enhancement: Enhancement or improvement to existing feature or request
  • merges
