[Feature Proposal] Writable Remote Index #7804
Description
This feature proposal is a work in progress. We will continue to add details to sections that are marked with ToDo.
Goal
As an extension to the remote store feature, searchable remote index introduces data tier support in OpenSearch. A hot index has data on local disk as well as in the remote store, whereas a warm index has data only in the remote store. The next step is a writable warm index. This RFC covers the requirements for writable warm, the different approaches to supporting writes, the pros and cons of each approach, and a recommended approach.
Background
This doc assumes the following index structure with data tiers. The example is provided just to highlight a sample pattern and can be changed per the user's requirements:
- `orders` - Live index; normal writes go to this index.
- `order-history-<DATE>` - The `orders` index is rotated on a daily basis and the rotated index is suffixed with the date.
- `orders-alias` - Points to indexes containing the last 30 days of data. `orders` is added to this alias with `is_write_index=true`. That means, if we use the alias to write data, it will always write to the `orders` index.
- The last 7 days of data are kept in the hot tier. That means indexes between `order-history-2023-02-22` and `order-history-2023-02-16` are hot indexes and can be written to in the same way we write data to an index today.
- Data that is 7 to 30 days old is removed from local nodes; the index metadata is still part of the cluster state. This becomes the warm tier. In this example, indexes between `order-history-2023-02-15` and `order-history-2023-01-16` are warm indexes.
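The tiering rule above can be sketched as a small classifier. This is an illustration only, not OpenSearch code; the 7-day hot window and the `tier_for_index` name are assumptions taken from the example:

```python
from datetime import date

HOT_DAYS = 7  # assumption from the example: last 7 days of data stay hot


def tier_for_index(index_name: str, today: date) -> str:
    """Classify a rotated order-history-<DATE> index as hot or warm."""
    if index_name == "orders":
        return "hot"  # the live write index is always hot
    # index names look like order-history-2023-02-15
    day = date.fromisoformat(index_name.removeprefix("order-history-"))
    return "hot" if (today - day).days < HOT_DAYS else "warm"


today = date(2023, 2, 22)
print(tier_for_index("order-history-2023-02-16", today))  # hot
print(tier_for_index("order-history-2023-02-15", today))  # warm
```

With `today` set to 2023-02-22, this reproduces the boundary in the example: `order-history-2023-02-16` is the oldest hot index and `order-history-2023-02-15` is the newest warm index.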
Requirements
Functional
- Support updates to existing documents without any changes on the client side
- Support appending data to a warm index
- Optimised append-only writes based on auto-generated IDs/data streams
- Refresh data post-writes after a configurable period or based on explicitly defined policies
Non-Functional
- Shouldn’t interfere with read performance
- Impact on write latency should be predictable and/or configurable
- Time required to make new changes visible should be configurable
- Minimal storage overhead for appends/updates
Non-Requirements
- Using the same index name (or alias) to write to a hot/warm index.
  - In phase 1, the user needs to provide the exact index to write data to. For example, writing to warm index `order-history-2023-02-22` would need that exact index name to be provided. Writing to the alias will only write to the live hot index.
  - In the next phase, we can support writing to a single index (the `orders` alias as per the example above). Based on a configured field (like `timestamp`), OpenSearch decides which index to write the data to. Even though this is a valid requirement, it can be built incrementally.
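The phase-2 idea of routing by a configured field can be sketched as follows. This is a hypothetical illustration; `backing_index_for` and the timestamp-based rule are assumptions built from the example index layout, not a proposed API:

```python
from datetime import date


def backing_index_for(doc: dict, today: date) -> str:
    """Pick the backing index for a write based on the document's
    configured timestamp field (hypothetical phase-2 routing sketch)."""
    ts = date.fromisoformat(doc["timestamp"][:10])
    if ts == today:
        return "orders"  # today's data goes to the live write index
    return f"order-history-{ts.isoformat()}"  # older data: rotated hot/warm index


print(backing_index_for({"timestamp": "2023-02-22T10:00:00"}, date(2023, 2, 22)))
# orders
print(backing_index_for({"timestamp": "2023-02-10T08:30:00"}, date(2023, 2, 22)))
# order-history-2023-02-10
```

In this sketch the client always writes to one logical target and the routing layer chooses the concrete index, which is why the doc can defer this to a later phase without changing the write path itself.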
Use Cases
Write New Data
Add new documents to an existing warm index. This use case is mostly driven by back-filling data that was not ingested earlier for some reason. It assumes that the user knows which index to write the new data to.
Update Existing Data
To update existing data, we need to fetch the existing document first. To improve latency, we perform block-level fetches. Once the document is fetched and the new changes are applied to it, the next step is the same as in Write New Data.
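The update flow described above can be sketched with in-memory stand-ins. `RemoteStore`, `RemoteTranslog`, and `update_document` are illustrative names, not OpenSearch classes; the point is the shape of the flow: block fetch, merge changes, then reuse the new-write append path:

```python
class RemoteStore:
    """In-memory stand-in for a block-addressable remote segment store."""

    def __init__(self, docs):
        self.blocks = dict(docs)

    def fetch_block(self, doc_id):
        # a real implementation would download only the byte range
        # (block) holding this document, not a whole segment file
        return dict(self.blocks[doc_id])


class RemoteTranslog:
    """In-memory stand-in for the remote translog (durable write log)."""

    def __init__(self):
        self.ops = []

    def append(self, op):
        self.ops.append(op)  # the write is acked only after this append


def update_document(store, translog, doc_id, changes):
    existing = store.fetch_block(doc_id)  # block-level fetch
    updated = {**existing, **changes}     # apply the partial update
    translog.append({"id": doc_id, "doc": updated})  # same path as Write New Data
    return updated


store = RemoteStore({"order-1": {"status": "placed", "qty": 2}})
log = RemoteTranslog()
print(update_document(store, log, "order-1", {"status": "shipped"}))
# {'status': 'shipped', 'qty': 2}
```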
Potential Approaches
These approaches address the Write New Data use case only, as the Update Existing Data use case internally depends on writing new data.
Recommended Approach
When a write request hits the warm index, we open the engine in read-write mode with the metadata from local disk. Alternatively, the warm index could keep its engine open in read-write mode from the start to support writes.
For non-append-only cases, we do a block fetch of the document that needs to be updated, then perform the update by writing to the remote translog before we ack back.
For append-only use cases, we can skip the block fetch altogether, since we know it is a new document, and write directly to the remote translog. After a configurable delay, we refresh the segments and move the newly created segments and updated bitsets to the remote segment store. More details of this approach will be covered in the design review.
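The append-only path above can be sketched as a minimal writer. This is not OpenSearch code; `WarmShardWriter` and its fields are assumptions chosen to show the two key properties: the write is acked only after the remote-translog append, and segments are created on a configurable delay rather than per write:

```python
import time


class WarmShardWriter:
    """Illustrative sketch of the recommended append-only write path:
    ack after the remote-translog append, refresh segments only after a
    configurable delay so small writes are batched into fewer segments."""

    def __init__(self, refresh_delay_s=60.0):
        self.translog = []          # stand-in for the remote translog
        self.remote_segments = []   # stand-in for the remote segment store
        self.pending = []           # docs written since the last refresh
        self.refresh_delay_s = refresh_delay_s
        self.last_refresh = time.monotonic()

    def index(self, doc):
        self.translog.append(doc)   # durable write first, then ack
        self.pending.append(doc)
        self._maybe_refresh()
        return "acked"

    def _maybe_refresh(self):
        if time.monotonic() - self.last_refresh >= self.refresh_delay_s:
            # one new segment per refresh: batching keeps segment count low
            self.remote_segments.append(list(self.pending))
            self.pending.clear()
            self.last_refresh = time.monotonic()


writer = WarmShardWriter(refresh_delay_s=0.0)  # refresh on every write, for demo
print(writer.index({"id": "order-9", "qty": 1}))  # acked
print(len(writer.remote_segments))                # 1
```

The `refresh_delay_s` knob corresponds to the "configurable delay" in the text: a larger value trades document visibility for fewer, larger segments uploaded to the remote store.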
Alternative Approaches
Download All Data
In this approach, we make the index hot by downloading all data from the remote store to local disk. Once the data is downloaded, new data is ingested into it. As this is a warm index, we can't keep the data on local disk forever. We wait for X minutes after the last write to avoid frequent re-downloads of the data, then flush and delete the data (and metadata, based on the data tier type) from local disk.
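The download-then-evict lifecycle can be sketched as a small state machine. `DownloadAllLifecycle` and its methods are illustrative names; the sketch shows the cost the text calls out: every cold write pays a full download, and eviction only happens after an idle window:

```python
import time


class DownloadAllLifecycle:
    """Sketch of the alternative approach: pull the whole index to local
    disk on first write, serve writes as hot, then flush and evict after
    an idle window (the 'X minutes' in the text)."""

    def __init__(self, idle_window_s):
        self.idle_window_s = idle_window_s
        self.local = False
        self.local_docs = []
        self.last_write = None

    def on_write(self, remote_docs, doc):
        if not self.local:
            # first write to a cold index downloads ALL remote data
            self.local_docs = list(remote_docs)
            self.local = True
        self.local_docs.append(doc)
        self.last_write = time.monotonic()

    def maybe_evict(self):
        """Flush to remote and drop the local copy after the idle window."""
        if self.local and time.monotonic() - self.last_write >= self.idle_window_s:
            flushed = self.local_docs
            self.local_docs = []
            self.local = False
            return flushed  # flushed back to the remote store
        return None
```

A longer idle window reduces repeated downloads under bursty back-fill traffic but holds warm data on local disk longer, which is the core trade-off of this approach.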
Comparison
Potential Issues
- Both of the above approaches can result in too many small segments, which impacts query performance. Even with concurrent segment search, a high segment count hurts overall performance. We need a way to limit the number of segments with the help of a background segment merger.
- The time to make documents visible will increase (it will not be the same as an index's refresh_interval).
Next Steps
- POC to check the feasibility of using `RemoteDirectory` instead of `FSDirectory` in `IndexShard.Store`
- Once concurrent segment search is introduced, we need to understand the impact of 1/5/10/100 segments on search and overall node performance (CPU, JVM, etc.)