RFC: Append-only Indices #12886

@sarthakaggarwal97

Is your feature request related to a problem? Please describe

OpenSearch today caters to various use cases such as log analytics, full-text search, metrics, observability, and security events. By default, any index created in OpenSearch allows updates and deletes on the ingested documents. While this flexibility suits the general case, time-series use cases such as logs, metrics, observability, and security events do not require update or delete operations. Updates and deletes are expensive: OpenSearch has to look up the existing document, apply the operation, and add soft deletes, which in turn causes additional work during merges and can hinder overall performance. This cost can be avoided by restricting those operations for use cases that do not need them. In addition, certain optimizations (listed below) can be applied if we know the data will not be updated or deleted at the document level.

Disabling updates and deletes on an index's documents would allow us to handle several use cases efficiently:

  • Onboard data structures optimized for append-only: Recently, an RFC was opened to support pre-computed data structures like Star Tree, where any update or delete would be expensive in terms of compute because the star tree has to be rebuilt.
  • Support security-driven use cases: Users have requested indices whose documents are immutable. Such requests cover use cases like audit logs, security logs, transactions, and ledgers, where the core requirement is that documents cannot be changed or altered.
  • Optimize index settings: We can tune the merge policy to allow faster access to more recent data. We would also support bigger merged segment sizes (index.merge.policy.max_merged_segment currently defaults to 5gb). Because preventing deletes and updates avoids a share of merges, larger segments become practical, which would let us raise the 5gb limit.
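
To make the index-settings point concrete, here is a minimal sketch of what a create-index request body might look like. The flag `append_only.enabled` is a hypothetical placeholder, since this RFC does not define a setting name, and `10gb` is only an illustrative value, not a recommendation. `index.merge.policy.max_merged_segment` is the existing setting mentioned above.

```python
import json


def append_only_index_settings(max_merged_segment: str = "10gb") -> dict:
    """Build a create-index request body for a hypothetical append-only index."""
    return {
        "settings": {
            "index": {
                # Hypothetical flag: the RFC does not define a setting name.
                "append_only.enabled": True,
                # Existing setting (default 5gb); the value here is illustrative only.
                "merge.policy.max_merged_segment": max_merged_segment,
            }
        }
    }


body = append_only_index_settings()
print(json.dumps(body, indent=2))
```

The body would be sent with the usual create-index call; whether the larger segment size is safe depends on the merge behavior that the final design ends up with.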

Describe the solution you'd like

We propose to introduce the concept of append-only indices in OpenSearch to support the aforementioned use cases. With documents restricted to being immutable, we would deny any update or delete of a document. This would help reduce the memory footprint of indices (e.g. the version map) and also open avenues for future optimizations and features based on this restriction, e.g.:

  1. We can support automated rollovers with append-only indices.
  2. With the future support of Writable Warm, we can enable auto-migration of shards/segments instead of keeping all the segments/shards hot on data nodes.

Implementation details: TBU based on community feedback.

Additional context

FAQs:

Q: How would it be different from data streams?
A: While data streams optimize the automated rollover of time-series data, they still support all CRUD operations on the backing indices. With append-only mode, we aim to provide specialized optimizations and security features for such indices as core functionality.

Q: What would be the APIs/features that we will not allow for append-only indices?
A: Some initial thoughts on the APIs/features we may not be able to support are:

  1. Updates and deletes of documents in the index will not be supported.
  2. The _doc API will be denied, to prevent indexing documents with a custom ID as well as updates and deletes.
  3. The _split and _shrink APIs will be denied, to avoid removal of documents from the underlying shards of the source index.
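
The restrictions above can be illustrated with a small in-memory toy model. This is only a sketch of the intended semantics, not OpenSearch code: `AppendOnlyIndex` and `AppendOnlyViolation` are hypothetical names, and the assumption is that every document gets an auto-generated ID so that a write can never overwrite an existing document.

```python
import uuid


class AppendOnlyViolation(Exception):
    """Raised when a request would modify or remove an existing document."""


class AppendOnlyIndex:
    """Toy model: documents can be added and read, never changed or removed."""

    def __init__(self):
        self._docs = {}

    def index(self, source, doc_id=None):
        # A caller-supplied ID could collide with an existing document and
        # turn the write into an update, so it is rejected outright.
        if doc_id is not None:
            raise AppendOnlyViolation("custom document IDs are not allowed")
        new_id = uuid.uuid4().hex
        self._docs[new_id] = dict(source)
        return new_id

    def get(self, doc_id):
        # Return a copy so callers cannot mutate the stored document.
        return dict(self._docs[doc_id])

    def update(self, doc_id, partial):
        raise AppendOnlyViolation("updates are not allowed")

    def delete(self, doc_id):
        raise AppendOnlyViolation("deletes are not allowed")


idx = AppendOnlyIndex()
doc_id = idx.index({"message": "login failed", "level": "WARN"})
print(idx.get(doc_id))
```

Because IDs are always generated by the index in this model, a write never needs to check whether the document already exists, which is the kind of lookup the proposal aims to avoid.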

Metadata

Labels

  • Indexing: Indexing, Bulk Indexing and anything related to indexing
  • Roadmap:Cost/Performance/Scale: Project-wide roadmap label
  • enhancement: Enhancement or improvement to existing feature or request
