[RFC] Modeling writes as an extensible workflow

> **Note 1** - The discussion in this issue focuses on the _write_ paths in OpenSearch, though the assertions herein are probably true for other parts of the codebase.

> **Note 2** - For clarity, the term “replication” will only be used to describe code paths that result in segment files being created on a shard. The act of sending requests from the primary shard to replicas will be termed “forwarding”.

## What's the problem?

Much of the code in the write path for segments is tightly coupled. For example:

1. The implementation of _index_ and _delete_ operations uses a class hierarchy that _mandates_ that these operations be replicated.
2. The notion of “replication” is coupled to “forwarding” (see Note 2 above).
3. The notion of a “write” is tightly coupled with a translog update.

(I'll talk about _why_ we would want to solve for these in a moment.)

This coupling is driven by the code architecture. In technical terms, I'd say it uses a compile-time, top-down (inheritance-based) [chain of responsibility](https://refactoring.guru/design-patterns/chain-of-responsibility) (CoR) design pattern. Put more simply, it's like [spaghetti lasagna](https://www.delish.com/cooking/recipe-ideas/recipes/a51533/spaghetti-lasagna-recipe/) - lots of layers encasing lots of noodley code. 

Such a design pattern poses two problems:

1. The compile-time nature precludes the ability to configure behavior at run-time.
2. The inheritance-based CoR pattern implicitly defines a fixed set of steps for the code, but misses out on the benefits of a unified orchestrator class or workflow definition - for example, the ability for a step to react to the result of a previous step.
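
To make the two problems concrete, here is a deliberately simplified sketch of the inheritance-based shape described above. The class names (`ReplicatedWriteAction`, `IndexAction`) and the string trace are hypothetical stand-ins for illustration only; they are not the actual OpenSearch classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real OpenSearch classes.
abstract class ReplicatedWriteAction {
    final List<String> trace = new ArrayList<>();

    // The base class fixes the order of the steps at compile time.
    // Subclasses can only fill in the primary operation; they cannot
    // skip the translog update or the forwarding step.
    final void execute(String request) {
        trace.add("reroute");
        performOnPrimary(request);
        trace.add("translog");            // every write updates the translog
        trace.add("forward-to-replicas"); // every write is forwarded
    }

    protected abstract void performOnPrimary(String request);
}

class IndexAction extends ReplicatedWriteAction {
    @Override
    protected void performOnPrimary(String request) {
        trace.add("index:" + request);
    }
}

class InheritanceDemo {
    public static void main(String[] args) {
        IndexAction action = new IndexAction();
        action.execute("doc-1");
        System.out.println(action.trace);
    }
}
```

In this shape, an index that does not need forwarding has no way to opt out short of changing the class hierarchy, and no step can see what an earlier step decided.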

I think we can make this better (though that recipe is beyond saving IMO). 

## What should we do about it?

We should rearchitect write-path operations as a __workflow__ composed of the following configurable steps:

* **Reroute** (route the incoming request to the correct shard/node)
* **Ingest** (process the request)
* **Persist** (make the results of the request durable)
  * This would include separate configuration/extension points for storage and translog
* **Forward** (send the request to another node)

_Persist_ and _Forward_ will be conditional steps that rely on the output of prior steps to determine if they should execute.
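
As a rough illustration of the idea (not a proposed API), the workflow could be sketched as an orchestrator that runs an ordered list of steps over a shared context. Every name here (`WriteContext`, `WriteStep`, `WriteWorkflow`, the `changed` and `forwardingEnabled` flags) is an assumption made up for this sketch, and the step bodies are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Every name in this sketch is hypothetical; none of it is existing OpenSearch API.

/** Mutable state handed from step to step so later steps can react to earlier results. */
final class WriteContext {
    final String request;
    final boolean forwardingEnabled; // assumed to come from run-time configuration
    boolean changed;                 // set by Ingest, read by Persist and Forward
    final List<String> trace = new ArrayList<>();

    WriteContext(String request, boolean forwardingEnabled) {
        this.request = request;
        this.forwardingEnabled = forwardingEnabled;
    }
}

/** One configurable step: a name, a run condition, and the work itself. */
final class WriteStep {
    final String name;
    final Predicate<WriteContext> condition;
    final Consumer<WriteContext> work;

    WriteStep(String name, Predicate<WriteContext> condition, Consumer<WriteContext> work) {
        this.name = name;
        this.condition = condition;
        this.work = work;
    }
}

/** The orchestrator: runs the configured steps in order, skipping those whose condition fails. */
final class WriteWorkflow {
    private final List<WriteStep> steps;

    WriteWorkflow(List<WriteStep> steps) {
        this.steps = List.copyOf(steps);
    }

    WriteContext execute(WriteContext ctx) {
        for (WriteStep step : steps) {
            if (step.condition.test(ctx)) {
                step.work.accept(ctx);
                ctx.trace.add(step.name);
            }
        }
        return ctx;
    }

    /** Assembles the four steps with placeholder bodies. */
    static WriteWorkflow standard() {
        return new WriteWorkflow(List.of(
            new WriteStep("reroute", ctx -> true, ctx -> { }),
            // Placeholder rule for the sketch: an empty request changes nothing.
            new WriteStep("ingest", ctx -> true, ctx -> ctx.changed = !ctx.request.isEmpty()),
            new WriteStep("persist", ctx -> ctx.changed, ctx -> { }),
            new WriteStep("forward", ctx -> ctx.changed && ctx.forwardingEnabled, ctx -> { })
        ));
    }
}

final class WorkflowDemo {
    public static void main(String[] args) {
        WriteWorkflow workflow = WriteWorkflow.standard();
        System.out.println(workflow.execute(new WriteContext("doc-1", true)).trace);
        System.out.println(workflow.execute(new WriteContext("doc-1", false)).trace);
        System.out.println(workflow.execute(new WriteContext("", true)).trace);
    }
}
```

Because the step list is plain data, it can be assembled at run-time, and the conditions on _Persist_ and _Forward_ read what _Ingest_ wrote into the context rather than being baked into a class hierarchy.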

## Why do this now?

Because _extensibility_ is one of the key themes for OpenSearch (https://github.com/opensearch-project/OpenSearch/issues/2095). It is essential that we start tackling this architectural limitation now since we have multiple ongoing initiatives for OpenSearch extensibility that require more run-time configurability:

1. With the introduction of replication strategies like [segment replication](https://github.com/opensearch-project/OpenSearch/issues/2229) being defined _per-index_, write code paths can no longer simply _mandate_ replication. Segment replication no longer needs “replication” to be coupled with “forwarding”. 
2. With a [remote translog](https://github.com/opensearch-project/OpenSearch/issues/1319), the need for “forwarding” is removed entirely.
3. The introduction of [remote storage](https://github.com/opensearch-project/OpenSearch/issues/2700) will affect the behavior of both replication and recovery.
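
One way to picture the first two points is as per-index settings that decide, at run-time, which parts of the write path apply. The sketch below is only my reading of those points: the names (`ReplicationType`, `IndexWritePlan`) are hypothetical, and the rule that document replication always requires forwarding is an assumption of the sketch, not something the linked issues state. Remote storage (point 3) is not modeled.

```java
// Hypothetical names throughout; the real index settings may look different.
enum ReplicationType { DOCUMENT, SEGMENT }

final class IndexWritePlan {
    final ReplicationType replicationType;
    final boolean remoteTranslog;

    IndexWritePlan(ReplicationType replicationType, boolean remoteTranslog) {
        this.replicationType = replicationType;
        this.remoteTranslog = remoteTranslog;
    }

    /** Replicas re-run the operation only under document replication. */
    boolean replicasIngest() {
        return replicationType == ReplicationType.DOCUMENT;
    }

    /**
     * Assumption of this sketch: the request must be forwarded if replicas
     * re-run it, or if replicas still keep their own (non-remote) translog.
     */
    boolean forwardToReplicas() {
        return replicasIngest() || !remoteTranslog;
    }
}

final class PlanDemo {
    public static void main(String[] args) {
        IndexWritePlan plan = new IndexWritePlan(ReplicationType.SEGMENT, true);
        System.out.println("forward=" + plan.forwardToReplicas()
            + ", replicasIngest=" + plan.replicasIngest());
    }
}
```

None of these combinations can be expressed when the class hierarchy mandates replication and forwarding for every write.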

## Open Questions (aka things I'm mulling over)

* What situations/architectures (if any) would require the Reroute step to be optional/configurable?
* Is there a way to remove the need for an _Engine_ class, so that _ingest_ and _translog_ can be configured independent of one another?
* How do this workflow and the decoupling of replication vs forwarding affect _sync_ actions?

---

Given the sheer breadth of functionality in the OpenSearch codebase, there are probably other coupled components that I haven't considered. Please comment below if there are things that would break with this workflow approach, or other areas that may benefit from a similar approach.