Description
Is your feature request related to a problem? Please describe
Currently, ingest pipeline behavior for single and bulk update/upsert operations is inconsistent. Some combinations of parameters trigger ingest pipelines and others do not. Many users want to bulk update documents and have those updates pass through ingest pipelines, and are confused when this does not happen. This is currently supported by the single update API but NOT by the bulk update API, and #17679 will make single update consistent with bulk update, meaning neither will trigger ingest pipelines. Single/bulk consistency has been put on hold until 4.0.0.
However, even if this is by design, many users have asked for the bulk update ingest pipeline use case. For example, if an ingest pipeline has a text embedding processor set up to generate embeddings on passage text, we want the embeddings to be regenerated by that processor when the passage text field is updated, but the bulk update operation does not support that. Currently the only workaround is to use the docAsUpsert flag with a partial doc, and it is unintuitive that enabling such a flag would affect how ingest pipelines are applied.
We want a straightforward and clear way for users to specify that ingest pipelines should be invoked for their update operations, in both the single update and bulk update cases.
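For reference, the docAsUpsert workaround mentioned above could look roughly like the sketch below, which only builds the NDJSON body for a `_bulk` request. The index name `passages`, the field `passage_text`, and the helper `build_bulk_update_body` are hypothetical and used for illustration only. `doc_as_upsert` is the REST field name for the docAsUpsert flag.

```python
import json


def build_bulk_update_body(index, docs, doc_as_upsert=True):
    """Build an NDJSON body for the _bulk API with one update action per doc.

    `docs` maps document IDs to partial documents. The index and field
    names used by callers here are hypothetical examples.
    """
    lines = []
    for doc_id, partial_doc in docs.items():
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Payload line: partial doc plus the doc_as_upsert flag (the workaround).
        lines.append(json.dumps({"doc": partial_doc, "doc_as_upsert": doc_as_upsert}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_update_body(
    "passages",  # hypothetical index name
    {"1": {"passage_text": "updated passage text"}},  # hypothetical field
)
print(body)
```

The resulting body would be sent as a POST to `/_bulk` with an NDJSON content type. Per the behavior described in this issue, the flag on the payload line is what causes the pipeline to run for the update, which is exactly the non-obvious coupling this request wants to replace with something explicit.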
Issues opened due to this inconsistency/confusion:
- [BUG] update document with _bulk fails to generate embeddings in inference processors #17494
- [FEATURE] Add support with the Update API neural-search#213
- [BUG] Ingest pipeline bulk update issue #16663
- [BUG] Bulk upsert does not behave like a single Upsert, with an ingestion pipeline #10864
- Using script upsert with _bulk API won't triggered #2607
- [Cleanup] TransportUpdateAction should extend TransportSingleItemBulkWriteAction #16980
Describe the solution you'd like
We want a way to specify that documents should pass through ingest pipelines in both update and bulk update scenarios. Some ideas:
- Having a pipeline-level configuration option (preferred)
- Having an additional parameter in the bulk API interface to force ingest pipeline invocation (similar to docAsUpsert workaround but more explicit)
- Having a new type of pipeline processor that converts update requests to index requests (may not be possible)
The functionality to trigger pipelines from update requests already exists in the current single update operation (it is being removed in #17679), and this logic can be reused.
Related component
Indexing
Describe alternatives you've considered
No response
Additional context
No response