
[Feature Request] Add configurability to run ingest pipelines during document update operations #17742

@q-andy

Description


Is your feature request related to a problem? Please describe

Currently, ingest pipeline behavior for single and bulk update/upsert operations is inconsistent: some combinations of parameters trigger ingest pipelines and others do not. Many users want to bulk update documents and have those updates pass through ingest pipelines, and are confused when this does not happen. This is currently supported by the single update API but NOT by the bulk update API, and #17679 will make single update consistent with bulk update, meaning neither will trigger ingest pipelines. The single/bulk consistency change has been put on hold until 4.0.0.

However, even if this is by design, many users have asked for the bulk update ingest pipeline use case. For example, if an ingest pipeline has a text embedding processor that generates embeddings from passage text, we want the embeddings to be regenerated by that processor whenever the passage text field is updated, but the bulk update operation does not support this. Currently the only workaround is to use the docAsUpsert flag with a partial doc, and it is unintuitive that enabling such a flag changes how ingest pipelines are applied.
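To make the workaround concrete, the sketch below builds an NDJSON `_bulk` request body in which each update carries a partial `doc` together with `"doc_as_upsert": true`. It only constructs the request body and does not send it. The index name `my-index`, the field `passage_text`, and the pipeline name `text-embedding-pipeline` are hypothetical placeholders. The sketch also assumes that the pipeline is supplied through the `_bulk` `pipeline` query parameter (or configured as the index's default pipeline); this is an illustration of the workaround described above, not a confirmed recipe.

```python
import json


def build_bulk_update_body(index, updates):
    """Build an NDJSON _bulk body of partial-document updates using doc_as_upsert.

    `updates` maps document IDs to partial documents (hypothetical example data).
    """
    lines = []
    for doc_id, partial_doc in updates.items():
        # Action/metadata line for an update operation.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Body line: the partial doc plus the doc_as_upsert workaround flag.
        lines.append(json.dumps({"doc": partial_doc, "doc_as_upsert": True}))
    # The bulk API expects the NDJSON body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_update_body(
        "my-index", {"1": {"passage_text": "Updated passage text."}}
    )
    # Hypothetical target: POST /_bulk?pipeline=text-embedding-pipeline
    print(body, end="")
```

The same update sent without `doc_as_upsert` is exactly the case this issue describes as skipping the pipeline, which is why an explicit, purpose-built option would be clearer than relying on this flag.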

We want a straightforward and clear way for users to specify that ingest pipelines should be invoked for their update operations, in both the single update and bulk cases.

Issues opened due to this inconsistency/confusion:

Describe the solution you'd like

We want a way to specify that documents should pass through ingest pipelines in update and bulk update scenarios. Some ideas:

  • Having a pipeline-level configuration option (preferred)
  • Having an additional parameter in the bulk API interface to force ingest pipeline invocation (similar to docAsUpsert workaround but more explicit)
  • Having a new type of pipeline processor that converts update requests to index requests (may not be possible)

The logic that triggers pipelines from update requests already exists in the current single update operation, which is being removed in #17679, and it can be reused.

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

No response

Metadata


Assignees

Labels

Indexing (Indexing, Bulk Indexing and anything related to indexing), enhancement (Enhancement or improvement to existing feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
