
[BUG] update document with _bulk fails to generate embeddings in inference processors #17494

@will-hwang

Description

Describe the bug

Bug Description:

Embeddings are not generated when documents are updated via the /index/_bulk operation, although they are generated successfully when a document is indexed via /index/_doc/. Was a change made in OpenSearch that skips embedding generation for bulk operations only?

Related component

No response

To Reproduce

  1. deploy model

POST /_plugins/_ml/models/{model_id}/_deploy

  2. configure pipeline

{
  "processors": [
    {
      "text_embedding": {
        "model_id": "{model_id}",
        "field_map": {
          "text": "passage_embedding"
        }
      }
    }
  ]
}
  3. ingest doc

PUT /my-nlp-index/_doc/1
{
  "text": "hello world"
}
  4. update doc with _bulk

PUT /my-nlp-index/_bulk
{ "update": { "_index": "my-nlp-index", "_id": "1" } }
{ "doc" : { "text": "bye world" } }
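One way to see the discrepancy (a sketch; this assumes the pipeline above is attached to the index, e.g. via the index.default_pipeline setting, which the report does not show) is to fetch the document after step 3 and again after step 4:

```
GET /my-nlp-index/_doc/1
```

After step 3, _source contains both "text" and a populated "passage_embedding". After step 4, per the behavior described in this report, "text" changes to "bye world" while "passage_embedding" still reflects "hello world".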

Expected behavior

Embeddings are created for the initial ingest of "text": "hello world", but are not updated by the bulk operation
{ "doc" : { "text": "bye world" } }

Embeddings should be re-generated for the bulk update operation by invoking the text_embedding processor.
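For comparison (a sketch, not part of the original report): ingest pipelines run on index/create operations but not on update operations, so switching the bulk action from update to index should route the document through the text_embedding processor again, at the cost of replacing the whole document rather than merging fields:

```
PUT /my-nlp-index/_bulk
{ "index": { "_index": "my-nlp-index", "_id": "1" } }
{ "text": "bye world" }
```

If this variant regenerates passage_embedding while the update variant does not, that would point to the update action bypassing the ingest pipeline as the cause.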

Additional Details

Plugins
opensearch-ml, opensearch-knn, opensearch-neural-search


Host/Environment (please complete the following information):

  • OS: macOS
  • Version: Sequoia 15.3


Metadata

Labels: Indexing, bug