[Blog Post] asymmetric model support in neural search#4058

Open
fen-qin wants to merge 1 commit into opensearch-project:main from fen-qin:asymmetric_model_support

Conversation

fen-qin commented Jan 12, 2026

Description

This PR adds the blog post on asymmetric model support in neural search.

Issues Resolved

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

@github-actions

Thank you for submitting a blog post!

The blog post review process is: Submit a PR -> (Optional) Peer review -> Doc review -> Editorial review -> Marketing review -> Published.

@github-actions

Hi @fen-qin,

It looks like you're adding a new blog post but don't have an issue mentioned. Please link this PR to an open issue using one of these keywords in the PR description:

  • Closes #issue-number
  • Fixes #issue-number
  • Resolves #issue-number

If an issue hasn't been created yet, please create one and then link it to this PR.



```json
PUT /_ingest/pipeline/asymmetric_embedding_pipeline
```

Contributor

Can we use either the semantic text field or the text embedding processor instead of the ml_inference processor?

@fen-qin force-pushed the asymmetric_model_support branch from 050d85b to f640500 (March 3, 2026, 01:12)
Signed-off-by: Fen Qin <mfenqin@amazon.com>
@fen-qin force-pushed the asymmetric_model_support branch from f640500 to 5da7c0b (March 3, 2026, 01:17)

```
cd opensearch-py-ml/docs/source/example/common

## deploy
```

Contributor

I think we need to `pip install` the requirements first?

Author

Yes, that's true. This is the setup process for hosting an asymmetric model on a SageMaker endpoint. I'm wondering whether we should focus only on local models for OpenSearch, so that the post has fewer dependencies on AWS resources.

Contributor

In the blog, we don't need to focus on local models. We can use SageMaker.

Follow these steps to implement asymmetric neural search in your OpenSearch cluster. This example uses a remote SageMaker endpoint, but you can also deploy models locally.

1. Prerequisites: Deploy a sagemaker endpoint
check out the deployment scripts: https://github.com/opensearch-project/opensearch-py-ml/pull/587

Contributor

What blocks this PR from being merged? It's a little bit weird that we point people to a PR. Should we get it merged and point people to the README?

Author

No, these PRs will be merged before the blog is published; blog review usually takes longer. Let me connect with the ml-commons team to get it merged before then.

"region": "<YOUR_AWS_REGION>",
"service_name": "sagemaker"
},
"credential": {

Contributor

Should we call out that on AOS (Amazon OpenSearch Service) readers need to follow this doc to create the connector? https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ml-amazon-connector.html

Contributor heemin32 commented Mar 3, 2026

The blog should be about using open-source OpenSearch, not AWS OpenSearch.

Author

Here is how a local asymmetric model is currently set up: https://docs.opensearch.org/latest/tutorials/vector-search/semantic-search/semantic-search-asymmetric/

It requires hosting the model on a locally running endpoint.

Contributor

I think most users won't set up a local model. Maybe you can share how you set it up in SageMaker so that users can reference it.

Comment thread _posts/2026-01-15-asymmetric-model-support-neural-search.md
},
"passage_text": {
"type": "semantic",
"model_id": "<YOUR_MODEL_ID>",

Contributor

Can we add the actual model ID you used during testing?

Contributor

And explain how to get the model ID as well?

Author

Do you mean using the Get Model API to get the model ID? The model_id changes every time we create a new model.

Contributor

It does not matter whether the model_id changes; readers will not use the same model ID anyway. It is more about showing readers how to get the model ID.
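
One way the post could show readers where the model ID comes from is to read it from the task response that ML Commons returns for a registration task. The following is a minimal Python sketch under assumptions: the helper name `extract_model_id` and the sample response values are hypothetical, and the response shape is assumed to carry `state` and `model_id` fields as in a completed registration task. It is not taken from the blog draft.

```python
# Hypothetical helper: pull the model ID out of a task response dict.
# Assumes the task response carries "state" and "model_id" fields.
def extract_model_id(task_response: dict) -> str:
    state = task_response.get("state")
    if state != "COMPLETED":
        # The model ID is only meaningful once registration has finished.
        raise RuntimeError(f"model registration not finished (state={state!r})")
    return task_response["model_id"]


# Sample response with placeholder values (not from a real cluster).
sample_task_response = {
    "model_id": "<MODEL_ID_FROM_YOUR_CLUSTER>",
    "task_type": "REGISTER_MODEL",
    "function_name": "REMOTE",
    "state": "COMPLETED",
}

print(extract_model_id(sample_task_response))
```

Readers would then substitute the returned value wherever the draft shows `<YOUR_MODEL_ID>`.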

OpenSearch returns response:

```json
{"took":317,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.25255635,"hits":[{"_index":"my-nlp-index","_id":"1","_score":0.25255635,"_source":{"passage_text":"Hello world","id":"s1"}}]}}%
```

Contributor

% is added at the end. Is this intentional?

Author

got it. will remove the %


## Next steps

- Review the [asymmetric model documentation](https://opensearch.org/docs/latest/tutorials/vector-search/semantic-search/semantic-search-asymmetric/) for detailed configuration options

Contributor

We might need to update the document before publishing this blog.

Author

agree

Contributor

Maybe let's just remove this line. There are no detailed configuration options on the linked page.


This distinction allows the model to learn specialized representations. For example, the E5 model internally processes "What are some parks in NYC?" as `query: What are some parks in NYC?` during search, while indexing "Central Park is a large public park..." as `passage: Central Park is a large public park...`. This asymmetry helps the model better match short queries to longer documents.
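
As a rough illustration of the prefixing described in the paragraph above, the following Python sketch tags the same kind of text differently depending on whether it is a query or a passage. The function name `add_e5_prefix` and the `PREFIXES` table are hypothetical and exist only for this example; this is not how the OpenSearch plugin is implemented, only a sketch of the idea.

```python
# Hypothetical illustration of E5-style role prefixes.
PREFIXES = {"query": "query: ", "passage": "passage: "}


def add_e5_prefix(text: str, role: str) -> str:
    """Return the text with the prefix for the given role ('query' or 'passage')."""
    if role not in PREFIXES:
        raise ValueError(f"unknown role: {role!r}")
    return PREFIXES[role] + text


# The search-time input and the index-time input get different prefixes.
query_input = add_e5_prefix("What are some parks in NYC?", "query")
passage_input = add_e5_prefix("Central Park is a large public park...", "passage")

print(query_input)
print(passage_input)
```

The two strings are then embedded by the same model, which has learned to treat each prefix as a signal for which side of the search the text is on.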

## Why asymmetric models outperform symmetric models

Contributor

Suggested change
- ## Why asymmetric models outperform symmetric models
+ ## When asymmetric models outperform symmetric models


Neural search in OpenSearch has traditionally used symmetric embedding models, where queries and documents are encoded identically. While effective, this approach doesn't reflect how search actually works: queries are typically short and question-like, while documents are longer and information-rich. Asymmetric embedding models address this mismatch by optimizing embeddings differently for queries versus documents, leading to significant improvements in search relevance.

OpenSearch now supports asymmetric embedding models, including state-of-the-art models like E5 that dominate the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard). In this post, you'll learn how asymmetric models work, see comprehensive benchmark results, and follow a step-by-step guide to implement asymmetric neural search in your OpenSearch cluster.

Contributor

Suggested change
- OpenSearch now supports asymmetric embedding models, including state-of-the-art models like E5 that dominate the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard). In this post, you'll learn how asymmetric models work, see comprehensive benchmark results, and follow a step-by-step guide to implement asymmetric neural search in your OpenSearch cluster.
+ Semantic text field now supports asymmetric embedding models, including state-of-the-art models like E5 that dominate the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard). In this post, you'll learn how asymmetric models work, see comprehensive benchmark results, and follow a step-by-step guide to implement asymmetric neural search in your OpenSearch cluster.

pajuric commented Mar 5, 2026

@fen-qin - Please reach out to me when you are ready to move this forward.

Development

Successfully merging this pull request may close these issues.

[BLOG] Asymmetric Models Support in Neural Search

4 participants