[Blog Post] asymmetric model support in neural search #4058

Open
fen-qin wants to merge 1 commit into opensearch-project:main from fen-qin:asymmetric_model_support

fen-qin commented Jan 12, 2026

Description

This PR adds a blog post about asymmetric model support in neural search.

Issues Resolved

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

@github-actions

Thank you for submitting a blog post!

The blog post review process is: Submit a PR -> (Optional) Peer review -> Doc review -> Editorial review -> Marketing review -> Published.

@github-actions

Hi @fen-qin,

It looks like you're adding a new blog post but don't have an issue mentioned. Please link this PR to an open issue using one of these keywords in the PR description:

  • Closes #issue-number
  • Fixes #issue-number
  • Resolves #issue-number

If an issue hasn't been created yet, please create one and then link it to this PR.



```json
PUT /_ingest/pipeline/asymmetric_embedding_pipeline
```

Contributor

Can we use either the semantic text field or the text embedding processor instead of the ml_inference processor?

fen-qin force-pushed the asymmetric_model_support branch from 050d85b to f640500 on March 3, 2026 01:12
Signed-off-by: Fen Qin <mfenqin@amazon.com>
fen-qin force-pushed the asymmetric_model_support branch from f640500 to 5da7c0b on March 3, 2026 01:17

```
cd opensearch-py-ml/docs/source/example/common

## deploy
```
Contributor

I think we need to pip install the requirements first?

Follow these steps to implement asymmetric neural search in your OpenSearch cluster. This example uses a remote SageMaker endpoint, but you can also deploy models locally.

1. Prerequisites: Deploy a SageMaker endpoint
Check out the deployment scripts: https://github.com/opensearch-project/opensearch-py-ml/pull/587
Contributor

What blocks this PR from being merged? It's a little bit weird that we point people to a PR. Should we get it merged and point people to the README?

"region": "<YOUR_AWS_REGION>",
"service_name": "sagemaker"
},
"credential": {
Contributor

Should we call out that in AOS (Amazon OpenSearch Service) we need to follow this doc https://docs.aws.amazon.com/opensearch-service/latest/developerguide/ml-amazon-connector.html to create the connector?

Contributor

The blog should be about using open source OpenSearch, not AWS OpenSearch.

"description": "Asymmetric E5 embedding model for semantic search",
"connector_id": "<YOUR_CONNECTOR_ID>",
"model_config": {
"model_type": "text_embedding",
Contributor

Have you tested that this works? I recall we need to use TEXT_EMBEDDING as the model_type for the semantic field to recognize it. - https://github.com/opensearch-project/neural-search/blob/de482d7fcfbfbebddc341c7e6bc50c7504a808cc/src/main/java/org/opensearch/neuralsearch/util/SemanticMLModelUtils.java#L60

},
"passage_text": {
"type": "semantic",
"model_id": "<YOUR_MODEL_ID>",
Contributor

Can we add the actual model ID you used during testing?

Contributor

And explain how to get the model ID as well?

OpenSearch returns the following response:

```json
{"took":317,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.25255635,"hits":[{"_index":"my-nlp-index","_id":"1","_score":0.25255635,"_source":{"passage_text":"Hello world","id":"s1"}}]}}%
```
Contributor

% is added at the end. Is this intentional?
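
For readers who want to check the response programmatically, here is a minimal sketch in Python. It assumes the response body is the JSON shown in the snippet with the trailing `%` left out, because `json.loads` rejects that character as extra data. The constant `RAW_RESPONSE` and the helper `summarize_hits` are hypothetical names used only for illustration.

```python
import json

# Response body copied from the snippet under review, without the trailing "%".
RAW_RESPONSE = (
    '{"took":317,"timed_out":false,'
    '"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},'
    '"hits":{"total":{"value":1,"relation":"eq"},"max_score":0.25255635,'
    '"hits":[{"_index":"my-nlp-index","_id":"1","_score":0.25255635,'
    '"_source":{"passage_text":"Hello world","id":"s1"}}]}}'
)


def summarize_hits(raw: str) -> list:
    """Return (document id, score, passage text) for each hit in a search response."""
    body = json.loads(raw)
    return [
        (hit["_id"], hit["_score"], hit["_source"]["passage_text"])
        for hit in body["hits"]["hits"]
    ]


if __name__ == "__main__":
    for doc_id, score, text in summarize_hits(RAW_RESPONSE):
        print(doc_id, score, text)
```

If the `%` were part of the body, the same call would raise `json.JSONDecodeError`, which is one quick way to confirm whether it belongs in the published example.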


## Next steps

- Review the [asymmetric model documentation](https://opensearch.org/docs/latest/tutorials/vector-search/semantic-search/semantic-search-asymmetric/) for detailed configuration options
Contributor

We might need to update the document before publishing this blog.


This distinction allows the model to learn specialized representations. For example, the E5 model internally processes "What are some parks in NYC?" as `query: What are some parks in NYC?` during search, while indexing "Central Park is a large public park..." as `passage: Central Park is a large public park...`. This asymmetry helps the model better match short queries to longer documents.
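
The prefix convention described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenSearch or E5 code: the helper names `format_query` and `format_passage` are hypothetical, and the `query: ` and `passage: ` prefixes are taken from the example in the paragraph.

```python
# Hypothetical helpers illustrating the prefix convention; not part of any library.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def format_query(text: str) -> str:
    """Return the text as it would be presented to the model at search time."""
    return QUERY_PREFIX + text.strip()


def format_passage(text: str) -> str:
    """Return the text as it would be presented to the model at ingest time."""
    return PASSAGE_PREFIX + text.strip()


if __name__ == "__main__":
    print(format_query("What are some parks in NYC?"))
    print(format_passage("Central Park is a large public park..."))
```

The same input string therefore reaches the model in two different forms depending on whether it is a query or a document, which is what makes the resulting embeddings asymmetric.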

## Why asymmetric models outperform symmetric models
Contributor

Suggested change:
- ## Why asymmetric models outperform symmetric models
+ ## When asymmetric models outperform symmetric models


Neural search in OpenSearch has traditionally used symmetric embedding models, where queries and documents are encoded identically. While effective, this approach doesn't reflect how search actually works: queries are typically short and question-like, while documents are longer and information-rich. Asymmetric embedding models address this mismatch by optimizing embeddings differently for queries versus documents, leading to significant improvements in search relevance.

OpenSearch now supports asymmetric embedding models, including state-of-the-art models like E5 that dominate the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard). In this post, you'll learn how asymmetric models work, see comprehensive benchmark results, and follow a step-by-step guide to implement asymmetric neural search in your OpenSearch cluster.
Contributor

Suggested change:
- OpenSearch now supports asymmetric embedding models, including state-of-the-art models like E5 that dominate the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard). In this post, you'll learn how asymmetric models work, see comprehensive benchmark results, and follow a step-by-step guide to implement asymmetric neural search in your OpenSearch cluster.
+ Semantic text field now supports asymmetric embedding models, including state-of-the-art models like E5 that dominate the [MTEB leaderboard](https://huggingface.co/spaces/mteb/leaderboard). In this post, you'll learn how asymmetric models work, see comprehensive benchmark results, and follow a step-by-step guide to implement asymmetric neural search in your OpenSearch cluster.

pajuric commented Mar 5, 2026

@fen-qin - Please reach out to me when you are ready to move this forward.

Development

Successfully merging this pull request may close these issues.

[BLOG] Asymmetric Models Support in Neural Search

4 participants