Add Support for Qwen2-VL Multi-modal Embedding Models#3694

Merged
zhaochenyang20 merged 5 commits into sgl-project:main from Titan-p:multimodal-embedding
Mar 7, 2025

@Titan-p
Contributor

Titan-p commented Feb 19, 2025

Add Support for Qwen2-VL Multi-modal Embedding Models

Motivation

This PR introduces multi-modal embedding capabilities to support the Alibaba-NLP/gme-Qwen2-VL-2B-Instruct model, enabling unified processing of both text and image inputs.

Modifications

  1. Model Integration

    • Added model launch configuration for gme-Qwen2-VL
    • Implemented new conversation template
  2. API modification

    • v1/embeddings now accepts both text and image inputs
    • Usage example
      • payload = json.dumps({ "input": [ { "text": "text string" }, { "image": "/home/panlyu/images/006.jpg" } ] })
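
A minimal sketch of how the payload above might be sent to the v1/embeddings endpoint. The server address (`http://localhost:30000`) and the helper names `build_payload` and `post_embeddings` are assumptions for illustration, not part of this PR; adjust them to your deployment.

```python
import json
import urllib.request

# Assumption: a server is listening locally; change host and port as needed.
BASE_URL = "http://localhost:30000"


def build_payload(text, image_path):
    """Build the request body from the PR description: one text item, one image item."""
    return json.dumps({"input": [{"text": text}, {"image": image_path}]})


def post_embeddings(payload, base_url=BASE_URL):
    """POST the JSON payload to v1/embeddings and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same inputs as the usage example in the PR description.
payload = build_payload("text string", "/home/panlyu/images/006.jpg")
```

Calling `post_embeddings(payload)` requires a running server with the embedding model loaded; the shape of the response is not described in this thread, so it is returned undecoded beyond JSON parsing.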

TODO

  1. fused input support

Checklist

  • Formatted code using pre-commit hooks
  • Updated API documentation with image embedding examples
  • Verified throughput with mixed text/image batches
  • Benchmark results will be shared in Slack channel

Titan-p force-pushed the multimodal-embedding branch 2 times, most recently from 53d20e5 to d169463 on February 26, 2025 10:20
@zhaochenyang20
Collaborator

#3772

@Titan-p What's the difference?

@zhaochenyang20
Collaborator

@Titan-p Please add a test for it. Could @mickqian and @yizhang2077 also take a look? Thanks a lot!

Titan-p force-pushed the multimodal-embedding branch 2 times, most recently from 0302714 to 2d21d6c on March 4, 2025 03:00
@Titan-p
Contributor Author

Titan-p commented Mar 4, 2025

Unit test added.

Titan-p force-pushed the multimodal-embedding branch from 8b6692e to 40f3f60 on March 4, 2025 05:35
Comment on lines 114 to 122
Collaborator

This should be okay for now, but we need to refactor it later. cc @yizhang2077 @mickqian, do you agree?

@zhaochenyang20
Collaborator

@simveit Could you continue to help with this? Thanks so much. If you feel it is okay, I can merge this.

@simveit
Contributor

simveit commented Mar 5, 2025

I will go over the code one more time and also test it tonight, @zhaochenyang20.

@simveit
Contributor

simveit commented Mar 5, 2025

The test failed

texts similarity diff tensor(1.9073e-06)
images similarity diff tensor(0.0001)
F
======================================================================
FAIL: test_accuracy (__main__.TestQmeQwenModels)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/misc/simon/sglang/test/srt/models/test_gme_qwen_models.py", line 81, in test_accuracy
    self.assert_close_embeddings(model, prefill_tolerance, torch_dtype)
  File "/home/misc/simon/sglang/test/srt/models/test_gme_qwen_models.py", line 74, in assert_close_embeddings
    assert torch.all(
AssertionError: embeddings are not all close

Before merging, we should increase prefill_tolerance. @zhaochenyang20 @Titan-p
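
The body of the failing test is not shown in this thread, so the following is only a sketch of how a tolerance check on similarity differences could look. The helper names and the tolerance values are hypothetical; the two diffs are the values printed in the log above (the image diff as printed, which may be rounded).

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_diff(ref_a, ref_b, test_a, test_b):
    """Absolute gap between a reference similarity and the similarity under test."""
    return abs(cosine_similarity(ref_a, ref_b) - cosine_similarity(test_a, test_b))


def within_tolerance(diff, tolerance):
    """True when the similarity gap is strictly below the tolerance."""
    return diff < tolerance


# Diffs as printed in the failing run.
text_diff = 1.9073e-06
image_diff = 1e-4

# Hypothetical tolerances, chosen only to illustrate the effect of loosening.
strict_tolerance = 1e-5
relaxed_tolerance = 5e-4

text_ok_strict = within_tolerance(text_diff, strict_tolerance)
image_ok_strict = within_tolerance(image_diff, strict_tolerance)
image_ok_relaxed = within_tolerance(image_diff, relaxed_tolerance)
```

Under these assumed values the text diff passes the strict tolerance while the image diff only passes the relaxed one, which matches the pattern in the log where the image diff is the larger of the two.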

@Titan-p
Contributor Author

Titan-p commented Mar 6, 2025

I think it might be related to differences in the image processing. Could you please provide the test images?

@simveit
Contributor

simveit commented Mar 6, 2025

I ran it with the defaults on an A100.

@zhaochenyang20
Collaborator

Also, please fix the conflicts.

Titan-p force-pushed the multimodal-embedding branch from 66acbdb to 90fc9ea on March 6, 2025 07:15
Titan-p force-pushed the multimodal-embedding branch from 90fc9ea to ca86152 on March 6, 2025 07:19
@Titan-p
Contributor Author

Titan-p commented Mar 6, 2025

cc @zhaochenyang20 @simveit. I think this PR is ready to be merged.

@zhaochenyang20
Collaborator

I tested it locally and LGTM.

zhaochenyang20 merged commit 361971b into sgl-project:main on Mar 7, 2025
20 checks passed
@simveit
Contributor

simveit commented Mar 7, 2025

@zhaochenyang20 I will integrate a corresponding example for this over the weekend.

@zhaochenyang20
Collaborator

@simveit Thanks!
