Add Support for Qwen2-VL Multi-modal Embedding Models #3694
zhaochenyang20 merged 5 commits into sgl-project:main from
Conversation
@Titan-p Please add a test for it. And might @mickqian and @yizhang2077 take a look? Thanks a lot!
Unit test added.
python/sglang/srt/conversation.py (Outdated)
It should be okay for now, but we will need to refactor this later. cc @yizhang2077, do you agree with this? @mickqian
@simveit Could you continue to help on this? Thanks so much. If you feel okay with it, I can merge this.

I will go over the code one more time and also test it tonight. @zhaochenyang20
The test failed. Before merging it we should increase

I think it might be related to differences in the image processing. Could you please provide the test images?

I ran it with the defaults on an A100.

Also, fix the conflicts.
cc @zhaochenyang20 @simveit. I think this PR is ready to be merged.

I tested it locally and LGTM.

@zhaochenyang20 I will integrate a corresponding example for this over the weekend.

@simveit Thanks!
Add Support for Qwen2-VL Multi-modal Embedding Models
Motivation
This PR introduces multi-modal embedding capabilities to support the Alibaba-NLP/gme-Qwen2-VL-2B-Instruct model, enabling unified processing of both text and image inputs.
Modifications
Model Integration
API modification
```python
payload = json.dumps({
    "input": [
        {"text": "text string"},
        {"image": "/home/panlyu/images/006.jpg"}
    ]
})
```

TODO
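For illustration, a minimal sketch of how a client might build and inspect this mixed text/image payload. The image path is the one from the example above and is illustrative only; the surrounding request code (target URL, headers) is not specified in this PR description and is omitted here.

```python
import json

# Build an embedding request whose "input" list mixes modalities:
# each entry carries exactly one key, either "text" or "image".
payload = json.dumps({
    "input": [
        {"text": "text string"},
        {"image": "/home/panlyu/images/006.jpg"},  # illustrative local path
    ]
})

# The payload round-trips cleanly: one text item, one image item.
items = json.loads(payload)["input"]
print(len(items))             # 2
print(list(items[0].keys()))  # ['text']
print(list(items[1].keys()))  # ['image']
```

This mirrors the unified text/image input format the PR adds, where the server dispatches each entry to the appropriate (text or vision) processing path.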
Checklist