Closed
Description
Hi,
I'm trying to load InternVL3 with sglang but have run into issues. I noticed that pull request #5350 mentions that "InternVL3 includes a flashattention implementation for vision models. However, it doesn't support Tensor Parallelism (TP), which could be a bottleneck".
Does this mean I need to disable TP to run InternVL3 without errors? If so, are there any known workarounds, or are fixes planned for future releases?
Thank you!