[model] support all Qwen3.5 series models #10237
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly expands model compatibility by integrating a broader range of the Qwen3.5 model family. It introduces various sizes and types, including dense and sparse Mixture of Experts (MoE) models, along with their respective Base and Thinking variants. This expansion lets users leverage Qwen3.5's advanced multimodal and multilingual capabilities, offering more options for fine-tuning and deployment across diverse applications.

Highlights
Code Review
This pull request adds support for the Qwen3.5 model series. I've found a minor naming inconsistency in the newly added model definitions and suggested a change to improve maintainability by aligning the naming with the established convention in the repository.
What does this PR do?
Add support for the full Qwen3.5 model series (Base and Thinking variants) in
src/llamafactory/extras/constants.py.

Fixes # (issue)
Summary
This PR registers the complete Qwen3.5 model family, which was released by the Qwen Team on 2026-02-15. Qwen3.5 is a native multimodal model series built on a hybrid architecture (Gated DeltaNet linear attention + Gated Attention + Sparse MoE), natively supporting a 262K context window and 201 languages.
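As a purely conceptual illustration of the hybrid layout described above (this is a sketch of the layer pattern, not LLaMA-Factory or Transformers code; all names are hypothetical):

```python
# Conceptual sketch only: illustrates the hybrid pattern the PR describes,
# where each block unit stacks 3 linear-attention layers (Gated DeltaNet)
# followed by 1 full-attention layer (Gated Attention).
# `layer_kind` is a hypothetical helper, not part of any real library.

def layer_kind(layer_idx: int, unit_size: int = 4) -> str:
    """Return the attention type used at a given layer index.

    Within each unit of `unit_size` layers, the last layer uses full
    (gated) attention; the preceding layers use linear attention.
    """
    if layer_idx % unit_size == unit_size - 1:
        return "full_attention"      # O(n^2), but only 1 layer in 4
    return "linear_attention"        # O(n), the majority of layers

# Example: a 48-layer stack yields 36 linear + 12 full attention layers,
# which is why complexity is O(n) for most layers.
layers = [layer_kind(i) for i in range(48)]
print(layers.count("linear_attention"), layers.count("full_attention"))  # 36 12
```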
Models Added
🚀 Qwen3.5 Series: The Complete Model Matrix
1. Dense Models
The Dense series provides consistent performance across general-purpose tasks and is the foundation for standard fine-tuning.
- Qwen/Qwen3.5-0.8B
- Qwen/Qwen3.5-2B
- Qwen/Qwen3.5-4B
- Qwen/Qwen3.5-9B
- Qwen/Qwen3.5-27B

2. Sparse MoE Models (Mixture of Experts)
These models utilize a "High Capacity, Low Activation" architecture, offering flagship-level intelligence with significantly faster inference speeds.
- Qwen/Qwen3.5-35B-A3B
- Qwen/Qwen3.5-122B-A10B
- Qwen/Qwen3.5-397B-A17B

💡 Key Takeaways for Developers
Naming Convention
- Base: Raw pre-trained model weights. No instruction-following or thinking behavior. Suitable as fine-tuning starting points.
- Thinking: Post-trained models (SFT + large-scale RL). They operate in thinking mode by default, generating <think>...</think> reasoning chains before the final response. Note: unlike Qwen3, Qwen3.5 does not support the /think and /nothink soft switches; thinking is controlled via chat_template_kwargs: {"enable_thinking": false}.

Template
All models use template="qwen3_5" with multimodal=True, as all Qwen3.5 models are natively multimodal (early-fusion vision-language).

Key Architecture Notes
Each block unit stacks 3 × Gated DeltaNet (linear attention) + 1 × Gated Attention (full attention), enabling O(n) complexity for most layers.

Files Changed
src/llamafactory/extras/constants.py: Added register_model_group entries for the full Qwen3.5 series.
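A hedged sketch of what the constants.py registration might look like. register_model_group and DownloadSource follow the pattern used elsewhere in LLaMA-Factory's constants.py, but their exact signatures are assumed here; minimal stand-ins are defined so the snippet runs on its own.

```python
# Sketch only: stand-ins for LLaMA-Factory's register_model_group helper
# and DownloadSource enum. The real definitions live in
# src/llamafactory/extras/constants.py and may differ in detail.
from enum import Enum


class DownloadSource(str, Enum):
    DEFAULT = "hf"        # Hugging Face Hub
    MODELSCOPE = "ms"     # ModelScope mirror


SUPPORTED_MODELS: dict = {}


def register_model_group(models: dict, template: str, multimodal: bool = False) -> None:
    # Stand-in for the real helper: record each model with its template.
    for name, sources in models.items():
        SUPPORTED_MODELS[name] = {
            "sources": sources,
            "template": template,
            "multimodal": multimodal,
        }


# Register one model from the series as an example; the PR registers the
# full matrix of Base and Thinking variants the same way.
register_model_group(
    models={
        "Qwen3.5-9B": {
            DownloadSource.DEFAULT: "Qwen/Qwen3.5-9B",
        },
    },
    template="qwen3_5",
    multimodal=True,
)

print(SUPPORTED_MODELS["Qwen3.5-9B"]["template"])  # qwen3_5
```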