
support midashenglm#5241

Closed
JimmyMa99 wants to merge 14 commits into modelscope:main from JimmyMa99:main

Conversation

@JimmyMa99
Contributor

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

support midashenglm

Experiment results

Paste your experiment result here (if needed).


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @JimmyMa99, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces support for the midashenglm model, expanding the range of models available within the system. This is part of an ongoing effort to integrate more models and datasets, specifically focusing on adding new multi-modal capabilities.

Highlights

  • New Model Type Definition: I've added midashenglm as a new MLLMModelType constant, making it discoverable within the system's model registry.
  • Model and Tokenizer Integration: I've implemented a dedicated function, get_model_tokenizer_midashenglm, to handle the loading of the midashenglm model and its associated processor. This function also includes logic for compatibility with quantized models (like AWQ) and ensures proper patching of input embeddings for audio processing.
  • Model Architecture Registration: I've registered the midashenglm model's architecture, specifying its language model (decoder), aligner (audio_projector), and generator components, which is crucial for its proper functioning within the framework.
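The input-embeddings patch mentioned in the highlights can be illustrated generically. The sketch below is a hypothetical stand-in for swift's `patch_get_input_embeddings` helper, not its actual implementation: it simply exposes a named submodule (here `patch_embed` on the audio encoder) through the standard `get_input_embeddings()` interface that generic HF-style code expects.

```python
import types

def patch_get_input_embeddings(module, attr_name):
    # Give `module` a get_input_embeddings() method that returns the named
    # submodule, so code expecting the usual HF interface works on it.
    def get_input_embeddings(self):
        return getattr(self, attr_name)
    module.get_input_embeddings = types.MethodType(get_input_embeddings, module)

# Usage with a dummy stand-in for the audio encoder:
encoder = types.SimpleNamespace(patch_embed='patch-embed-layer')
patch_get_input_embeddings(encoder, 'patch_embed')
print(encoder.get_input_embeddings())  # prints: patch-embed-layer
```

In the real PR this is applied as `patch_get_input_embeddings(base_model.audio_encoder, 'patch_embed')`.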
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the midashenglm model. The changes are generally well-structured and follow the existing patterns in the codebase. I've identified a few areas for improvement in swift/llm/model/model/qwen.py: translating Chinese comments to English for better maintainability, correcting a potential model ID typo, reviewing dependencies, and refining a list update logic to prevent duplicates. Addressing these points will enhance the code's quality and correctness.

Comment on lines +783 to +814
def get_model_tokenizer_midashenglm(model_dir, *args, **kwargs):
    from transformers import AutoModelForCausalLM, AutoProcessor, AutoConfig

    # 设置默认的 automodel_class,如果外部未提供
    kwargs['automodel_class'] = kwargs.get('automodel_class', AutoModelForCausalLM)

    processor = AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)
    kwargs['tokenizer'] = processor.tokenizer

    # 加载模型配置
    model_config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
    kwargs['model_config'] = model_config

    model, _ = get_model_tokenizer_with_flash_attn(model_dir, *args, **kwargs)

    if model:
        # 兼容 AWQ 等量化模型,获取基础模型以进行修改
        base_model = model.model if 'AWQ' in model.__class__.__name__ else model

        # 假设 use_submodel_func 是一个必要的初始化步骤
        use_submodel_func(base_model, 'decoder')

        if not hasattr(base_model.config, 'keys_to_ignore_at_inference'):
            base_model.config.keys_to_ignore_at_inference = []
        base_model.config.keys_to_ignore_at_inference += ['hidden_states', 'attention_mask']

        if hasattr(base_model, 'audio_encoder'):
            patch_get_input_embeddings(base_model.audio_encoder, 'patch_embed')

    return model, processor


medium

The comments in this function are in Chinese. To improve code maintainability and ensure all contributors can understand the code, please translate them to English. For example:

  • # 设置默认的 automodel_class,如果外部未提供 -> # Set the default automodel_class if not provided externally.
  • # 加载模型配置 -> # Load model configuration.
  • # 兼容 AWQ 等量化模型,获取基础模型以进行修改 -> # Compatible with AWQ and other quantized models, get the base model for modification.
  • # 假设 use_submodel_func 是一个必要的初始化步骤 -> # Assume use_submodel_func is a necessary initialization step.

Comment on lines +807 to +809
    if not hasattr(base_model.config, 'keys_to_ignore_at_inference'):
        base_model.config.keys_to_ignore_at_inference = []
    base_model.config.keys_to_ignore_at_inference += ['hidden_states', 'attention_mask']


medium

This logic for updating keys_to_ignore_at_inference can be made more robust to avoid adding duplicate keys to the list.

Suggested change:

    - if not hasattr(base_model.config, 'keys_to_ignore_at_inference'):
    -     base_model.config.keys_to_ignore_at_inference = []
    - base_model.config.keys_to_ignore_at_inference += ['hidden_states', 'attention_mask']
    + keys_to_ignore = getattr(base_model.config, 'keys_to_ignore_at_inference', [])
    + keys_to_ignore.extend(k for k in ['hidden_states', 'attention_mask'] if k not in set(keys_to_ignore))
    + base_model.config.keys_to_ignore_at_inference = keys_to_ignore
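For illustration, here is a runnable sketch of the suggested pattern, using a SimpleNamespace as a hypothetical stand-in for `base_model.config`. It shows that the getattr-plus-dedupe version is idempotent across repeated calls, whereas the original `+=` would append duplicates each time the loader ran.

```python
from types import SimpleNamespace

base_config = SimpleNamespace()  # hypothetical stand-in for base_model.config

def patch_keys_to_ignore(config):
    # Fetch the existing list (or start fresh), then add only missing keys.
    keys_to_ignore = getattr(config, 'keys_to_ignore_at_inference', [])
    keys_to_ignore.extend(
        k for k in ['hidden_states', 'attention_mask'] if k not in set(keys_to_ignore))
    config.keys_to_ignore_at_inference = keys_to_ignore

patch_keys_to_ignore(base_config)
patch_keys_to_ignore(base_config)  # calling again adds no duplicates
print(base_config.keys_to_ignore_at_inference)  # ['hidden_states', 'attention_mask']
```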

    get_model_tokenizer_midashenglm,
    model_arch=ModelArch.midashenglm,
    architectures=['MiDashengLMModel'],
    requires=['transformers>=4.50', 'soundfile', 'qwen_omni_utils', 'decord'],


medium

The requires list includes 'qwen_omni_utils'. This dependency seems to have been copied from the qwen2_5_omni model. Does midashenglm actually depend on qwen_omni_utils? If not, it should be removed to avoid installing unnecessary packages.

JimmyMa99 and others added 2 commits August 4, 2025 17:47
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@JimmyMa99 JimmyMa99 closed this Aug 9, 2025
@Jintao-Huang
Collaborator

#5325
