[model] support Midashenglm by JimmyMa99 · Pull Request #5325 · modelscope/ms-swift

JimmyMa99 · 2025-08-09T14:58:45Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

support midashenglm

Experiment results

Paste your experiment result here(if needed).

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

gemini-code-assist

Summary of Changes

Hello @JimmyMa99, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

I've implemented support for the midashenglm-7b model, a new multimodal model that handles audio. This involved updating documentation, defining new model constants and architectures, and creating a specific function to load and configure the model and its associated processor and tokenizer. Additionally, I've developed a dedicated template to manage audio inputs and chat formatting for this model, and included a new test case to ensure its proper functionality.

Highlights

New Model Integration: I've added comprehensive support for the midashenglm-7b model, allowing it to be integrated and utilized within the system.
Documentation Updates: I've updated both the Chinese and English documentation files to include midashenglm-7b in the list of officially supported models, ensuring users can easily find information about its availability.
System Constants and Architecture Definition: I've extended the system's constant definitions and model architecture registrations to properly recognize and categorize midashenglm as a new model type and architecture, detailing its specific components like the audio encoder and projector.
Model and Tokenizer Configuration: I've implemented a dedicated function, get_model_tokenizer_midashenglm, to handle the specific loading and configuration requirements of the midashenglm model, including its tokenizer and processor, and ensuring proper handling of its audio-related parameters.
New Template Implementation: I've introduced a new template, MiDashengLMTemplate, specifically designed for midashenglm. This template manages how audio inputs are processed and how chat messages are formatted for this multimodal model.
Test Case Addition: I've added a new test case, test_midashenglm, to validate the correct functionality and integration of the midashenglm model, particularly its ability to process audio inputs and generate appropriate responses.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request adds support for the Midashenglm model. The changes include adding the model configuration, a new template, and updating documentation. I've found a few issues, including a potential bug in the label generation logic, inconsistencies in the documentation and tests, and some areas for code improvement for better maintainability and robustness. Please see the detailed comments below.

gemini-code-assist · 2025-08-09T15:00:14Z

swift/llm/template/template/midashenglm.py

+                assistant_start = len(new_input_ids) // 2
+                labels[assistant_start:] = new_input_ids[assistant_start:]


The logic for determining the start of the assistant's response (assistant_start = len(new_input_ids) // 2) is brittle. It assumes the user's prompt and the assistant's response have equal token lengths, which is rarely the case. This can lead to incorrect label generation during training and potential model performance issues. A more robust approach, such as finding special tokens that mark the beginning of the assistant's turn, should be used.

gemini-code-assist · 2025-08-09T15:00:14Z

tests/test_align/test_template/test_audio.py

    # test_qwen2_5_omni()
-    test_gemma3n()
+    # test_gemma3n()
+    test_midashenglm


The test function test_midashenglm is called without parentheses. This is a syntax error and will prevent the test from executing when running the script directly.

Suggested change

test_midashenglm

test_midashenglm()

gemini-code-assist · 2025-08-09T15:00:15Z

docs/source/Instruction/支持的模型和数据集.md

The requires field for midashenglm includes qwen_omni_utils, which appears to be incorrect based on the model's registration in swift/llm/model/model/qwen.py. To ensure consistency and prevent confusion, please remove this dependency from the documentation. The correct dependencies are transformers>=4.50, soundfile, and decord.

Suggested change

|[mispeech/midashenglm-7b](https://modelscope.cn/models/mispeech/midashenglm-7b)|midashenglm|midashenglm|transformers>=4.50, soundfile, qwen_omni_utils, decord|✘|audio|[mispeech/midashenglm-7b](https://huggingface.co/mispeech/midashenglm-7b)|

|[mispeech/midashenglm-7b](https://modelscope.cn/models/mispeech/midashenglm-7b)|midashenglm|midashenglm|transformers>=4.50, soundfile, decord|✘|audio|[mispeech/midashenglm-7b](https://huggingface.co/mispeech/midashenglm-7b)|

gemini-code-assist · 2025-08-09T15:00:15Z

docs/source_en/Instruction/Supported-models-and-datasets.md

 |[AIDC-AI/Ovis2-34B](https://modelscope.cn/models/AIDC-AI/Ovis2-34B)|ovis2|ovis2|transformers>=4.46.2, moviepy<2|&#x2718;|vision|[AIDC-AI/Ovis2-34B](https://huggingface.co/AIDC-AI/Ovis2-34B)|
 |[XiaomiMiMo/MiMo-VL-7B-SFT](https://modelscope.cn/models/XiaomiMiMo/MiMo-VL-7B-SFT)|mimo_vl|mimo_vl|transformers>=4.49, qwen_vl_utils>=0.0.6, decord|&#x2718;|vision, video|[XiaomiMiMo/MiMo-VL-7B-SFT](https://huggingface.co/XiaomiMiMo/MiMo-VL-7B-SFT)|
 |[XiaomiMiMo/MiMo-VL-7B-RL](https://modelscope.cn/models/XiaomiMiMo/MiMo-VL-7B-RL)|mimo_vl|mimo_vl|transformers>=4.49, qwen_vl_utils>=0.0.6, decord|&#x2718;|vision, video|[XiaomiMiMo/MiMo-VL-7B-RL](https://huggingface.co/XiaomiMiMo/MiMo-VL-7B-RL)|
+|[mispeech/midashenglm-7b](https://modelscope.cn/models/mispeech/midashenglm-7b)|midashenglm|midashenglm|transformers>=4.50, soundfile, qwen_omni_utils, decord|&#x2718;|audio|[mispeech/midashenglm-7b](https://huggingface.co/mispeech/midashenglm-7b)|


The requires field for midashenglm includes qwen_omni_utils, which is inconsistent with the model's registration in the code. Please remove this dependency to align the documentation with the implementation. The correct dependencies are transformers>=4.50, soundfile, and decord.

Suggested change

|[mispeech/midashenglm-7b](https://modelscope.cn/models/mispeech/midashenglm-7b)|midashenglm|midashenglm|transformers>=4.50, soundfile, qwen_omni_utils, decord|✘|audio|[mispeech/midashenglm-7b](https://huggingface.co/mispeech/midashenglm-7b)|

|[mispeech/midashenglm-7b](https://modelscope.cn/models/mispeech/midashenglm-7b)|midashenglm|midashenglm|transformers>=4.50, soundfile, decord|✘|audio|[mispeech/midashenglm-7b](https://huggingface.co/mispeech/midashenglm-7b)|

gemini-code-assist · 2025-08-09T15:00:15Z

swift/llm/model/model/qwen.py

+        model.transformer.registered_causal_mask = model.transformer.registered_causal_mask.cuda()
+        logger.info('registered_causal_mask to cuda')


Hardcoding .cuda() is not best practice as it restricts the model to run only on CUDA devices and can cause errors in environments without a GPU or with other accelerators. It's better to move the tensor to the model's device dynamically.

Suggested change

model.transformer.registered_causal_mask = model.transformer.registered_causal_mask.cuda()

logger.info('registered_causal_mask to cuda')

device = next(model.parameters()).device

model.transformer.registered_causal_mask = model.transformer.registered_causal_mask.to(device)

logger.info(f'registered_causal_mask to {device}')

gemini-code-assist · 2025-08-09T15:00:15Z

swift/llm/template/template/midashenglm.py

+            )
+
+        except Exception as e:
+            print(f'Processor apply_chat_template failed: {e}')


Using print() for logging exceptions is not ideal. It's better to use the logging module (e.g., logger.warning or logger.error) for consistency with the rest of the codebase and to allow for better log management. Please replace print with a logger call after importing the logger.

gemini-code-assist · 2025-08-09T15:00:15Z

swift/llm/template/template/midashenglm.py

+
+
+@dataclass
+class QwenTemplateMeta(ChatmlTemplateMeta):


The class name QwenTemplateMeta is misleading as it's being used for the midashenglm model. For better code clarity and maintainability, please rename it to MiDashengLMTemplateMeta. You'll also need to update its usage in the register_template call on line 158.

Suggested change

class QwenTemplateMeta(ChatmlTemplateMeta):

class MiDashengLMTemplateMeta(ChatmlTemplateMeta):

gemini-code-assist · 2025-08-09T15:00:15Z

tests/test_align/test_template/test_audio.py



+def test_midashenglm():
+    pt_engine = PtEngine('midasheng/midashenglm-7b')


The model ID used in this test, midasheng/midashenglm-7b, is inconsistent with the one registered in the application code, which is mispeech/midashenglm-7b. Please use the same model ID in the test to ensure it's testing the correct model configuration.

Suggested change

pt_engine = PtEngine('midasheng/midashenglm-7b')

pt_engine = PtEngine('mispeech/midashenglm-7b')

Jintao-Huang · 2025-08-09T15:47:45Z

/gemini review

gemini-code-assist

Code Review

This pull request adds support for the Midashenglm audio model. The changes include adding the model to documentation, registering its type, architecture, and template, and adding a new test case. A significant part of this PR is a refactoring in swift/llm/template/base.py which merges _extend_tokens and _extend_loss_scale into a single function, improving code maintainability. My review focuses on performance improvements for this refactored function and for the new data collator in the midashenglm template.

gemini-code-assist · 2025-08-09T15:49:51Z

swift/llm/template/base.py

+    def _extend_tokens(
+            input_ids: List[int], labels: Optional[List[int]], loss_scale: Optional[List[float]],
+            replace_idx_list: List[int],
+            get_new_tokens: Callable[[int], List[int]]) -> Tuple[List[int], Optional[List[int]], Optional[List[float]]]:
        added_tokens_len = 0
        for i, idx in enumerate(replace_idx_list):
            new_tokens = get_new_tokens(i)
            token_len = len(new_tokens)
            input_ids = input_ids[:idx + added_tokens_len] + new_tokens + input_ids[added_tokens_len + idx + 1:]
            if labels:
                labels = labels[:idx + added_tokens_len] + [-100] * token_len + labels[added_tokens_len + idx + 1:]
-            added_tokens_len += token_len - 1
-        return input_ids, labels
-
-    @staticmethod
-    def _extend_loss_scale(loss_scale: Optional[List[float]], replace_idx_list: List[int],
-                           get_new_tokens: Callable[[int], List[int]]) -> Optional[List[float]]:
-        if loss_scale:
-            added_tokens_len = 0
-            for i, idx in enumerate(replace_idx_list):
-                new_tokens = get_new_tokens(i)
-                token_len = len(new_tokens)
+            if loss_scale:
                scale_idx = loss_scale[idx + added_tokens_len]
                loss_scale = loss_scale[:idx + added_tokens_len] + [scale_idx] * token_len + loss_scale[added_tokens_len
                                                                                                        + idx + 1:]
-                added_tokens_len += token_len - 1
-        return loss_scale
+            added_tokens_len += token_len - 1
+        return input_ids, labels, loss_scale


The current implementation of _extend_tokens uses list slicing and concatenation inside a loop (list = list[:idx] + ...). This can be inefficient for large lists as it creates a new list in each iteration, leading to quadratic complexity in the worst case (O(M*N) where M is the number of replacements and N is the list length). A more performant approach would be to build the new lists by appending segments, which would be closer to linear time complexity.

def _extend_tokens( input_ids: List[int], labels: Optional[List[int]], loss_scale: Optional[List[float]], replace_idx_list: List[int], get_new_tokens: Callable[[int], List[int]]) -> Tuple[List[int], Optional[List[int]], Optional[List[float]]]: if not replace_idx_list: return input_ids, labels, loss_scale new_input_ids = [] new_labels = [] if labels is not None else None new_loss_scale = [] if loss_scale is not None else None last_idx = 0 for i, idx in enumerate(replace_idx_list): new_tokens = get_new_tokens(i) new_input_ids.extend(input_ids[last_idx:idx]) if labels is not None: new_labels.extend(labels[last_idx:idx]) if loss_scale is not None: new_loss_scale.extend(loss_scale[last_idx:idx]) new_input_ids.extend(new_tokens) if labels is not None: new_labels.extend([-100] * len(new_tokens)) if loss_scale is not None: scale_val = loss_scale[idx] new_loss_scale.extend([scale_val] * len(new_tokens)) last_idx = idx + 1 new_input_ids.extend(input_ids[last_idx:]) if labels is not None: new_labels.extend(labels[last_idx:]) if loss_scale is not None: new_loss_scale.extend(loss_scale[last_idx:]) return new_input_ids, new_labels, new_loss_scale

gemini-code-assist · 2025-08-09T15:49:51Z

swift/llm/template/template/midashenglm.py

+        input_values = [b['input_values'] for b in batch if b.get('input_values') is not None]
+        audio_lengths = [b['audio_length'] for b in batch if b.get('audio_length') is not None]


This part of the code iterates over the batch list twice, once for input_values and once for audio_lengths. For large batches, this could be slightly inefficient. You can combine these into a single loop to improve performance.

Suggested change

input_values = [b['input_values'] for b in batch if b.get('input_values') is not None]

audio_lengths = [b['audio_length'] for b in batch if b.get('audio_length') is not None]

input_values = []

audio_lengths = []

for b in batch:

iv = b.get('input_values')

if iv is not None:

input_values.append(iv)

al = b.get('audio_length')

if al is not None:

audio_lengths.append(al)

JimmyMa99 and others added 15 commits August 4, 2025 17:33

support midashenglm

98f90bd

support midashenglm

3f56bb7

Update swift/llm/model/model/qwen.py

4b9b144

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Merge branch 'modelscope:main' into main

a457cf1

midasheng

5837b58

Fix flake8 E302 and E305 blank line issues

dc2c43e

Apply yapf formatting changes

a346f0f

Merge branch 'modelscope:main' into main

d218696

fp32

aadc0f9

Merge branch 'modelscope:main' into main

28a0952

midasheng-train-infer

2bddbf6

Merge branch 'main' of github.com:JimmyMa99/ms-swift

fca0ece

midasheng-train-infer-test

93f689e

[fix]remove requires

1823eaf

midashenglm

8de5cce

gemini-code-assist bot reviewed Aug 9, 2025

View reviewed changes

Jintao-Huang added 3 commits August 9, 2025 23:33

update

e64df5f

update

d9d2594

update

cbc57b7

Jintao-Huang approved these changes Aug 9, 2025

View reviewed changes

gemini-code-assist bot reviewed Aug 9, 2025

View reviewed changes

hjh0119 approved these changes Aug 9, 2025

View reviewed changes

Jintao-Huang added 2 commits August 10, 2025 00:05

fix

ca0b400

fix

b3eb211

Jintao-Huang merged commit 6dfdf67 into modelscope:main Aug 9, 2025
1 of 2 checks passed

Jintao-Huang mentioned this pull request Aug 11, 2025

support midashenglm #5241

Closed

4 tasks

JimmyMa99 mentioned this pull request Aug 12, 2025

支持finetuning吗 xiaomi-research/dasheng-lm#12

Closed

Jintao-Huang pushed a commit that referenced this pull request Aug 14, 2025

[model] support Midashenglm (#5325)

a33857a

		assistant_start = len(new_input_ids) // 2
		labels[assistant_start:] = new_input_ids[assistant_start:]

	\|[mispeech/midashenglm-7b](https://modelscope.cn/models/mispeech/midashenglm-7b)\|midashenglm\|midashenglm\|transformers>=4.50, soundfile, qwen_omni_utils, decord\|✘\|audio\|[mispeech/midashenglm-7b](https://huggingface.co/mispeech/midashenglm-7b)\|
	\|[mispeech/midashenglm-7b](https://modelscope.cn/models/mispeech/midashenglm-7b)\|midashenglm\|midashenglm\|transformers>=4.50, soundfile, decord\|✘\|audio\|[mispeech/midashenglm-7b](https://huggingface.co/mispeech/midashenglm-7b)\|

		model.transformer.registered_causal_mask = model.transformer.registered_causal_mask.cuda()
		logger.info('registered_causal_mask to cuda')

-        model.transformer.registered_causal_mask = model.transformer.registered_causal_mask.cuda()
-        logger.info('registered_causal_mask to cuda')
+        device = next(model.parameters()).device
+        model.transformer.registered_causal_mask = model.transformer.registered_causal_mask.to(device)
+        logger.info(f'registered_causal_mask to {device}')

	class QwenTemplateMeta(ChatmlTemplateMeta):
	class MiDashengLMTemplateMeta(ChatmlTemplateMeta):



		def test_midashenglm():
		pt_engine = PtEngine('midasheng/midashenglm-7b')

	pt_engine = PtEngine('midasheng/midashenglm-7b')
	pt_engine = PtEngine('mispeech/midashenglm-7b')

		input_values = [b['input_values'] for b in batch if b.get('input_values') is not None]
		audio_lengths = [b['audio_length'] for b in batch if b.get('audio_length') is not None]

-        input_values = [b['input_values'] for b in batch if b.get('input_values') is not None]
-        audio_lengths = [b['audio_length'] for b in batch if b.get('audio_length') is not None]
+        input_values = []
+        audio_lengths = []
+        for b in batch:
+            iv = b.get('input_values')
+            if iv is not None:
+                input_values.append(iv)
+                al = b.get('audio_length')
+                if al is not None:
+                    audio_lengths.append(al)

Conversation

JimmyMa99 commented Aug 9, 2025

PR type

PR information

Experiment results

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

Jintao-Huang commented Aug 9, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Aug 9, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants