WIP: SALM with NeMo Automodel integration for Nemotron Nano V3 LLM backbone#15447

Draft
pzelasko wants to merge 49 commits into main from speechlm2-with-nemo-automodel-merge

Conversation

@pzelasko (Collaborator)

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

pzelasko and others added 30 commits February 4, 2026 14:17
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
…automodel's utility

…full LLM

pzelasko and others added 15 commits February 18, 2026 11:56
…converted models

Implements NemotronNanoV3PromptFormatter (NAME="nemotron-nano-v3") using
ChatML-style <|im_start|>/<|im_end|> template with encode_dialog override
that handles: auto-insert empty system turn, history thinking truncation,
<think></think> prepend for non-thinking assistant turns, and dynamic
inference prefix (thinking on/off). Includes Lhotse Cut integration via
registered_prompt_format_fn. Verified against HF apply_chat_template for
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 (both string and token match).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
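The template logic described in that commit message can be sketched in standalone form. This is an illustrative reconstruction, not the PR's actual `NemotronNanoV3PromptFormatter` code: the function name `render_dialog` and its turn-dict shape are assumptions, and only the behaviors named above (auto-inserted empty system turn, history thinking truncation, `<think></think>` prepend, dynamic inference prefix) are modeled.

```python
# Illustrative sketch of the ChatML-style template described in the commit
# message; names and the turn-dict shape are hypothetical, not the PR's code.
IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def render_dialog(turns, enable_thinking=True):
    """Render a list of {"role", "content"} dicts into a ChatML-style string."""
    # Auto-insert an empty system turn if the dialog does not start with one.
    if not turns or turns[0]["role"] != "system":
        turns = [{"role": "system", "content": ""}] + list(turns)
    out = []
    for i, turn in enumerate(turns):
        content = turn["content"]
        if turn["role"] == "assistant" and i != len(turns) - 1:
            # History thinking truncation: drop reasoning from earlier turns.
            if "</think>" in content:
                content = content.split("</think>", 1)[1].lstrip("\n")
            # Prepend an empty think block for non-thinking assistant turns.
            if "<think>" not in content:
                content = "<think></think>" + content
        out.append(f"{IM_START}{turn['role']}\n{content}{IM_END}\n")
    # Dynamic inference prefix: open an assistant turn; when thinking is
    # disabled, pre-close the think block so the model answers directly.
    prefix = f"{IM_START}assistant\n"
    if not enable_thinking:
        prefix += "<think></think>"
    return "".join(out) + prefix
```

The commit message notes the real formatter was verified against HF `apply_chat_template` for both string and token output; this sketch only mirrors the string-level shape.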
@pzelasko (Collaborator, Author)

Trying to decide whether we should make SALM backward compatible with vanilla transformers LLMs (it shares a lot of logic but gets somewhat complex) or copy this into a new class (cleaner but more duplication). In any case, the released canary-qwen-2.5b checkpoint must work with the final shape of this PR.

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import json

Check notice — Code scanning / CodeQL: Unused import. Import of 'json' is not used.

Copilot Autofix (AI, 16 days ago):

In general, the correct way to fix an unused import in Python is to remove the import statement if the module is never referenced in the file. This reduces visual clutter, avoids implying unnecessary dependencies, and can slightly speed up module import time.

Here, the best fix is to delete the import json line in nemo/collections/common/data/lhotse/text_adapters.py (line 14 in the provided snippet), leaving the rest of the imports unchanged. No additional methods, definitions, or replacement imports are needed, since no code in the shown region uses json. This change preserves all existing functionality because it only removes an unused symbol.

Suggested changeset 1: nemo/collections/common/data/lhotse/text_adapters.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/common/data/lhotse/text_adapters.py b/nemo/collections/common/data/lhotse/text_adapters.py
--- a/nemo/collections/common/data/lhotse/text_adapters.py
+++ b/nemo/collections/common/data/lhotse/text_adapters.py
@@ -11,7 +11,6 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import json
 import logging
 import math
 import random
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +100 to +105
# for turn in turns:
# if turn["role"] == "user" or turn["role"] == "system":
# if "/think" in turn["slots"]["message"]:
# enable_thinking = True
# elif "/no_think" in turn["slots"]["message"]:
# enable_thinking = False

Check notice — Code scanning / CodeQL: Commented-out code. This comment appears to contain commented-out code.

Copilot Autofix (AI, 16 days ago):

In general, to fix commented-out code you either (a) reinstate it as active code because it is required, or (b) remove it (or convert it into concise explanatory comments) if the behavior is not in use. Here, the function already accepts an enable_thinking flag and the commented block redundantly recalculates it from the content of system/user turns; since this logic is disabled and the docstring describes enable_thinking as a parameter, the least disruptive fix is to remove the commented-out code while preserving the surrounding explanatory comments about step 1. Concretely, in nemo/collections/common/prompts/qwen.py, inside Qwen3PromptFormatter.encode_dialog, delete lines 99–105 that begin with # enable_thinking = True and the subsequent commented for turn in turns: loop. No new methods or imports are needed.

Suggested changeset 1: nemo/collections/common/prompts/qwen.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/common/prompts/qwen.py b/nemo/collections/common/prompts/qwen.py
--- a/nemo/collections/common/prompts/qwen.py
+++ b/nemo/collections/common/prompts/qwen.py
@@ -96,13 +96,6 @@
 
         # 1) (Inference, Optional) Determine if thinking is enabled in user or system turns.
         # If multiple turns have the tag, we will use the last one.
-        # enable_thinking = True  # By default, it is enabled according to Qwen3 prompt format
-        # for turn in turns:
-        #     if turn["role"] == "user" or turn["role"] == "system":
-        #         if "/think" in turn["slots"]["message"]:
-        #             enable_thinking = True
-        #         elif "/no_think" in turn["slots"]["message"]:
-        #             enable_thinking = False
 
         # 2) (Training and Inference) Remove thinking content from previous turns.
         for turn in turns[:-1]:
EOF
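If the disabled tag-detection logic is ever reinstated, note it has a subtle ordering bug: "/think" is a substring of "/no_think", so a message containing "/no_think" would match the first branch and incorrectly enable thinking. A standalone sketch with the check order fixed (function name and last-tag-wins default are illustrative, mirroring the commented-out block):

```python
# Standalone sketch of the commented-out logic flagged above, with the
# check order corrected: "/no_think" must be tested before "/think",
# since the former contains the latter as a substring.
def resolve_thinking(turns, default=True):
    """Scan system/user turns for /think or /no_think tags; the last tag wins."""
    enable = default  # Qwen3-style formats enable thinking by default
    for turn in turns:
        if turn["role"] in ("user", "system"):
            message = turn["slots"]["message"]
            if "/no_think" in message:
                enable = False
            elif "/think" in message:
                enable = True
    return enable
```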
with loss_parallel():
super().backward(*args, **kwargs)

def configure_gradient_clipping(self, optimizer, gradient_clip_val, gradient_clip_algorithm=None):

Check notice — Code scanning / CodeQL: Explicit returns mixed with implicit (fall through) returns. Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.

Copilot Autofix (AI, 1 day ago):

In general, to fix “explicit + implicit return” issues, ensure that all control-flow paths of a function use explicit return statements, and preferably return the same type (or at least make return None explicit where appropriate). Here, configure_gradient_clipping is a PyTorch Lightning hook whose return value is unused; the method is meant to perform side effects only. The best minimal fix is to add an explicit return None at the end of the method so that the FSDP-specific branch and the case with no parameters both end in an explicit return, while keeping the delegation to super().configure_gradient_clipping(...) unchanged.

Concretely, in nemo/collections/speechlm2/models/salm.py, within SALM.configure_gradient_clipping, after the if params: block, add return None (properly indented). No imports or additional definitions are needed, and no behavior changes: the method already implicitly returns None on that path.

Suggested changeset 1: nemo/collections/speechlm2/models/salm.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/nemo/collections/speechlm2/models/salm.py b/nemo/collections/speechlm2/models/salm.py
--- a/nemo/collections/speechlm2/models/salm.py
+++ b/nemo/collections/speechlm2/models/salm.py
@@ -340,6 +340,7 @@
         params = [p for group in optimizer.param_groups for p in group["params"] if p.grad is not None]
         if params:
             _clip_grad_norm_impl(params, max_norm=gradient_clip_val)
+        return None
 
     @torch.no_grad()
     def generate(
EOF
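The shape of the patched hook can be shown outside Lightning. In this sketch, `torch.nn.utils.clip_grad_norm_` stands in for the PR's FSDP-aware `_clip_grad_norm_impl` (an assumption; the real helper handles DTensor shards), and every path ends with an explicit `return None` as the CodeQL fix suggests:

```python
import torch

def configure_gradient_clipping(optimizer, gradient_clip_val, gradient_clip_algorithm=None):
    """Sketch of the patched hook: clip only parameters that actually
    received gradients, and return None explicitly on every path."""
    params = [p for group in optimizer.param_groups
              for p in group["params"] if p.grad is not None]
    if params:
        # Stand-in for the FSDP-aware _clip_grad_norm_impl used in the PR.
        torch.nn.utils.clip_grad_norm_(params, max_norm=gradient_clip_val)
    return None
```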
import torch
from lhotse import CutSet, SupervisionSegment
from lhotse.testing.dummies import dummy_cut, dummy_recording
from omegaconf import DictConfig, OmegaConf

Check notice — Code scanning / CodeQL (test): Unused import. Import of 'OmegaConf' is not used.

Copilot Autofix (AI, 1 day ago):

To fix an unused import, remove only the unused symbol while keeping any used ones. Here, DictConfig is used extensively, but OmegaConf is not. The best fix is to adjust the import statement on line 22 so it only imports DictConfig.

Concretely, in tests/collections/speechlm2/test_salm_automodel_lora.py, replace from omegaconf import DictConfig, OmegaConf with from omegaconf import DictConfig. No other code changes are required since nothing references OmegaConf.

Suggested changeset 1: tests/collections/speechlm2/test_salm_automodel_lora.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/collections/speechlm2/test_salm_automodel_lora.py b/tests/collections/speechlm2/test_salm_automodel_lora.py
--- a/tests/collections/speechlm2/test_salm_automodel_lora.py
+++ b/tests/collections/speechlm2/test_salm_automodel_lora.py
@@ -19,7 +19,7 @@
 import torch
 from lhotse import CutSet, SupervisionSegment
 from lhotse.testing.dummies import dummy_cut, dummy_recording
-from omegaconf import DictConfig, OmegaConf
+from omegaconf import DictConfig
 
 from nemo.collections.common.data.lhotse import NeMoMultimodalConversation
 from nemo.collections.common.data.lhotse.text_adapters import AudioTurn, TextTurn
EOF
Comment on lines +30 to +35
from nemo.collections.speechlm2.parts.automodel_lora import (
LORA_PARAM_PATTERN,
ensure_lora_trainable,
make_peft_config,
maybe_install_lora,
)

Check notice — Code scanning / CodeQL (test): Unused import. Import of 'maybe_install_lora' is not used.

Copilot Autofix (AI, 1 day ago):

To fix an unused-import problem, the standard approach is to remove the specific symbol from the import statement (or remove the entire import if nothing from it is used). In this case, the import is a multi-name import from nemo.collections.speechlm2.parts.automodel_lora, and the only unused symbol is maybe_install_lora.

The best fix that does not change functionality is to edit the from nemo.collections.speechlm2.parts.automodel_lora import (...) block and delete maybe_install_lora, while leaving the other imported names (LORA_PARAM_PATTERN, ensure_lora_trainable, make_peft_config) untouched. This change should be made in tests/collections/speechlm2/test_salm_automodel_lora.py around lines 30–35 where the multi-line import is defined. No new methods, imports, or definitions are required.

Suggested changeset 1: tests/collections/speechlm2/test_salm_automodel_lora.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/tests/collections/speechlm2/test_salm_automodel_lora.py b/tests/collections/speechlm2/test_salm_automodel_lora.py
--- a/tests/collections/speechlm2/test_salm_automodel_lora.py
+++ b/tests/collections/speechlm2/test_salm_automodel_lora.py
@@ -31,7 +31,6 @@
     LORA_PARAM_PATTERN,
     ensure_lora_trainable,
     make_peft_config,
-    maybe_install_lora,
 )
 
 if torch.cuda.is_available():
EOF
@desh2608
Trying to decide whether we should make SALM backward compatible with vanilla transformers LLMs (it shares a lot of logic but gets somewhat complex) or copy this into a new class (cleaner but more duplication). In any case, the released canary-qwen-2.5b checkpoint must work with the final shape of this PR.

(copying my comment from Slack here) In the current PR, does it already work with both HF Automodel and NeMo Automodel? If yes, it looks fine to me. Most of the complexity around model loading seems to be in configure_model() and some utility functions in pretrained.py. Other than that, the annoying part is having to convert a DTensor to a full tensor for some operations (I had to do the same when adding the audio generation head), but I think it's not too bad.

pzelasko added 4 commits March 5, 2026 10:47