Add cross-attention to output hypotheses #15229
Conversation
Signed-off-by: Marco Gaido <mgaido@fbk.eu>
Force-pushed from 2de6160 to 21d5bb8
nithinraok
left a comment
Thanks Marco, great work. Added comments. Also, could you add an option, something like preserve_xattn_scores, so that when enabled through

```python
decoding_cfg = MultiTaskDecodingConfig(
    strategy="beam",  # or "greedy"
    preserve_xattn_scores=True,
)
```

it would only store and return xattn_scores (to save memory by default)?
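The suggested gating could be sketched roughly as follows. This is an illustrative sketch, not NeMo code: the class and method names are made up, and the tensors are assumed to be batched as [B, H, U, T] so that the decoder-step axis grows along dim=2, as in the PR's torch.cat call.

```python
import torch


class GreedyDecoderSketch:
    """Illustrative sketch (not the NeMo implementation) of gating
    cross-attention score storage behind a preserve_xattn_scores-style
    flag, so memory stays flat unless the user opts in."""

    def __init__(self, preserve_xattn_scores: bool = False):
        self.preserve_xattn_scores = preserve_xattn_scores

    def accumulate(self, xatt_scores_list, new_xatt_scores_list):
        if not self.preserve_xattn_scores:
            # Nothing is stored unless the user opted in.
            return None
        if xatt_scores_list is None:
            return list(new_xatt_scores_list)
        # Per layer, grow the decoder-step axis (dim=2 of [B, H, U, T]).
        return [
            torch.cat((old, new), dim=2)
            for old, new in zip(xatt_scores_list, new_xatt_scores_list)
        ]
```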
```
last_frame (Optional): Index of the last decoding step hypothesis was updated including blank token prediction.
xatt_scores (Optional): List of cross-attention scores for each decoder layer. Each element of the list
```
Shouldn't the shape be List[BxHxT1xT2]? Also, best to add that this is used with AED models.
This is for a single hypothesis, so there is no B; the shape would be List[HxT1xT2]. If you prefer, I can rename HxUxT to HxT1xT2. I will also add the note about AED models.
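The per-hypothesis shapes discussed above can be illustrated as follows (the layer/head/length values are made up for the example): for a single hypothesis the batch axis is gone, so the scores are a list over decoder layers of [H, T1, T2] tensors.

```python
import torch

# Illustrative dimensions: decoder layers, attention heads,
# decoder steps (T1), encoder frames (T2).
num_layers, H, T1, T2 = 4, 8, 10, 50

# One hypothesis: a list with one [H, T1, T2] tensor per decoder layer.
xatt_scores = [torch.rand(H, T1, T2) for _ in range(num_layers)]

assert len(xatt_scores) == num_layers
assert all(s.shape == (H, T1, T2) for s in xatt_scores)
```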
```
)
if xatt_scores_list is not None:
    for layer in range(len(xatt_scores_list)):
        xatt_scores_list[layer] = torch.cat((xatt_scores_list[layer], new_xatt_scores_list[layer]), dim=2)
```
What about the case when new_xattn_scores_list is None? cat would fail.
If new_xattn_scores_list is None, xatt_scores_list will stay None, so we never enter this if branch.
I meant to ask about each step, but it's probably fine.
I get it, but if xatt_scores_list is not None, then new_xattn_scores_list must not be None; otherwise something weird is happening (i.e. attention scores are returned for some tokens and are None for others within the same generation). I'd rather add an assert on new_xattn_scores_list, WDYT?
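The assert-guarded accumulation being discussed could look roughly like this. This is a hypothetical helper, not the actual NeMo code; it only illustrates the invariant that once scores start being returned, every subsequent step must return them too.

```python
import torch


def append_xatt_scores(xatt_scores_list, new_xatt_scores_list):
    """Illustrative helper: accumulate per-layer cross-attention scores,
    asserting that score availability is consistent across steps."""
    if xatt_scores_list is None:
        # First step that produced scores (or no scores at all yet).
        return new_xatt_scores_list
    assert new_xatt_scores_list is not None, (
        "cross-attention scores were returned for some decoding steps but not others"
    )
    # Concatenate per layer along the decoder-step axis
    # (dim=2 for tensors shaped [B, H, U, T]).
    return [
        torch.cat((old, new), dim=2)
        for old, new in zip(xatt_scores_list, new_xatt_scores_list)
    ]
```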
```
pos=0,
return_scores: bool = True,
):
log_probs, decoder_mems_list, _ = super()._one_step_forward(
```
Could you update here as well, and also include it in the returned tuple?
@nithinraok thanks for your review. I do have a question, though. You said to add a param in
Very good point. Well, first, the option needs to be passed to

```python
multitask_decoding = MultiTaskDecodingConfig()
multitask_decoding.strategy = "greedy"
multitask_decoding.return_xattn_scores = True
```

and call asr_model.change_decoding_strategy(multitask_decoding) before performing .transcribe(). Let's go with the latter option for now. I will keep thinking about this, as it needs to be changed across the codebase. Are there any other options like these that you are interested in changing through .transcribe()?
OK, I will work on this in the coming days. Maybe it would be worth adding some checks and logs to guide the user, though. I will try to come up with a proposal for that while working on this. Thanks.
Signed-off-by: Marco Gaido <mgaido@fbk.eu>
Signed-off-by: mgaido91 <mgaido91@users.noreply.github.com> Signed-off-by: Marco Gaido <mgaido@fbk.eu>
Force-pushed from 2a1005f to e06e0c4
Signed-off-by: Marco Gaido <mgaido@fbk.eu>
andrusenkoau
left a comment
Hi @mgaido91, thank you for the great work! I have almost no questions. Let's wait for your final changes with the decoding config.
```
 def test_temperature_sampling_decoding(inputs, nnet):
-    gen = GreedySequenceGenerator(*nnet, temperature=10.0, n_samples=2)
+    gen = GreedySequenceGenerator(*nnet, return_xattn_scores=True, temperature=10.0, n_samples=2)
```
Could you add the check for both return_xattn_scores options (as above) here?
Yes, sure. I did not do it to minimize the CI cost; I am updating it, thanks!
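Covering both options could be done with a parametrized test along these lines. This is a sketch only: DummyGenerator is a made-up stand-in for GreedySequenceGenerator (which needs the real NeMo fixtures), kept here just to show the parametrization pattern.

```python
import pytest


class DummyGenerator:
    """Made-up stand-in for GreedySequenceGenerator: it returns
    cross-attention scores only when return_xattn_scores is True."""

    def __init__(self, return_xattn_scores: bool = False):
        self.return_xattn_scores = return_xattn_scores

    def decode(self):
        scores = [[0.1, 0.9]] if self.return_xattn_scores else None
        return "hypothesis", scores


@pytest.mark.parametrize("return_xattn_scores", [False, True])
def test_xattn_scores_options(return_xattn_scores):
    _, scores = DummyGenerator(return_xattn_scores).decode()
    # Scores must be present exactly when the option is enabled.
    assert (scores is not None) == return_xattn_scores
```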
Thanks @andrusenkoau!
I already made them. You can find the
PS: I do not see why
Signed-off-by: Marco Gaido <mgaido@fbk.eu>
@nithinraok I think I addressed your comments; could you please take another look at this?
nithinraok
left a comment
Thanks Marco. Minor comment which you might have missed earlier. LGTM otherwise. Thanks for the PR, great work!
@nithinraok @andrusenkoau thank you for your guidance and reviews!
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do?
The PR adds the encoder-decoder cross-attention to the output hypotheses returned by ASR models.
Collection: ASR
Changelog
Usage
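A minimal usage sketch based on the config fields discussed in this PR (the checkpoint name and the MultiTaskDecodingConfig import path are assumptions and may differ across NeMo versions; this requires a NeMo install and an AED model):

```python
from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.parts.submodules.multitask_decoding import MultiTaskDecodingConfig

# Illustrative checkpoint; any AED (attention encoder-decoder) model applies.
asr_model = ASRModel.from_pretrained("nvidia/canary-1b")

decoding_cfg = MultiTaskDecodingConfig()
decoding_cfg.strategy = "greedy"
decoding_cfg.return_xattn_scores = True
asr_model.change_decoding_strategy(decoding_cfg)

hyps = asr_model.transcribe(["audio.wav"], return_hypotheses=True)
# Each hypothesis carries a list of per-layer [H x T1 x T2] score tensors.
print(hyps[0].xatt_scores)
```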
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.
@nithinraok @andrusenkoau
Additional Information