Description
Hello @mgaido91,
Thank you for your work on #15229
I would like to use this together with alignatt, but I'm not sure how to pick the subset of [H, U, T]. I'm a bit thrown off by the mismatch in the xatt dimension sometimes:
Consider the following decoder_input_ids of length 14 (mapped to text for better illustration):
<|startofcontext|> A dream.<|startoftranscript|><|emo:undefined|><|en|><|en|><|pnc|><|noitn|><|notimestamp|><|nodiarize|>
And an output of length 3.
When I examine hypothesis[0].xatt_scores.shape, in most cases I get [H, len(decoder_input_ids) + len(output) + 1, T] = [H, 18, T]; sometimes I get more than that. My question is: am I right to assume that the positions in xatt_scores that correspond to the new output lie in the range
[:, len(decoder_input_ids):len(decoder_input_ids)+len(output), :]? And if I also wanted the context, could I find it starting from the same index at which it starts in decoder_input_ids (2)?
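For concreteness, this is the indexing I'm assuming (a numpy sketch with made-up dimensions; H, T, the prompt/output lengths, and the context start index are all placeholders taken from the shapes above, not actual values from the model):

```python
import numpy as np

# Hypothetical dimensions matching the example in this issue:
H, T = 8, 120        # attention heads, encoder time steps (made up)
len_prompt = 14      # len(decoder_input_ids)
len_output = 3       # newly generated tokens
context_start = 2    # assumed index where the context starts in decoder_input_ids

# Stand-in for hypothesis[0].xatt_scores: [H, len_prompt + len_output + 1, T]
xatt = np.random.rand(H, len_prompt + len_output + 1, T)

# Assumed slice holding the cross-attention of the newly generated tokens
new_token_xatt = xatt[:, len_prompt:len_prompt + len_output, :]

# Assumed slice that additionally includes the context part of the prompt
with_context_xatt = xatt[:, context_start:len_prompt + len_output, :]

print(new_token_xatt.shape)     # (8, 3, 120)
print(with_context_xatt.shape)  # (8, 15, 120)
```

Is this the right way to recover the per-output-token cross-attention, or does the occasional extra length along the second dimension shift these indices?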
I'm doing inference with beam search, with beam size 5.