fix: pass attention_mask to T5/CLIP encoder in HFEmbedder #507

Open
Mr-Neutr0n wants to merge 1 commit into black-forest-labs:main from Mr-Neutr0n:fix/hfembedder-pass-attention-mask

Conversation

@Mr-Neutr0n

Summary

HFEmbedder.forward() tokenizes text with padding="max_length", which pads short sequences to max_length with padding tokens. However, it then passes attention_mask=None to the underlying T5/CLIP model.

For T5 in particular, this is problematic because T5 uses bidirectional (fully visible) self-attention. When attention_mask=None, the encoder treats every token, including padding tokens, as real input and attends to them. Padding tokens therefore actively participate in self-attention, polluting the text embeddings, especially for short prompts that carry many padding tokens.
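To see the effect in isolation, here is a toy softmax-attention row (plain NumPy, not FLUX code): without a mask, the two padding positions soak up a large share of one query's attention weight; with the mask applied as a large negative bias, they get essentially none.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# One query's attention scores over 4 key positions;
# the last two keys are padding tokens.
scores = np.array([0.5, 0.2, 0.0, 0.0])
mask = np.array([1, 1, 0, 0])  # 1 = real token, 0 = padding

unmasked = softmax(scores)                           # padding attended to
masked = softmax(np.where(mask == 1, scores, -1e9))  # padding suppressed

print(unmasked.round(3))  # roughly 41% of the weight lands on the padding keys
print(masked.round(3))    # all weight on the two real tokens
```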

The tokenizer already produces the correct attention_mask in batch_encoding, marking real tokens as 1 and padding tokens as 0. This PR simply passes that mask through to the model instead of discarding it.

Changes

  • src/flux/modules/conditioner.py: Replace attention_mask=None with attention_mask=batch_encoding["attention_mask"].to(self.hf_module.device) in HFEmbedder.forward()
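A minimal sketch of the patched forward() follows. The attribute names (tokenizer, hf_module, max_length, output_key) are assumptions modeled on the PR description, not the verbatim contents of src/flux/modules/conditioner.py:

```python
import torch
from torch import nn

class HFEmbedder(nn.Module):
    # Hypothetical reconstruction for illustration; only the
    # attention_mask line reflects the actual change in this PR.
    def __init__(self, tokenizer, hf_module, max_length: int, output_key: str):
        super().__init__()
        self.tokenizer = tokenizer
        self.hf_module = hf_module
        self.max_length = max_length
        self.output_key = output_key

    def forward(self, text: list[str]) -> torch.Tensor:
        batch_encoding = self.tokenizer(
            text,
            truncation=True,
            max_length=self.max_length,
            padding="max_length",  # short prompts are padded up to max_length
            return_tensors="pt",
        )
        outputs = self.hf_module(
            input_ids=batch_encoding["input_ids"].to(self.hf_module.device),
            # The fix: forward the tokenizer's mask instead of attention_mask=None,
            # so the encoder ignores padding positions.
            attention_mask=batch_encoding["attention_mask"].to(self.hf_module.device),
            output_hidden_states=False,
        )
        return outputs[self.output_key]
```

Because the tokenizer already emits attention_mask alongside input_ids, the change is purely a matter of forwarding an existing tensor; no new computation is introduced.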

Impact

  • Short prompts will produce cleaner text embeddings since padding tokens will be properly masked out
  • Long prompts that fill the full max_length are unaffected (no padding tokens to mask)
  • No change to model weights, architecture, or API

