Description
Hi Hugging Face team!
I am currently pre-training a foundation model using ViTMAEForPreTraining, and I was hoping to use Flash Attention 2.0 to speed up training and reduce memory usage. However, when I attempted to enable Flash Attention, I encountered the following error:
ValueError: ViTMAEForPreTraining does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co//discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
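For reference, here is a minimal sketch of how the error is triggered, assuming the public `facebook/vit-mae-base` checkpoint (any ViT-MAE checkpoint hits the same check; our actual model is a custom Landsat one):

```python
import torch
from transformers import ViTMAEForPreTraining

# Requesting Flash Attention 2 via the standard attn_implementation argument;
# FA2 also requires fp16/bf16 weights and a supported GPU.
model = ViTMAEForPreTraining.from_pretrained(
    "facebook/vit-mae-base",  # example checkpoint, stands in for our own
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,
)
# Raises the ValueError quoted above, since the ViT-MAE attention classes
# are not yet registered as supporting Flash Attention 2.
```

(Untested beyond reproducing the error; it needs network access to the Hub and a CUDA GPU with flash-attn installed to go further.)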
Since MAE pre-training is heavily dependent on the attention mechanism, adding Flash Attention support would be a valuable enhancement, especially for larger ViT models and high-resolution datasets such as the Landsat data we are working with.
Feature Request
- Please add support for Flash Attention 2.0 to ViTMAEForPreTraining.
- This would help make MAE pre-training more efficient in terms of speed and memory consumption.
Why This Matters
- Many users working with large imagery datasets (e.g., remote sensing, medical imaging) would greatly benefit from this.
- Flash Attention has already proven useful in other ViT variants, so bringing this to MAE feels like a natural next step.
Environment Details
- Transformers version: v4.41.0.dev0
- PyTorch version: 2.5.1
- Running on multi-GPU with NCCL backend