
Request: Add Flash Attention 2.0 Support for ViTMAEForPreTraining #36527

@noelEOS

Description

Hi Hugging Face team!

I am currently pre-training a foundation model with ViTMAEForPreTraining, and I was hoping to use Flash Attention 2.0 to speed up training and reduce memory usage. However, when I attempted to enable Flash Attention, I encountered the following error:

ValueError: ViTMAEForPreTraining does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co//discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
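For reference, this is roughly how the error is triggered (a minimal sketch; the checkpoint name `facebook/vit-mae-base` is the public base MAE checkpoint and is illustrative here — any ViT-MAE checkpoint hits the same check):

```python
import torch
from transformers import ViTMAEForPreTraining

# Requesting Flash Attention 2.0 at load time raises the ValueError above,
# because ViTMAE does not declare FA2 support (requires a CUDA GPU and the
# flash-attn package once support is added).
model = ViTMAEForPreTraining.from_pretrained(
    "facebook/vit-mae-base",
    torch_dtype=torch.float16,  # FA2 requires fp16/bf16
    attn_implementation="flash_attention_2",
)
```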

Since MAE pre-training is heavily dependent on the attention mechanism, adding Flash Attention support would be a valuable enhancement, especially for larger ViT models and high-resolution datasets such as the Landsat data we are working with.

Feature Request

  • Please add support for Flash Attention 2.0 to ViTMAEForPreTraining.
  • This would help make MAE pre-training more efficient in terms of speed and memory consumption.

Why This Matters

  • Many users working with large imagery datasets (e.g., remote sensing, medical imaging) would greatly benefit from this.
  • Flash Attention has already proven useful in other ViT variants, so bringing this to MAE feels like a natural next step.

Environment Details

  • Transformers version: v4.41.0.dev0
  • PyTorch version: 2.5.1
  • Running on multi-GPU with NCCL backend

Metadata

Assignees

No one assigned

Labels

  • Feature request
  • Flash Attention
  • Good Second Issue
  • Vision
