Refactor Attention implementation for ViT-based models #36545
qubvel merged 17 commits into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
run-slow: vit, audio_spectrogram_transformer, deit, dinov2, dinov2_with_registers, dpt, ijepa, videomae, vit_mae, vit_msn, vitpose_backbone, vivit, yolos
This comment contains run-slow, running the specified jobs: models: ['models/audio_spectrogram_transformer', 'models/deit', 'models/dinov2', 'models/dinov2_with_registers', 'models/dpt', 'models/ijepa', 'models/videomae', 'models/vit', 'models/vit_mae', 'models/vit_msn', 'models/vitpose_backbone', 'models/vivit', 'models/yolos']
cc @Cyrilvallez for review if you have bandwidth 🤗
ArthurZucker
left a comment
🧼 clean and perfect! Thanks a lot for working on this, quite tedious!
…6545)

* Refactor vit attention
* Refactor ViT-based models
* 🚨🚨🚨 Fix prefix for DPT
* Update params order
* trigger tests
* Fix Dinov2 attention
* Fix DPT attention impl propagation for backbone config
* Common test fix: config is modif. inplace - avoid it
* view->reshape
* Fixup
* Fixup
* Enable IJepa FA2
* Add FA2 in corresponding model docs
What does this PR do?
Updates the way the attention implementation is chosen. Instead of defining separate classes per backend, we use a functional approach and switch the attention implementation on the fly via the `config._attn_implementation` param. The following models will have SDPA and FA2 support:
It also affects the following models:
Fixes:
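As a minimal sketch of the functional approach described above: instead of one attention `nn.Module` subclass per backend, plain functions are registered in a dict and looked up by the `config._attn_implementation` key at forward time. The function and dict names below are illustrative, not the exact code introduced by this PR.

```python
import torch
import torch.nn.functional as F


def eager_attention_forward(query, key, value):
    # Plain matmul + softmax attention; materializes the attention
    # weights so they can be returned for output_attentions.
    scale = query.shape[-1] ** -0.5
    attn_weights = (query @ key.transpose(-1, -2)) * scale
    attn_weights = attn_weights.softmax(dim=-1)
    return attn_weights @ value, attn_weights


def sdpa_attention_forward(query, key, value):
    # Delegates to PyTorch's fused kernel; weights are never materialized.
    return F.scaled_dot_product_attention(query, key, value), None


# One registry instead of one class per backend (illustrative names).
ATTENTION_FUNCTIONS = {
    "eager": eager_attention_forward,
    "sdpa": sdpa_attention_forward,
}


def attend(query, key, value, attn_implementation="eager"):
    # The implementation is picked on the fly from the config value,
    # so no separate Attention subclasses are needed.
    return ATTENTION_FUNCTIONS[attn_implementation](query, key, value)
```

Swapping backends then only requires changing the config value, and both paths produce the same output up to numerical tolerance.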
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.