Skip to content

Allow installation/use without flash attention dependency#3

Merged
pengzhangzhi merged 4 commits intopengzhangzhi:mainfrom
ANaka:main
Dec 3, 2024
Merged

Allow installation/use without flash attention dependency#3
pengzhangzhi merged 4 commits intopengzhangzhi:mainfrom
ANaka:main

Conversation

@ANaka
Copy link
Contributor

@ANaka ANaka commented Dec 2, 2024

Nice repo! Thanks for sharing.

Apologies in advance if I misunderstood something, but I wanted to use faesm with just the SDPA upgrade and found that I was unable to install or run it without flash-attn.

I made a couple of small changes as a workaround:

  • Change setup.py to make flash-attn an optional dependency
  • Make imports from the rotary and utils modules conditional on having flash-attn installed

@pengzhangzhi
Copy link
Owner

Nice catch! Did u run the example code on the readme? Did it pass? Would love to merge if the code runs!

Thank you : )

@ANaka
Copy link
Contributor Author

ANaka commented Dec 2, 2024

Nice catch! Did u run the example code on the readme? Did it pass?

I was able to install and run the example code after making these changes and it seemed to work fine. I haven't tried running your tests yet but would be happy to do that, will report back!

@pengzhangzhi
Copy link
Owner

That's amazing! thanks Alex!
It would be great to run the benchmark test, which will generate an efficiency comparison figure. Would like see how SDPA vs. official ESM.

Cheers,
Fred

@ANaka
Copy link
Contributor Author

ANaka commented Dec 2, 2024

Looks like test_compare_esm.py::test_esm_vs_faesm_numeric and benchmark.py::test_esm_vs_faesm_benchmark both pass if I set use_fa=False so hopefully everything is in order.

image

@pengzhangzhi
Copy link
Owner

Beautiful! I think we can still get ~30% reduction if we use the pytorch SDPA. Not bad :)

@pengzhangzhi pengzhangzhi merged commit 79ff32d into pengzhangzhi:main Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants