
flash-attn 2.8.3
Flash Attention: Fast and Memory-Efficient Exact Attention
Stars: 22287, Watchers: 22287, Forks: 2389, Open Issues: 1059
The Dao-AILab/flash-attention repo was created 3 years ago and the last code push was 13 hours ago.
The project is extremely popular, with 22287 GitHub stars.
How to Install flash-attn
You can install flash-attn using pip:
pip install flash-attn
Or add it to a project with Poetry:
poetry add flash-attn
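After installing, it can help to confirm which version the current interpreter actually sees. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is an illustrative assumption and not part of flash-attn.

```python
from importlib import metadata


def installed_version(dist_name="flash-attn"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("flash-attn is not installed in this environment")
    else:
        print(f"flash-attn {version} is installed")
```

Reading the distribution metadata avoids importing the package itself, so the check also works on machines without a GPU.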
Package Details
- Author
- Tri Dao
- License
- None
- Homepage
- https://github.com/Dao-AILab/flash-attention
- PyPI:
- https://pypi.org/project/flash-attn/
- GitHub Repo:
- https://github.com/Dao-AILab/flash-attention
Code Examples
Here are some flash-attn code examples and snippets.
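As a starting point, here is a hedged sketch of a single attention call, based on the `flash_attn_func` interface described in the project README. It assumes PyTorch is installed and a CUDA GPU supported by flash-attn is available; the helper names `flash_attn_available` and `run_flash_attention`, and the tensor sizes, are illustrative assumptions rather than part of the package.

```python
import importlib.util


def flash_attn_available():
    """Return True if the flash_attn module can be located (without importing it)."""
    return importlib.util.find_spec("flash_attn") is not None


def run_flash_attention(batch=2, seqlen=128, nheads=8, headdim=64, causal=True):
    """Run one attention call on random inputs and return the output tensor.

    Assumes torch is installed and a CUDA GPU supported by flash-attn is present.
    """
    # Imported lazily so this file still loads on machines without a GPU stack.
    import torch
    from flash_attn import flash_attn_func

    # flash_attn_func expects (batch, seqlen, nheads, headdim) in fp16 or bf16.
    shape = (batch, seqlen, nheads, headdim)
    q = torch.randn(shape, device="cuda", dtype=torch.float16)
    k = torch.randn(shape, device="cuda", dtype=torch.float16)
    v = torch.randn(shape, device="cuda", dtype=torch.float16)

    # The output has the same shape as q.
    return flash_attn_func(q, k, v, dropout_p=0.0, causal=causal)


if __name__ == "__main__":
    if flash_attn_available():
        out = run_flash_attention()
        print(tuple(out.shape))
    else:
        print("flash_attn not found; skipping the demo")
```

The lazy imports keep the module importable everywhere, while the actual call only runs where the package and a suitable GPU are present.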
GitHub Issues
The flash-attn package has 1059 open issues on GitHub, including:
- Do we have flash_attn-2.8.3 wheel with cu12 + torch2.9 for cp311 ?
- can't install flash-attn | Error: "metadata-generation-failed"
- Add shift scheduler for deterministic full‑mask FA3 bwd on Hopper (sm90)
- Avoiding Out of Memory Killer (OOM) during compilation under Linux
- Add loc info & Fix api changes for CuTeDSL 4.4
- [Cute, SM100, BWD] Refactor get_n_block_max_for_m_block into a method of BlockInfo
- BWD sm100 2cta
- [Cute, SM100] Fix comment in tmem_p_offset
- TypeError: VibeVoiceASRForConditionalGeneration.__init__() got an unexpected keyword argument 'dtype'
- branch jshah/sm100-varlen-bwd RuntimeError('NCCL Error 1: unhandled cuda error (run with NCCL_DEBUG=INFO for details)')
- Any plan to release flash-attn-cute package?
- Warn when ninja is missing
- Fix compute_block_sparsity import in benchmark_mask_mod
- [Cute][Testing] Prototyping a fast test mode for Cute
- [Cute,Fwd,Sm100] support irregular qhead / kvhead ratios