flash-attn 2.8.3

Flash Attention: Fast and Memory-Efficient Exact Attention

Stars: 22287, Watchers: 22287, Forks: 2389, Open Issues: 1059

The Dao-AILab/flash-attention repository was created 3 years ago, and the most recent code push was 13 hours ago.
The project is very popular, with 22287 GitHub stars.

How to Install flash-attn

You can install flash-attn using pip:

pip install flash-attn

Or add it to a project with Poetry:

poetry add flash-attn
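
After installing, it can help to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the Python standard library. The helper name `installed_version` is hypothetical, and the distribution name `flash-attn` is taken from the install commands above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "flash-attn") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("flash-attn is not installed in this environment")
    else:
        print(f"flash-attn {version} is installed")
```

This only checks the package metadata. It does not verify that the compiled CUDA extension loads or that a compatible GPU is present.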

Package Details

Author: Tri Dao
License: None
Homepage: https://github.com/Dao-AILab/flash-attention
PyPI: https://pypi.org/project/flash-attn/
GitHub Repo: https://github.com/Dao-AILab/flash-attention

Classifiers

No flash-attn PyPI packages just yet.

Errors

A list of common flash-attn errors.

Code Examples

Here are some flash-attn code examples and snippets.
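
As a starting point, the sketch below calls `flash_attn_func` on random half-precision tensors laid out as (batch, seqlen, nheads, headdim). It assumes PyTorch, a CUDA-capable GPU, and a working flash-attn build. The helper names `can_run_flash_attn` and `run_demo` and the tensor sizes are illustrative choices, not part of the library. The script skips the demo when those requirements are not met.

```python
import importlib.util


def can_run_flash_attn() -> bool:
    """True only if torch and flash_attn are importable and a CUDA device is visible."""
    if importlib.util.find_spec("torch") is None:
        return False
    if importlib.util.find_spec("flash_attn") is None:
        return False
    import torch

    return torch.cuda.is_available()


def run_demo():
    """Run causal attention on random inputs and return the output shape."""
    import torch
    from flash_attn import flash_attn_func

    # Illustrative sizes; flash-attn expects fp16 or bf16 tensors on a CUDA device.
    batch, seqlen, nheads, headdim = 2, 128, 8, 64
    q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    # causal=True applies an autoregressive mask; dropout is disabled here.
    out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
    return tuple(out.shape)


if __name__ == "__main__":
    if can_run_flash_attn():
        print("output shape:", run_demo())
    else:
        print("Skipping demo: torch, flash_attn, or a CUDA device is unavailable")
```

The output tensor should have the same (batch, seqlen, nheads, headdim) layout as the query.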

GitHub Issues

The flash-attn package has 1059 open issues on GitHub:

  • Do we have flash_attn-2.8.3 wheel with cu12 + torch2.9 for cp311 ?
  • can't install flash-attn | Error: "metadata-generation-failed"
  • Add shift scheduler for deterministic full‑mask FA3 bwd on Hopper (sm90)
  • Avoiding Out of Memory Killer (OOM) during compilation under Linux
  • Add loc info & Fix api changes for CuTeDSL 4.4
  • [Cute, SM100, BWD] Refactor get_n_block_max_for_m_block into a method of BlockInfo
  • BWD sm100 2cta
  • [Cute, SM100] Fix comment in tmem_p_offset
  • TypeError: VibeVoiceASRForConditionalGeneration.__init__() got an unexpected keyword argument 'dtype'
  • branch jshah/sm100-varlen-bwd RuntimeError('NCCL Error 1: unhandled cuda error (run with NCCL_DEBUG=INFO for details)')
  • Any plan to release flash-attn-cute package?
  • Warn when ninja is missing
  • Fix compute_block_sparsity import in benchmark_mask_mod
  • [Cute][Testing] Protyping a fast test mode for Cute
  • [Cute,Fwd,Sm100] support irregular qhead / kvhead ratios

See more issues on GitHub