Building from Source

Obtain the FlashAttention source code, build a wheel from it, and install the locally built artifact.

Obtain the FlashAttention 2.8.3 source code.

git clone --branch v2.8.3 --depth 1 https://github.com/Dao-AILab/flash-attention.git

Confirm that the installed PyTorch is a CUDA build.

cd flash-attention
python3 - <<'PY'
import torch

# Report the PyTorch version and the CUDA version it was built against.
print("torch_version=" + torch.__version__)
print("torch_cuda=" + str(torch.version.cuda))
print("cuda_built=" + str(torch.backends.cuda.is_built()))
print("cuda_available=" + str(torch.cuda.is_available()))
# Stop here if PyTorch is a CPU-only build or no CUDA device is usable.
assert torch.backends.cuda.is_built()
assert torch.cuda.is_available()
PY

Build the wheel from source.

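# Point the build at the CUDA 13.0 toolkit.
export CUDA_HOME=/usr/local/cuda-13.0
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}
# Compile locally instead of fetching a prebuilt wheel.
export FLASH_ATTENTION_FORCE_BUILD=TRUE
# Generate kernels for compute capability 8.0 only.
export FLASH_ATTN_CUDA_ARCHS=80
# Limit parallel compile jobs and nvcc threads per job.
export MAX_JOBS=8
export NVCC_THREADS=1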

python3 -m pip wheel --no-build-isolation --no-deps --wheel-dir dist .
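
If the build stops with a CUDA version error, one thing worth checking is whether the nvcc found on PATH has the same CUDA major version as the torch_cuda value printed earlier. The following is a minimal sketch of that check, not part of FlashAttention or PyTorch; the helper names parse_nvcc_release and same_cuda_major are hypothetical, and the sketch assumes nvcc prints a "release X.Y" string in its version output.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output: str):
    """Return (major, minor) parsed from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_cuda_major(nvcc_release, torch_cuda: str) -> bool:
    """Compare the nvcc major version with a torch.version.cuda string such as "13.0"."""
    if nvcc_release is None:
        return False
    return nvcc_release[0] == int(torch_cuda.split(".")[0])


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")  # resolved through PATH, as set in the exports above
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        print("nvcc_release=" + str(parse_nvcc_release(result.stdout)))
```

To compare against PyTorch, pass torch.version.cuda as the second argument of same_cuda_major. The sketch keeps torch out of its imports so the parsing can be tried on a machine without PyTorch installed.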

Install the locally built artifact.

python3 -m pip install --force-reinstall --no-deps dist/flash_attn-2.8.3-*.whl
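
After installation, a quick way to confirm that the expected version is registered in the current environment is to query the package metadata. This is a small sketch using the standard-library importlib.metadata module; the helper name installed_version is hypothetical, and the distribution name flash_attn is taken from the wheel filename used above.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None when the wheel has not been installed into this interpreter.
    print("flash_attn_version=" + str(installed_version("flash_attn")))
```

This only reads package metadata. It does not load the compiled extension, so it says nothing about whether the CUDA kernels run on the target GPU.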

Example build artifact:

flash_attn-2.8.3-cp311-cp311-linux_aarch64.whl
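
The filename encodes the distribution name, version, Python tag, ABI tag, and platform tag, so it shows which interpreter and architecture the wheel targets. The following sketch splits a wheel filename into those fields according to the standard wheel naming convention; the WheelTags type and the parse_wheel_filename helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # Five fields normally; six when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields: " + filename)
    # The last three fields are always the python, ABI, and platform tags.
    return WheelTags(parts[0], parts[1], parts[-3], parts[-2], parts[-1])


if __name__ == "__main__":
    print(parse_wheel_filename("flash_attn-2.8.3-cp311-cp311-linux_aarch64.whl"))
```

For the example above, the tags indicate a CPython 3.11 wheel for Linux on aarch64. A wheel built with a different interpreter or on a different architecture will carry different tags.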
