memory-efficient-attention-pytorch

Implementation of memory-efficient multi-head attention, as proposed in the paper "Self-attention Does Not Need O(n²) Memory".
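To illustrate the general idea behind the paper, here is a minimal sketch in pure Python (standard library only). It is not this repository's code or API. The function names `naive_attention` and `chunked_attention` and the `chunk_size` parameter are hypothetical, chosen for illustration. The sketch handles a single head with no masking: keys and values are processed in chunks while a running maximum, a running softmax denominator, and a running weighted sum are kept per query, so the full row of attention scores is never held at once.

```python
import math


def naive_attention(q, k, v):
    """Reference version: builds the full score row for each query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w * vj[d] for w, vj in zip(weights, v)) / total for d in range(dim_v)]
        )
    return out


def chunked_attention(q, k, v, chunk_size=2):
    """Illustrative chunked version: only `chunk_size` scores exist at a time."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for qi in q:
        running_max = float("-inf")  # largest score seen so far
        denom = 0.0                  # running softmax denominator
        acc = [0.0] * dim_v          # running weighted sum of values
        for start in range(0, len(k), chunk_size):
            k_chunk = k[start:start + chunk_size]
            v_chunk = v[start:start + chunk_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_chunk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum.
            # On the first chunk this is exp(-inf) == 0.0.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            denom = denom * correction + sum(weights)
            acc = [
                a * correction + sum(w * vj[d] for w, vj in zip(weights, v_chunk))
                for d, a in enumerate(acc)
            ]
            running_max = new_max
        out.append([a / denom for a in acc])
    return out


# Small made-up inputs: 2 queries, 5 keys/values.
q = [[1.0, 0.0], [0.5, -1.0]]
k = [[0.2, 0.1], [1.0, -0.5], [-0.3, 0.8], [0.0, 0.4], [0.7, 0.7]]
v = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.5], [3.0, -1.0, 0.0], [0.5, 0.5, 0.5], [-2.0, 4.0, 1.0]]

full = naive_attention(q, k, v)
chunked = chunked_attention(q, k, v, chunk_size=2)
```

Both functions should agree up to floating-point rounding. The chunked version trades a little extra arithmetic (the rescaling step) for a per-query working set that depends on `chunk_size` rather than on the sequence length.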

Stars: 342

Forks: 32

Language: Python

Last Updated: May 01, 2024
