FlexGen

Running large language models on a single GPU for throughput-oriented scenarios.

Stars: 9025

Forks: 527

Language: Python

Last Updated: May 15, 2024

Similar Repos