LLM-Distributed-Quantization

Accelerating multi-node Large Language Model (LLM) training by selectively quantizing individual transformer layers from FP32 to FP16.
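Below is a minimal sketch (not the repository's actual code) of what per-layer selective FP32 -> FP16 quantization can look like in PyTorch: only the transformer blocks whose indices appear in an assumed `fp16_layers` set are cast to half precision, while the rest stay in full precision. The `TinyTransformer` model and all names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    """Toy stack of transformer encoder blocks used only to illustrate the idea."""
    def __init__(self, d_model=64, n_heads=4, n_layers=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
             for _ in range(n_layers)]
        )

    def forward(self, x):
        for block in self.blocks:
            # Match the activation dtype to each block's parameter dtype so
            # mixed FP32/FP16 blocks can be chained in a single forward pass.
            x = block(x.to(next(block.parameters()).dtype))
        return x.float()

def quantize_layers(model: TinyTransformer, fp16_layers: set) -> None:
    """Cast only the selected transformer blocks to FP16; others remain FP32."""
    for idx, block in enumerate(model.blocks):
        if idx in fp16_layers:
            block.half()

model = TinyTransformer()
quantize_layers(model, fp16_layers={1, 2})   # hypothetical choice: middle layers only
for idx, block in enumerate(model.blocks):
    print(f"layer {idx}: {next(block.parameters()).dtype}")  # layers 1 and 2 -> torch.float16
```

In a multi-node setup the same per-layer cast would typically be applied before wrapping the model in a distributed trainer (e.g. `torch.nn.parallel.DistributedDataParallel`), so that gradients for the FP16 layers are also communicated at half precision.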

Stars: 2 | Forks: 0 | Language: Python | Last Updated: Oct 07, 2022
