Whittle is a Python library for compressing large language models (LLMs) by extracting sub-networks that balance performance and efficiency. It is based on LitGPT and can compress many state-of-the-art models.
- Neural Architecture Search: Workflows for pre-training super-networks and multi-objective search to select sub-networks.
- Evaluation: Easy extraction of sub-network checkpoints and evaluation using LM-Eval-Harness.
- Efficiency: Different metrics to estimate efficiency of sub-networks, such as latency, FLOPs, or energy consumption.
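To make the multi-objective selection idea concrete, here is a minimal sketch in plain Python that keeps only the Pareto-optimal sub-networks when trading off validation loss against latency. It does not use whittle's API: the `Candidate` type, the helper functions, and all numbers are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one evaluated sub-network (not a whittle type)."""
    name: str
    loss: float        # validation loss, lower is better
    latency_ms: float  # measured latency, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is at least as good as `b` on both objectives and strictly better on one."""
    no_worse = a.loss <= b.loss and a.latency_ms <= b.latency_ms
    strictly_better = a.loss < b.loss or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up measurements for four sub-networks.
candidates = [
    Candidate("sub-a", 2.10, 40.0),
    Candidate("sub-b", 2.30, 25.0),
    Candidate("sub-c", 2.40, 30.0),  # worse than sub-b on both objectives
    Candidate("sub-d", 2.05, 55.0),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Each sub-network left on the front is a different trade-off between quality and efficiency; which one to deploy depends on the latency budget.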
Whittle supports and is tested on Python 3.9 to 3.12.
You can install whittle with:
pip install whittle
Install whittle from source to get the most recent version:
git clone git@github.com:whittle-org/whittle.git
cd whittle
pip install -e .
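After either install route, a quick sanity check can confirm that the interpreter falls in the supported range and that the package is visible to it. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the installed distribution is named `whittle`, matching the `pip install` command.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Sequence

# Supported range as stated in this README.
SUPPORTED_MIN = (3, 9)
SUPPORTED_MAX = (3, 12)


def python_supported(version_info: Sequence[int]) -> bool:
    """Check whether a (major, minor, ...) version lies in the supported range."""
    return SUPPORTED_MIN <= tuple(version_info[:2]) <= SUPPORTED_MAX


def installed_version(dist_name: str = "whittle") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print("Python version supported:", python_supported(sys.version_info))
print("whittle version:", installed_version() or "not installed")
```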
To explore the different functionalities of whittle,
check out this Colab notebook and the examples/ directory.
We are more than happy to receive code contributions. If you are interested in contributing to whittle, please read our contribution guide.