srt-model-quantizing is a pipeline for downloading models from Hugging Face, quantizing them, and then uploading them to a Hugging Face-compatible repository. This project is developed by SolidRusT Networks and supports two quantization methods: Exllama2 and AutoAWQ.
- Supports both Exllama2 and AutoAWQ quantization methods
- Designed for simplicity and ease of use
- Supports both Nvidia CUDA and AMD ROCm GPUs
- Intended for use on Linux servers
srt-model-quantizing/
├── awq/ # AutoAWQ quantization implementation
│ ├── app/
│ ├── tests/
│ ├── requirements.txt
│ └── README.md
├── exl2/ # Exllama2 quantization implementation
│ ├── app/
│ ├── tests/
│ ├── templates/
│ ├── requirements.txt
│ └── README.md
└── README.md # This file
-
Clone the repository:
git clone https://github.com/SolidRusT/srt-model-quantizing.git cd srt-model-quantizing
-
Set up virtual environments for each quantization method:
For AWQ:
python -m venv awq_venv source awq_venv/bin/activate cd awq pip install -r requirements.txt
For Exllama2:
python -m venv exl2_venv source exl2_venv/bin/activate cd exl2 pip install -r requirements.txt
-
Set up your Hugging Face access token:
export HF_ACCESS_TOKEN=your_access_token_here
-
Activate the AWQ virtual environment:
source awq_venv/bin/activate
-
Navigate to the AWQ directory:
cd awq
-
Run the quantization:
python app/main.py <author> <model> [--quanter <quanter>]
Example:
python app/main.py cognitivecomputations/dolphin-2.9.4-gemma2-2b --quanter solidrust
-
Activate the Exllama2 virtual environment:
source exl2_venv/bin/activate
-
Navigate to the Exllama2 directory:
cd exl2
-
Run the quantization:
python app/main.py <author> <model> [--quanter <quanter>]
Example:
python app/main.py cognitivecomputations/dolphin-2.9.4-gemma2-2b --quanter solidrust
Both AWQ and Exllama2 implementations have their own config.py
files in their respective app
directories. You can modify these files to adjust various settings such as output directories, quantization parameters, and more.
To run tests for each implementation, navigate to the respective directory and run:
python -m unittest discover tests
Please refer to the CONTRIBUTING.md file for guidelines on how to contribute to this project.
This project is licensed under the MIT License. See the LICENSE file for details.