---
title: Lora Cerebras Gpt2.7b Alpaca Shortprompt
emoji: 🐨
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 3.23.0
app_file: app.py
pinned: false
license: apache-2.0
---

Scripts to finetune Cerebras GPT-2.7B on the Alpaca dataset, as well as inference demos.
- It is the fastest model in the west!
- The model with the LoRA weights merged in is available at [lxe/Cerebras-GPT-2.7B-Alpaca-SP](https://huggingface.co/lxe/Cerebras-GPT-2.7B-Alpaca-SP).
- The LoRA weights alone are available at [lxe/lora-cerebras-gpt2.7b-alpaca-shortprompt](https://huggingface.co/lxe/lora-cerebras-gpt2.7b-alpaca-shortprompt).
- A ggml version of the model is available at [lxe/ggml-cerebras-gpt2.7b-alpaca-shortprompt](https://huggingface.co/lxe/ggml-cerebras-gpt2.7b-alpaca-shortprompt). You can run it without a GPU, and it's much faster than the original model.
The model tends to be pretty coherent, but it also hallucinates a lot of factually incorrect responses. Avoid using it for anything requiring factual correctness.
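
For a quick test outside the Gradio demo, the merged model can be loaded directly with Hugging Face Transformers. This is a minimal sketch; the `Human:`/`Assistant:` prompt template is an assumption based on the "shortprompt" naming, so check the model card for the exact format the model was trained on.

```python
# Minimal inference sketch (assumes transformers, torch, and accelerate
# are installed). The prompt template below is an assumption -- consult
# the model card for the exact training format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lxe/Cerebras-GPT-2.7B-Alpaca-SP"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # fp16 keeps the 2.7B model well within 12 GB of VRAM
    device_map="auto",          # requires the accelerate package
)

prompt = "Human: What is the capital of France?\n\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

The steps below set up a local environment for running the repo's own demo and finetuning scripts.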
- Be on a machine with an NVIDIA card with 12-24 GB of VRAM.
- Get the environment ready (a CUDA sanity check follows this list):

  ```bash
  conda create -n cerebras-lora python=3.10
  conda activate cerebras-lora
  conda install -y cuda -c nvidia/label/cuda-11.7.0
  conda install -y pytorch=1.13.1 pytorch-cuda=11.7 -c pytorch
  ```
- Clone the repo and install requirements:

  ```bash
  git clone https://github.com/lxe/cerebras-lora-alpaca.git && cd cerebras-lora-alpaca
  pip install -r requirements.txt
  ```
- Run the inference demo:

  ```bash
  python app.py
  ```
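
If the demo fails to launch, a quick sanity check (not part of the original scripts, just a common diagnostic) confirms that PyTorch in the conda environment can see the GPU:

```python
# Diagnostic: verify the CUDA setup before running the demo or finetuning.
import torch

print(torch.__version__)              # expect 1.13.1
print(torch.cuda.is_available())      # expect True
print(torch.cuda.get_device_name(0))  # your NVIDIA card
```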
To reproduce the finetuning results, do the following:
- Install Jupyter and run it:

  ```bash
  pip install jupyter
  jupyter notebook
  ```
- Navigate to the `inference.ipynb` notebook and test out the inference demo.
- Navigate to the `finetune.ipynb` notebook and reproduce the finetuning results.
  - It takes about 5 hours with the default settings.
  - Adjust the batch size and gradient accumulation steps to fit your GPU (see the sketch after this list).
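
To illustrate that trade-off, here is a minimal sketch of how batch size and gradient accumulation typically interact in a Hugging Face Transformers training configuration. The names and values here are illustrative assumptions, not the actual settings in `finetune.ipynb`:

```python
# Hypothetical values -- the real settings live in finetune.ipynb.
# The effective batch size is micro_batch_size * gradient_accumulation_steps,
# so halving one while doubling the other trains the same way but needs
# less VRAM per forward/backward pass.
from transformers import TrainingArguments

MICRO_BATCH_SIZE = 4  # lower this if you hit CUDA out-of-memory errors
BATCH_SIZE = 128      # effective batch size to keep constant

args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=MICRO_BATCH_SIZE,
    gradient_accumulation_steps=BATCH_SIZE // MICRO_BATCH_SIZE,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=True,
)
```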
License: Apache 2.0