```mermaid
flowchart TD;
PretrainedLlama3[Pretrained Llama3]-->LLaMAFactory["LLaMA-Factory (FT)"];
PretrainedLlama3-->PEFT["PEFT (FT)"];
PretrainedLlama3-->Unsloth["Unsloth (FT)"];
LLaMAFactory-->llamacpp-Q;
LLaMAFactory-->AutoAWQ["AutoAWQ (Q)"];
LLaMAFactory-->vLLM["vLLM (D)"];
LLaMAFactory-->TensorRT-LLM["TensorRT-LLM (D)"];
LLaMAFactory-->AutoGPTQ["AutoGPTQ (Q)"];
llamacpp-Q["llama.cpp (Q)"]-->llamacpp-D["llama.cpp (D)"];
llamacpp-Q-->ollama["ollama (D)"];
llamacpp-D-->LangChain-RAG["LangChain (RAG)"];
llamacpp-D-->LangChain-Agent["LangChain (Agent)"];
llamacpp-D-->LlamaIndex["LlamaIndex (RAG)"];
```
Note: FT = Fine-tuning, Q = Quantization, D = Deployment.

- **LLaMA-Factory**

  Specify `OUTPUT_DIR` and `EXPORT_DIR` when executing the script; the default values are `./Meta-Llama-3-8B-Instruct-Adapter` and `./Meta-Llama-3-8B-Instruct-zh-10k`.

  ```bash
  $ source ./finetune_llama-factory_lora.sh
  ```
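
  After the script finishes, `EXPORT_DIR` should contain the merged model in standard Hugging Face format (LLaMA-Factory's export merges the LoRA adapter into the base weights), so it can be sanity-checked with `transformers` before quantization. A minimal sketch, assuming the default `./Meta-Llama-3-8B-Instruct-zh-10k` export path; the prompt is illustrative:

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  export_dir = "./Meta-Llama-3-8B-Instruct-zh-10k"  # default EXPORT_DIR

  tokenizer = AutoTokenizer.from_pretrained(export_dir)
  model = AutoModelForCausalLM.from_pretrained(
      export_dir, torch_dtype="auto", device_map="auto"
  )

  # Build a chat-formatted prompt and generate a short reply.
  messages = [{"role": "user", "content": "用一句话介绍你自己。"}]
  inputs = tokenizer.apply_chat_template(
      messages, add_generation_prompt=True, return_tensors="pt"
  ).to(model.device)
  outputs = model.generate(inputs, max_new_tokens=64)
  print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
  ```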
- **llama.cpp**

  Example using `./Meta-Llama-3-8B-Instruct-zh-10k`:

  ```bash
  $ source ./quantize_llama.cpp.sh
  ```
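
  Once the quantized GGUF file exists, it can be loaded with the `llama-cpp-python` bindings to confirm the model still responds sensibly. A minimal sketch, assuming the Q8_0 output path referenced in the deployment section below:

  ```python
  from llama_cpp import Llama

  # Load the quantized GGUF produced by the quantization script.
  llm = Llama(
      model_path="./Meta-Llama-3-8B-Instruct-zh-10k/meta-llama-3-8b-instruct-zh-10k.Q8_0.gguf",
      n_ctx=4096,       # context window
      n_gpu_layers=-1,  # offload all layers to the GPU if available; set 0 for CPU-only
  )

  result = llm.create_chat_completion(
      messages=[{"role": "user", "content": "用一句话介绍大语言模型。"}],
      max_tokens=64,
  )
  print(result["choices"][0]["message"]["content"])
  ```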
- **AutoAWQ**

  Adjust the quantization settings as needed.

  ```bash
  $ python3 quantize_autoawq.py \
      --pretrained_model_dir /path/to/your-pretrain-model-dir \
      --quantized_model_dir /path/to/your-quantized_model_dir
  ```
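
  For reference, `quantize_autoawq.py` presumably wraps AutoAWQ's standard quantize-and-save flow. A minimal sketch of that flow (the `quant_config` values are illustrative AWQ defaults, not necessarily the repository's settings):

  ```python
  from awq import AutoAWQForCausalLM
  from transformers import AutoTokenizer

  pretrained_model_dir = "/path/to/your-pretrain-model-dir"
  quantized_model_dir = "/path/to/your-quantized_model_dir"

  # Typical 4-bit AWQ settings; adjust as needed.
  quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

  model = AutoAWQForCausalLM.from_pretrained(pretrained_model_dir)
  tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, trust_remote_code=True)

  # Calibrate and quantize, then write the quantized weights and tokenizer.
  model.quantize(tokenizer, quant_config=quant_config)
  model.save_quantized(quantized_model_dir)
  tokenizer.save_pretrained(quantized_model_dir)
  ```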
- **AutoGPTQ**

  Modify the quantization settings and calibration examples according to your requirements.

  ```bash
  $ python3 quantize_autogptq.py \
      --pretrained_model_dir /path/to/your-pretrain-model-dir \
      --quantized_model_dir /path/to/your-quantized_model_dir
  ```
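
  Likewise, `quantize_autogptq.py` presumably follows AutoGPTQ's quantize-and-save flow, where the calibration examples drive the GPTQ weight search. A minimal sketch under that assumption (the settings and the single calibration sample are illustrative only):

  ```python
  from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
  from transformers import AutoTokenizer

  pretrained_model_dir = "/path/to/your-pretrain-model-dir"
  quantized_model_dir = "/path/to/your-quantized_model_dir"

  tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

  # Calibration examples: tokenized text used to measure activation statistics.
  examples = [tokenizer("AutoGPTQ 是一个基于 GPTQ 算法的大模型量化工具包。")]

  quantize_config = BaseQuantizeConfig(
      bits=4,          # quantize weights to 4-bit
      group_size=128,  # per-group quantization granularity
      desc_act=False,  # skip activation-order reordering for faster inference
  )

  model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
  model.quantize(examples)
  model.save_quantized(quantized_model_dir, use_safetensors=True)
  tokenizer.save_pretrained(quantized_model_dir)
  ```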
- **llama.cpp**

  Assuming the GGUF file path is `./Meta-Llama-3-8B-Instruct-zh-10k/meta-llama-3-8b-instruct-zh-10k.Q8_0.gguf`:

  Deploy via the command line:

  ```bash
  $ source ./deploy_llama.cpp_cli.sh
  ```

  Or deploy using Docker (untested):

  ```bash
  $ source ./deploy_llama.cpp_docker.sh
  ```

  Test the deployment:

  ```bash
  $ source ./deploy_llama.cpp_test.sh
  ```
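
  If the deployment runs llama.cpp's built-in HTTP server (`llama-server`), it exposes an OpenAI-compatible `/v1/chat/completions` endpoint, by default on port 8080. A minimal Python check under that assumption (host, port, and prompt are illustrative):

  ```python
  import requests

  # Assumes a llama.cpp server is listening on localhost:8080.
  url = "http://localhost:8080/v1/chat/completions"
  payload = {
      # The server answers with whichever GGUF it was started with.
      "model": "meta-llama-3-8b-instruct-zh-10k",
      "messages": [{"role": "user", "content": "你好，请简单介绍一下你自己。"}],
      "temperature": 0.7,
  }

  response = requests.post(url, json=payload, timeout=120)
  response.raise_for_status()
  print(response.json()["choices"][0]["message"]["content"])
  ```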
- **ollama**

  Prepare for deployment:

  ```bash
  $ source ./deploy_ollama_prepare.sh
  ```

  For the initial deployment of a custom model:

  ```bash
  $ source ./deploy_ollama_create.sh
  ```

  This step involves configuring the `Modelfile`. An example is provided for guidance; customize it as needed.

  Host the LLM locally:

  ```bash
  $ source ./deploy_ollama.sh
  ```

  Single-turn chat test:

  ```bash
  $ source ./deploy_ollama_test_chat.sh
  ```

  Multi-turn chat test:

  ```bash
  $ python3 deploy_ollama_test_chat-multi-turn.py
  ```

  Note: this `.py` file uses the OpenAI-style API call format to interact with the model (a sketch of a comparable exchange is shown at the end of this item). For sequential conversations, start the server first:

  ```bash
  $ source ./deploy_ollama_server.sh
  ```

  Then run:

  ```bash
  $ source ./deploy_ollama.sh
  ```

  The subsequent steps remain the same.
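
  For reference, a multi-turn exchange along these lines can be driven through ollama's OpenAI-compatible endpoint with the `openai` client. A minimal sketch, assuming the server listens on the default port 11434 and the custom model was created under the hypothetical name `llama3-zh-10k`:

  ```python
  from openai import OpenAI

  # ollama exposes an OpenAI-compatible API; the api_key is required by the client but unused.
  client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
  model_name = "llama3-zh-10k"  # hypothetical name passed to `ollama create`

  messages = [{"role": "system", "content": "You are a helpful bilingual assistant."}]

  for user_turn in ["你好！", "请把上一句话翻译成英文。"]:
      messages.append({"role": "user", "content": user_turn})
      reply = client.chat.completions.create(model=model_name, messages=messages)
      answer = reply.choices[0].message.content
      # Keep the assistant's reply in the history so the next turn has full context.
      messages.append({"role": "assistant", "content": answer})
      print(f"User: {user_turn}\nAssistant: {answer}\n")
  ```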
- Fine-tuning:
  - PEFT
  - Unsloth
- Quantization: N/A
- Deployment:
  - TensorRT-LLM & Triton
  - vLLM
- RAG:
  - LangChain
  - LlamaIndex