This sample shows how to run a LLaMA-based model with the OpenVINO runtime.
Please note that this repository is intended only as a functional test; you can quantize the model to further optimize its performance, as sketched below.
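For instance, a minimal weight-quantization sketch using NNCF could look like the following (an assumption about tooling; the repository does not ship a quantization script, and "llama.xml" is a placeholder for the FP16 IR produced in the steps below):

import nncf
from openvino.runtime import Core, serialize

# Hedged sketch: INT8 weight-only compression of the exported IR with NNCF.
core = Core()
ov_model = core.read_model("llama.xml")  # hypothetical path to the FP16 IR
compressed_model = nncf.compress_weights(ov_model)
serialize(compressed_model, "llama_int8.xml", "llama_int8.bin")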
-
Install the requirements:
$pip install -r requirements.txt
-
Export the ONNX model from the Hugging Face pipeline:
$python export.py -m huggingface_model_path -o onnx_model_path
For example: python export.py -m "xxx/llama-7b-hf" -o "./llama.onnx"
Please follow the license on Hugging Face and obtain approval from Meta before downloading LLaMA checkpoints.
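For reference, a minimal sketch of what an export step like this might do (an assumption, not the actual contents of export.py, which may additionally handle KV-cache inputs; paths are placeholders):

import torch
from transformers import AutoModelForCausalLM

# Hypothetical export sketch; return_dict=False yields tuple outputs for tracing.
model = AutoModelForCausalLM.from_pretrained("xxx/llama-7b-hf", return_dict=False)
model.eval()
dummy_input = torch.ones((1, 8), dtype=torch.long)
torch.onnx.export(
    model,
    (dummy_input,),
    "./llama.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "logits": {0: "batch", 1: "sequence"}},
    opset_version=14,
)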
-
Convert the ONNX model to OpenVINO IR in FP16:
$mo -m onnx_model_path --compress_to_fp16
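Equivalently, the conversion can be done from the Python API (a sketch, assuming an OpenVINO release that provides mo.convert_model; verify against your installed version):

from openvino.tools import mo
from openvino.runtime import serialize

# Hedged sketch: Python-API equivalent of the mo command above.
ov_model = mo.convert_model("./llama.onnx", compress_to_fp16=True)
serialize(ov_model, "llama.xml", "llama.bin")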
-
Run the restructured pipeline:
$python generate.py -m openvino_model_path -t tokenizer_path -p prompt_sentence
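For illustration, a greedy-decoding loop with the OpenVINO runtime might look like the sketch below (an assumption, not the actual generate.py; it presumes the exported model takes only input_ids and returns logits, and all paths are placeholders):

import numpy as np
from openvino.runtime import Core
from transformers import AutoTokenizer

# Hedged sketch of a greedy-decoding loop; not the repository's generate.py.
core = Core()
compiled_model = core.compile_model("llama.xml", "CPU")
tokenizer = AutoTokenizer.from_pretrained("tokenizer_path")

input_ids = tokenizer("prompt_sentence", return_tensors="np").input_ids
for _ in range(128):  # placeholder cap on new tokens
    logits = compiled_model({"input_ids": input_ids})[compiled_model.output(0)]
    next_token = int(np.argmax(logits[0, -1]))
    if next_token == tokenizer.eos_token_id:
        break
    input_ids = np.concatenate([input_ids, [[next_token]]], axis=1)
print(tokenizer.decode(input_ids[0], skip_special_tokens=True))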