This sample shows how to run a LLaMA-based model with the OpenVINO runtime.
Please note that this repository is intended only as a functional test; you can quantize the model to further optimize its performance, as sketched below.
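For instance, a minimal weight-quantization sketch using NNCF could look like the following (an assumption about tooling; the repository does not ship a quantization script, and "llama.xml" is a placeholder for the FP16 IR produced in the steps below):

import nncf
from openvino.runtime import Core, serialize

# Hedged sketch: INT8 weight-only compression of the exported IR with NNCF.
core = Core()
ov_model = core.read_model("llama.xml")  # hypothetical path to the FP16 IR
compressed_model = nncf.compress_weights(ov_model)
serialize(compressed_model, "llama_int8.xml", "llama_int8.bin")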
-
Install the requirements:
$pip install -r requirements.txt
-
Export the ONNX model from the Hugging Face pipeline:
$python export.py -m huggingface_model_path -o onnx_model_path
For example: python export.py -m "xxx/llama-7b-hf" -o "./llama.onnx"
Please follow the license on Hugging Face and obtain approval from Meta before downloading LLaMA checkpoints.
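For reference, a minimal sketch of what an export step like this might do (an assumption, not the actual contents of export.py, which may additionally handle KV-cache inputs; paths are placeholders):

import torch
from transformers import AutoModelForCausalLM

# Hypothetical export sketch; return_dict=False yields tuple outputs for tracing.
model = AutoModelForCausalLM.from_pretrained("xxx/llama-7b-hf", return_dict=False)
model.eval()
dummy_input = torch.ones((1, 8), dtype=torch.long)
torch.onnx.export(
    model,
    (dummy_input,),
    "./llama.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "logits": {0: "batch", 1: "sequence"}},
    opset_version=14,
)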
-
Convert the ONNX model to OpenVINO IR in FP16:
$mo -m onnx_model_path --compress_to_fp16
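Equivalently, the conversion can be done from the Python API (a sketch, assuming an OpenVINO release that provides mo.convert_model; verify against your installed version):

from openvino.tools import mo
from openvino.runtime import serialize

# Hedged sketch: Python-API equivalent of the mo command above.
ov_model = mo.convert_model("./llama.onnx", compress_to_fp16=True)
serialize(ov_model, "llama.xml", "llama.bin")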
-
Run the restructured pipeline:
$python generate.py -m openvino_model_path -t tokenizer_path -p prompt_sentence
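For illustration, a greedy-decoding loop with the OpenVINO runtime might look like the sketch below (an assumption, not the actual generate.py; it presumes the exported model takes only input_ids and returns logits, and all paths are placeholders):

import numpy as np
from openvino.runtime import Core
from transformers import AutoTokenizer

# Hedged sketch of a greedy-decoding loop; not the repository's generate.py.
core = Core()
compiled_model = core.compile_model("llama.xml", "CPU")
tokenizer = AutoTokenizer.from_pretrained("tokenizer_path")

input_ids = tokenizer("prompt_sentence", return_tensors="np").input_ids
for _ in range(128):  # placeholder cap on new tokens
    logits = compiled_model({"input_ids": input_ids})[compiled_model.output(0)]
    next_token = int(np.argmax(logits[0, -1]))
    if next_token == tokenizer.eos_token_id:
        break
    input_ids = np.concatenate([input_ids, [[next_token]]], axis=1)
print(tokenizer.decode(input_ids[0], skip_special_tokens=True))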