An example of generating text with LLaMA using MLX.
LLaMA is a set of open-source language models from Meta AI Research¹ ranging from 7B to 65B parameters.
Install the dependencies:
pip install -r requirements.txt
Next, download and convert the model. If you do not have access to the model weights, you will need to request access from Meta.
Alternatively, you can download select converted checkpoints from the mlx-llama community organisation on Hugging Face and skip the conversion step.
Convert the weights with:
python convert.py <path_to_torch_weights> mlx_llama_weights.npz
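As a rough sketch of what the conversion step involves, the snippet below loads a PyTorch checkpoint and writes its tensors out as float16 NumPy arrays in an .npz archive. The checkpoint file name consolidated.00.pth is an assumption for illustration; the example's convert.py is the supported way to do this and handles details such as sharded checkpoints.

```python
# Illustrative sketch only: convert a single PyTorch checkpoint to an .npz
# archive of NumPy arrays. The checkpoint file name is assumed; use the
# example's convert.py for real conversions.
import numpy as np
import torch

state_dict = torch.load("consolidated.00.pth", map_location="cpu")
weights = {
    name: tensor.to(torch.float16).numpy()
    for name, tensor in state_dict.items()
}
np.savez("mlx_llama_weights.npz", **weights)
```

You can sanity-check the result by loading it back with MLX, which reads an .npz archive as a dictionary mapping parameter names to arrays:

```python
import mlx.core as mx

weights = mx.load("mlx_llama_weights.npz")
print(len(weights), "parameters")
```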
Once you've converted the weights to MLX format, you can interact with the LLaMA model:
python llama.py mlx_llama_weights.npz tokenizer.model "hello"
Run python llama.py --help for more details.
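For context on what happens during generation, the sketch below shows a basic autoregressive loop with greedy sampling in MLX. The model and prompt_tokens names are placeholders rather than the actual interface of llama.py, and the real script is more complete, so treat this only as an illustration of the pattern.

```python
# A minimal, hypothetical greedy-decoding loop to illustrate autoregressive
# generation with MLX. `model` and `prompt_tokens` are placeholders, not the
# actual interface used by llama.py.
import mlx.core as mx

def generate(model, prompt_tokens, max_new_tokens=32):
    tokens = mx.array(prompt_tokens)[None]     # shape: (1, prompt_len)
    for _ in range(max_new_tokens):
        # No key-value cache here, so each step recomputes the full sequence.
        logits = model(tokens)                 # shape: (1, seq_len, vocab_size)
        next_token = mx.argmax(logits[:, -1, :], axis=-1)
        tokens = mx.concatenate([tokens, next_token[None]], axis=-1)
    return tokens[0].tolist()
```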
Footnotes
1. Refer to the arXiv paper and blog post for more details.