
# Deploying LLM to Mobile

To deploy an LLM to mobile, we need a quantized model in GGUF format for CPU inference.
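As a rough sketch of producing such a model (the script and binary names have changed across llama.cpp versions, so treat the exact commands as assumptions to verify against your checkout), a Hugging Face checkpoint can be converted to GGUF and then quantized with llama.cpp's own tooling:

```shell
# Run from a llama.cpp checkout with its Python requirements installed.
# 1. Convert the Hugging Face checkpoint to an FP16 GGUF file.
python convert_hf_to_gguf.py path/to/starcoderbase-1b --outfile starcoderbase-1b-f16.gguf

# 2. Quantize the FP16 GGUF to Q4_K_M for CPU inference on mobile.
./llama-quantize starcoderbase-1b-f16.gguf starcoderbase-1b-Q4_K_M.gguf Q4_K_M
```

Q4_K_M is a common size/quality trade-off for small devices; other quantization types are listed by running `llama-quantize` with no arguments.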

## Selected Model

## Local deployment steps on iOS

llama.cpp ships a bare-bones iOS app built with SwiftUI; the example lives in the repository under examples/llama.swiftui. There is also a lengthy discussion, "Performance of llama.cpp on Apple Silicon A-series", where several models have been benchmarked and instructions are given for running the benchmarks yourself.
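If you want numbers for your own model before deploying, llama.cpp bundles a benchmarking tool. A minimal invocation might look like the following (the binary name and flags reflect recent llama.cpp builds and may differ in older ones; the model path is an assumption):

```shell
# Measure prompt-processing (512 tokens) and generation (128 tokens) speed
# for a quantized GGUF model on the current machine.
./llama-bench -m starcoderbase-1b-Q4_K_M.gguf -p 512 -n 128
```

This prints tokens-per-second figures you can compare against the A-series numbers reported in the discussion.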

The general steps to run the app in the simulator:

### Clone the repo

```shell
git clone https://github.com/ggerganov/llama.cpp
```

### Download Xcode on Mac

Download Xcode from the Mac App Store. Don't forget to install the iOS simulator as well; you will be prompted to install it during Xcode installation.

### Open examples/llama.swiftui with Xcode

In the terminal where you cloned the repo, run:

```shell
cd llama.cpp/examples/llama.swiftui
xed .
```

This opens llama.swiftui in Xcode.

### Select the simulator

Select an iOS simulator from the scheme menu at the top (iPhone 15 Pro Max in my case).

### Click Run

Click the Run icon to build the app and start the iOS simulator.

## Tested models

### cosmo3769/starcoderbase-1b-GGUF

I quantized bigcode/starcoderbase-1b and converted it to GGUF format, then downloaded and loaded it in the app. The quantized model is available at cosmo3769/starcoderbase-1b-GGUF.
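Before pushing a quantized file into the iOS app, it can be worth sanity-checking it on desktop. A minimal sketch using the llama-cpp-python bindings (assuming `pip install llama-cpp-python` and that the GGUF file has been downloaded locally; the file name is an assumption):

```python
# Load a quantized GGUF model and run one short completion as a smoke test.
from llama_cpp import Llama

# model_path is a local file assumed to be downloaded from the Hub already.
llm = Llama(model_path="starcoderbase-1b-Q4_K_M.gguf", n_ctx=512)

out = llm("def fibonacci(n):", max_tokens=32)
print(out["choices"][0]["text"])
```

If this produces plausible code completions, the GGUF file itself is fine and any issue on the phone is in the app setup rather than the model.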

(Simulator screenshots: iPhone 15 Pro Max, 2024-03-14.)

### cosmo3769/starcoderbase-3b-GGUF

I quantized bigcode/starcoderbase-3b and converted it to GGUF format, then downloaded and loaded it in the app. The quantized model is available at cosmo3769/starcoderbase-3b-GGUF.

(Simulator screenshots: iPhone 15 Pro Max, 2024-03-15.)