This project loads the Llama2 model into the browser using WebAssembly (WASM), enabling efficient inference on the web. Demo online
Screenshot:
The default model is a mini version of Llama2 with 15M parameters, trained on 300M Chinese dialogues.
Adapted from Karpathy's repository.
(will be open-sourced soon, after code cleanup)
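For context, the browser-side glue for an Emscripten-compiled model typically follows the pattern sketched below. This is a minimal sketch under assumed names: the module factory `createLlama2Module`, the exports `init_from_checkpoint` and `generate`, the build flags, and the checkpoint file `model.bin` are illustrative assumptions, not this repository's actual API.

```ts
// Minimal sketch of loading an Emscripten-built Llama2 WASM module in the
// browser. All export names below are hypothetical; the real project's
// exports may differ. Assumes a build along the lines of:
//   emcc ... -s MODULARIZE=1 -s EXPORT_NAME=createLlama2Module
//            -s EXPORTED_FUNCTIONS=_malloc,_free,...
//            -s EXPORTED_RUNTIME_METHODS=ccall,HEAPU8
import createLlama2Module from './llama2.js';

async function main(): Promise<void> {
  const mod = await createLlama2Module();

  // Fetch the ~15M-parameter checkpoint and copy it into the WASM heap.
  const resp = await fetch('model.bin'); // hypothetical checkpoint path
  const weights = new Uint8Array(await resp.arrayBuffer());
  const ptr = mod._malloc(weights.length);
  mod.HEAPU8.set(weights, ptr);

  // Hand the buffer to the C side (hypothetical exported function).
  mod.ccall('init_from_checkpoint', null, ['number', 'number'], [ptr, weights.length]);

  // Run generation for a prompt (hypothetical signature).
  const out = mod.ccall('generate', 'string', ['string', 'number'], ['你好', 256]);
  console.log(out);
}

main().catch(console.error);
```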
- Install the Node packages: `npm install`
- Run the web server: `npm run start`
- Open the web link: http://localhost:3000/
- Special thanks to Karpathy for open-sourcing the llama2 code in C (llama2.c). The model training and inference code is mainly adapted from that repository (see the sketch after this list).
- Special thanks to Emscripten, MUI, and ChatGPT.
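As a rough illustration of how a llama2.c-style generation loop could be driven from JavaScript once the C code is compiled to WASM, consider the sketch below. The exports `forward` and `sample` are hypothetical names, not actual exports of this project or of llama2.c; this only shows the shape of a JS/WASM interface.

```ts
// Hypothetical sketch: driving a llama2.c-style generation loop from JS.
// `forward(token, pos)` is assumed to return a pointer to the logits for
// the next position, and `sample(logitsPtr)` to pick the next token id.
import createLlama2Module from './llama2.js';

async function generateTokens(promptTokens: number[], steps: number): Promise<number[]> {
  const mod = await createLlama2Module();
  // cwrap is Emscripten's standard helper for building typed JS wrappers
  // around exported C functions (requires EXPORTED_RUNTIME_METHODS=cwrap).
  const forward = mod.cwrap('forward', 'number', ['number', 'number']);
  const sample = mod.cwrap('sample', 'number', ['number']);

  // promptTokens must be non-empty (e.g. start with a BOS token).
  const tokens = [...promptTokens];
  for (let pos = 0; pos < steps; pos++) {
    // tokens[pos] is a prompt token while pos is inside the prompt,
    // and a previously sampled token afterwards.
    const logitsPtr = forward(tokens[pos], pos);
    // Only sample once the prompt has been consumed; prompt tokens are forced.
    if (pos + 1 >= tokens.length) tokens.push(sample(logitsPtr));
  }
  return tokens;
}
```

In practice, keeping the whole loop on the C side and returning decoded text to JS is usually faster, since it avoids a JS/WASM call per token.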
In many scenarios, portable (on-device) models offer advantages in compute savings, response speed, and data privacy. If you are also interested in portable LLM technology, feel free to contact me ~
WeChat: