Pragmatic framework for those who want something like GGML or ONNX in pure Go.
Aims to be 100% compatible with GGML and GGUF, and to support ONNX and safetensors models as well.
- Update the older code for better GGML compatibility
- Implement new operations needed for LLaMA v3 inference
- Allow GGUF format models to be read
- LLaMA v3.1 will work as an FP32 / FP16 model on modern platforms
- Support some popular quantizations like Q6_K or Q8_0
Please check out my related project Booster