gpt2 model family, 2-4x faster than the prior release.
- gpt2 now uses KV caching for faster generation (a minimal sketch of the idea follows this list)
- all models generate multiple tokens per second (to get the fastest speeds, see the instructions in SETUP.md)
- iOS 16+/macOS 13+ now required
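
As context for the KV caching change above, here is a minimal, framework-free sketch of the idea. The names (`KVCache`, `attend`) are illustrative only and are not this repo's API: the point is that keys/values for already-generated positions are stored once, so each new token only attends with its own query against the cached tensors instead of re-running attention over the entire prefix every step.

```swift
import Foundation

// Illustrative KV cache for autoregressive decoding (hypothetical names,
// not this repo's implementation).
struct KVCache {
    private(set) var keys: [[Double]] = []   // one key vector per cached position
    private(set) var values: [[Double]] = [] // one value vector per cached position

    mutating func append(key: [Double], value: [Double]) {
        keys.append(key)
        values.append(value)
    }
}

/// Scaled dot-product attention for a single new query against the cache.
func attend(query: [Double], cache: KVCache) -> [Double] {
    let scale = Double(query.count).squareRoot()
    // Score the new query against every cached key.
    let scores = cache.keys.map { key in
        zip(query, key).map { $0 * $1 }.reduce(0, +) / scale
    }
    // Softmax over the scores.
    let maxScore = scores.max() ?? 0
    let exps = scores.map { exp($0 - maxScore) }
    let total = exps.reduce(0, +)
    let weights = exps.map { $0 / total }
    // Weighted sum of the cached values.
    var output = [Double](repeating: 0, count: cache.values.first?.count ?? 0)
    for (weight, value) in zip(weights, cache.values) {
        for i in value.indices { output[i] += weight * value[i] }
    }
    return output
}

// Usage: after projecting the newest token, append its key/value to the
// cache and attend with just that token's query.
var cache = KVCache()
cache.append(key: [0.1, 0.2], value: [1.0, 0.0])
cache.append(key: [0.3, 0.1], value: [0.0, 1.0])
let context = attend(query: [0.2, 0.4], cache: cache)
print(context)
```
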
gpt2-xl is split into multiple parts due to GitHub's file size restrictions. Download both parts and decompress them like so:
cat gpt2-xl.mlpackage.tar.gz.* | tar -xzvf -