Tiktoken is a BPE tokenizer from OpenAI, used with their GPT models. This gem is a wrapper around it, aimed primarily at enabling accurate counts of the GPT model tokens used.
The expected way to install or update this gem is through Bundler:
bundle add rz_tiktoken_ruby
If you need to install the gem manually, use a native version for your platform so that Rust does not need to be installed on your system:
gem install rz_tiktoken_ruby --platform=arm-linux # (or x86_64-linux)
Usage should be very similar to the Python library. Here's a simple example.
Encode and decode text:
require 'tiktoken_ruby'
enc = Tiktoken.get_encoding("cl100k_base")
enc.decode(enc.encode("hello world")) #=> "hello world"
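Because encode returns a list of tokens and decode turns tokens back into text, the two can be combined to cut text down to a token limit. The following is a minimal sketch, not part of the gem: truncate_to_tokens is a hypothetical helper name, and it assumes only that the encoder responds to encode (returning an Array) and decode, as in the example above.

```ruby
# Hypothetical helper, not part of the gem: keep at most max_tokens tokens of text.
# enc is assumed to be any object that responds to encode (returning an Array)
# and decode, such as the encoder returned by Tiktoken.get_encoding.
def truncate_to_tokens(enc, text, max_tokens)
  tokens = enc.encode(text)
  return text if tokens.length <= max_tokens

  # Decode only the leading tokens. A cut can fall inside a multi-byte
  # character, so the end of the decoded string may need checking.
  enc.decode(tokens.first(max_tokens))
end

# Usage with the gem (assumes tiktoken_ruby is installed):
#   require 'tiktoken_ruby'
#   enc = Tiktoken.get_encoding("cl100k_base")
#   truncate_to_tokens(enc, "hello world", 1)
```

Text that already fits within the limit is returned unchanged, so it never goes through a decode round trip.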
Encoders can also be retrieved by model name:
require 'tiktoken_ruby'
enc = Tiktoken.encoding_for_model("gpt-4")
enc.encode("hello world").length #=> 2
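Since the gem is aimed at token counting, a common next step is checking several pieces of text against a token limit. The sketch below is an illustration under assumptions rather than part of the gem: total_tokens and within_budget? are hypothetical helper names, and they rely only on the encode call shown above.

```ruby
# Hypothetical helpers, not part of the gem.
# enc is assumed to be any object that responds to encode, such as the
# encoder returned by Tiktoken.encoding_for_model.

# Sum the token counts of several strings.
def total_tokens(enc, texts)
  texts.sum { |text| enc.encode(text).length }
end

# True when the combined token count does not exceed limit.
def within_budget?(enc, texts, limit)
  total_tokens(enc, texts) <= limit
end

# Usage with the gem (assumes tiktoken_ruby is installed):
#   require 'tiktoken_ruby'
#   enc = Tiktoken.encoding_for_model("gpt-4")
#   within_budget?(enc, ["hello world"], 8) #=> true, given the 2-token count above
```

This counts only the tokens of the strings themselves; any extra tokens an API adds around a request are not included.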
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and the created tag, and push the .gem file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/iapark/tiktoken_ruby.
To get started with development:
git clone https://github.com/IAPark/tiktoken_ruby.git
cd tiktoken_ruby
bundle install
bundle exec rake compile
bundle exec rake spec
The gem is available as open source under the terms of the MIT License.