
Python v0.2.1 - OpenAI Tiktoken Support

@benbrandt benbrandt released this 13 Jun 19:59

What's Changed

  • Support OpenAI Tiktoken tokenizers, so you can now pass an OpenAI model name to select the tokenizer used when calculating chunk sizes. by @benbrandt in #23
```python
from semantic_text_splitter import TiktokenTextSplitter

# Maximum number of tokens in a chunk
max_tokens = 1000

# Optionally, the splitter can also skip trimming chunk whitespace for you
splitter = TiktokenTextSplitter("gpt-3.5-turbo", trim_chunks=False)

chunks = splitter.chunks("your document text", max_tokens)
```
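To illustrate what `trim_chunks` controls, here is a toy character-based splitter (a hypothetical helper, not the library's actual splitting algorithm): with trimming enabled, surrounding whitespace is stripped from each chunk; with it disabled, chunks are returned exactly as cut from the text.

```python
def chunk_text(text, size, trim_chunks=True):
    """Toy splitter: cut text into fixed-size pieces, optionally
    stripping surrounding whitespace (illustrative only)."""
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    return [c.strip() for c in chunks] if trim_chunks else chunks

# Trimmed chunks drop the trailing space; untrimmed chunks keep it.
chunk_text("hello world", 6)                    # ['hello', 'world']
chunk_text("hello world", 6, trim_chunks=False) # ['hello ', 'world']
```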

Full Changelog: python-v0.2.0...python-v0.2.1