Python v0.2.1 - OpenAI Tiktoken Support
What's Changed
- Support OpenAI Tiktoken tokenizers, so you can now pass an OpenAI model name and have the text tokenized with that model's tokenizer when calculating chunk sizes. by @benbrandt in #23
```python
from semantic_text_splitter import TiktokenTextSplitter

# Maximum number of tokens in a chunk
max_tokens = 1000

# Optionally, you can also have the splitter not trim whitespace for you
splitter = TiktokenTextSplitter("gpt-3.5-turbo", trim_chunks=False)

chunks = splitter.chunks("your document text", max_tokens)
```
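With the defaults, the splitter trims surrounding whitespace from each chunk for you. A minimal sketch of that default usage, assuming the same `TiktokenTextSplitter` API from this release (the document text and token budget are placeholders):

```python
from semantic_text_splitter import TiktokenTextSplitter

# Default construction: chunks come back with surrounding whitespace trimmed
splitter = TiktokenTextSplitter("gpt-3.5-turbo")

# Each chunk fits within the token budget as measured by the model's tokenizer
for chunk in splitter.chunks("your document text", 1000):
    print(len(chunk), repr(chunk))
```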
Full Changelog: python-v0.2.0...python-v0.2.1