v0.9.0 #134
benbrandt
announced in
Announcements
What's New
More robust handling of Hugging Face tokenizers as chunk sizers.
Breaking Changes
Chunk output should change only if you use a Hugging Face tokenizer with padding enabled. Because padding tokens are no longer counted, chunks will likely be larger than before and closer to the desired behavior.
Note: As a result, a generated chunk may exceed the chunk capacity once you tokenize it, because the tokenizer adds padding tokens at that point. For these tokenizers, the chunk capacity reflects the number of tokens used by the text itself, not necessarily the total number of tokens the tokenizer will generate.
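To make the difference concrete, here is a minimal sketch of the two ways of counting. It does not use this library or the Hugging Face `tokenizers` API: `toy_tokenize`, `PAD`, and `pad_to` are hypothetical stand-ins for a tokenizer that pads its output to a fixed length.

```python
PAD = "[PAD]"  # hypothetical padding token, for illustration only


def toy_tokenize(text: str, pad_to: int) -> list[str]:
    """Toy whitespace tokenizer that pads its output up to `pad_to` tokens."""
    tokens = text.split()
    padding = [PAD] * max(0, pad_to - len(tokens))
    return tokens + padding


def count_with_padding(text: str, pad_to: int) -> int:
    """Count every token the tokenizer emits, padding included."""
    return len(toy_tokenize(text, pad_to))


def count_without_padding(text: str, pad_to: int) -> int:
    """Count only the tokens that come from the text itself."""
    return sum(1 for token in toy_tokenize(text, pad_to) if token != PAD)


text = "padding tokens are no longer counted"
content_tokens = count_without_padding(text, pad_to=8)
total_tokens = count_with_padding(text, pad_to=8)
print(content_tokens, total_tokens)
```

Under these assumptions, a chunk capacity of 6 would accept this text when padding is ignored, even though tokenizing the same text yields 8 tokens once padding is added. That is the situation the note above describes.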
Full Changelog: v0.8.1...v0.9.0
This discussion was created from the release v0.9.0.