Situation: let's say you have a list of sentences you want to tokenize.
Current workaround:
```python
from llama_cpp import Llama

embedder = Llama.from_pretrained(
    repo_id="lm-kit/bge-m3-gguf", filename="*F16.gguf", embedding=True
)

sentences = ["Hello world"] * 1000
sentences_tokens = [embedder.tokenize(sentence.encode()) for sentence in sentences]
# ↑ Each tokenize call has an overhead of 200-400 ms
```
Problem: each call to `tokenize` appears to have an overhead of about 200-400 ms, which means tokenizing 1,000 sentences takes 200-400 seconds 💥. Even tokenizing 10 sentences can take 4 seconds!
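For reference, here is a minimal sketch for reproducing the per-call timing (it assumes the same embedder setup as above; absolute numbers will vary by machine):

```python
import time

from llama_cpp import Llama

embedder = Llama.from_pretrained(
    repo_id="lm-kit/bge-m3-gguf", filename="*F16.gguf", embedding=True
)

# Time each tokenize call individually to expose the per-call overhead.
for sentence in ["Hello world"] * 5:
    start = time.perf_counter()
    embedder.tokenize(sentence.encode())
    print(f"tokenize took {(time.perf_counter() - start) * 1000:.1f} ms")
```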
Feature request: either add the ability to tokenize a list of strings efficiently with `tokenize`, or add a way to keep the tokenizer warm so that subsequent calls are not as slow as the first.
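In the meantime, one possible workaround (not part of llama-cpp-python) is to batch-tokenize with the Hugging Face `tokenizers` library. This sketch assumes the upstream `BAAI/bge-m3` repo ships a compatible `tokenizer.json`; the resulting token IDs should be spot-checked against `embedder.tokenize` (e.g. BOS/EOS handling may differ) before relying on them:

```python
from tokenizers import Tokenizer

# Workaround sketch: load BGE-M3's tokenizer from the Hugging Face Hub.
# Assumption: this tokenizer matches the GGUF model's vocabulary.
hf_tokenizer = Tokenizer.from_pretrained("BAAI/bge-m3")

sentences = ["Hello world"] * 1000
encodings = hf_tokenizer.encode_batch(sentences)  # one fast batched call
sentences_tokens = [enc.ids for enc in encodings]
```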