Situation: let's say you have a list of sentences you want to tokenize.
Current workaround:
```python
from llama_cpp import Llama

embedder = Llama.from_pretrained(
    repo_id="lm-kit/bge-m3-gguf", filename="*F16.gguf", embedding=True
)

sentences = ["Hello world"] * 1000
sentences_tokens = [embedder.tokenize(sentence.encode()) for sentence in sentences]
# ↑ Each tokenize call has an overhead of 200-400 ms
```
Problem: each call to `tokenize` appears to have an overhead of about 200-400 ms, which means tokenizing 1,000 sentences takes 200-400 seconds 💥. Even tokenizing 10 sentences can take 4 seconds!
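For reference, here is a minimal sketch for reproducing the per-call timing (it assumes the same embedder setup as above; absolute numbers will vary by machine):

```python
import time

from llama_cpp import Llama

embedder = Llama.from_pretrained(
    repo_id="lm-kit/bge-m3-gguf", filename="*F16.gguf", embedding=True
)

# Time each tokenize call individually to expose the per-call overhead.
for sentence in ["Hello world"] * 5:
    start = time.perf_counter()
    embedder.tokenize(sentence.encode())
    print(f"tokenize took {(time.perf_counter() - start) * 1000:.1f} ms")
```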
Feature request: either add the ability to tokenize a list of strings efficiently with `tokenize`, or add a way to keep the tokenizer warm so that subsequent calls are not as slow as the first.
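In the meantime, one possible workaround (not part of llama-cpp-python) is to batch-tokenize with the Hugging Face `tokenizers` library. This sketch assumes the upstream `BAAI/bge-m3` repo ships a compatible `tokenizer.json`; the resulting token IDs should be spot-checked against `embedder.tokenize` (e.g. BOS/EOS handling may differ) before relying on them:

```python
from tokenizers import Tokenizer

# Workaround sketch: load BGE-M3's tokenizer from the Hugging Face Hub.
# Assumption: this tokenizer matches the GGUF model's vocabulary.
hf_tokenizer = Tokenizer.from_pretrained("BAAI/bge-m3")

sentences = ["Hello world"] * 1000
encodings = hf_tokenizer.encode_batch(sentences)  # one fast batched call
sentences_tokens = [enc.ids for enc in encodings]
```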