Better error messages when failing to load models with sentencepiece based tokenizers #971

Closed
Harsha-Nori opened this issue Jul 30, 2024 · 3 comments

@Harsha-Nori
Collaborator

Surfacing several discussions in one issue so we can track them. Today, guidance throws a cryptic error when you load a model that uses a sentencepiece-based tokenizer in an environment where transformers and guidance are installed but sentencepiece is not, because the sentencepiece path is not taken. We had some exception handling logic for this earlier, but it is no longer robust enough to reliably tell new users to install sentencepiece.

pip install guidance transformers
from guidance import models, gen
lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct", trust_remote_code=True)
File ~/miniconda3/envs/guidance/lib/python3.12/site-packages/guidance/models/transformers/_transformers.py:104, in TransformersTokenizer.__init__(self, model, transformers_tokenizer, chat_template, ignore_bos_token, **kwargs)
    102 if hasattr(transformers_tokenizer, "convert_tokens_to_string"):
    103     token_str = transformers_tokenizer.convert_tokens_to_string([token])
--> 104     roundtrip_id = transformers_tokenizer.encode(token_str)[0]
    105     if roundtrip_id == i:
    106         byte_coded = token_str.encode()

IndexError: list index out of range
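
The IndexError above means `encode` returned an empty list for that token string, so indexing `[0]` fails. A minimal sketch of turning that into a readable message follows. `first_roundtrip_id` is a hypothetical helper, not part of guidance, and the wording of the message is an assumption:

```python
def first_roundtrip_id(encode, token_str):
    """Return the first id produced by encode(token_str).

    `encode` is any callable that maps a string to a list of token ids,
    for example a tokenizer's bound `encode` method.
    """
    ids = encode(token_str)
    if not ids:
        # Hypothetical message: an empty result is what triggers the
        # IndexError in the traceback, so fail with a hint instead.
        raise RuntimeError(
            f"Tokenizer returned no ids for {token_str!r}. "
            "If this model uses a sentencepiece-based tokenizer, "
            "check that sentencepiece is installed."
        )
    return ids[0]
```

An empty result could have other causes as well, so the hint is phrased as something to check rather than a diagnosis.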
@Harsha-Nori
Collaborator Author

Oops, never mind, this is already captured well in #958 :)

@hudson-ai
Collaborator

I'm suddenly getting this exception in CI despite having "fixed" it before. Seems a recent change impacted this. I'll open a PR so we can discuss how to best handle errors here.

@Saibo-creator
Contributor

I encountered the same error, and it turned out that sentencepiece wasn't installed. Installing sentencepiece resolved the problem.
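
For anyone else landing here, one way to check for this before loading a model is to ask Python whether the package can be found. This is a minimal sketch using only the standard library. `find_missing` is a hypothetical helper name, not something guidance or transformers provides:

```python
import importlib.util


def find_missing(package_names):
    """Return the top-level package names that cannot be imported."""
    missing = []
    for name in package_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing(["sentencepiece"])
    if missing:
        print("Missing packages: " + ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
```

If sentencepiece is not installed, `find_missing(["sentencepiece"])` returns `["sentencepiece"]`, and installing it with pip should clear the list.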
