
LLaVA-NeXT-Video: fix generation with cache #32527

Closed
wants to merge 2 commits

Conversation

@zucchini-nlp (Member)

What does this PR do?

Fixes generation for llava-next-video. It apparently started failing after we moved to the cache class, because some parts of the code were not updated. I checked all LLaVA models; the others still work since their check is done on a different condition.

Yes, we could start using cache_position and rely on that, but note that cache_position for VLMs will not be correct: it will contain positions only for the text tokens. Support for cache_position will come in the next PR, which is in progress. We'll have to deprecate many things before we can get rid of the current "merge or expand" checks.
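For context, a minimal sketch of the kind of cache-length check involved, assuming the transformers Cache class and its get_seq_length() accessor; this is illustrative, not the exact diff in this PR:

```python
# Illustrative sketch, not the exact change in this PR: the "merge or expand"
# branch in LLaVA-style models needs the number of already-cached tokens, and
# that length is read differently for the legacy tuple cache vs. the Cache class.
from transformers.cache_utils import Cache


def get_past_length(past_key_values):
    """Return the number of cached tokens for either cache format."""
    if past_key_values is None:
        return 0
    if isinstance(past_key_values, Cache):
        # New cache classes expose the cached sequence length directly.
        return past_key_values.get_seq_length()
    # Legacy format: tuple of (key, value) per layer; the key tensor has shape
    # (batch_size, num_heads, seq_len, head_dim).
    return past_key_values[0][0].shape[2]
```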

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker (Collaborator) left a comment

Thanks! Do you mind adding a fast test that would catch this? 🤗

@zucchini-nlp (Member, Author)

Actually, we should have VLM generation tests soon (though not very soon), but for now I'll try to add a very simple dummy test for all VLMs. It is annoying that the fast tests for VLMs don't catch the most basic bugs.

@zucchini-nlp (Member, Author)

Added one test for generation. I will see if we can start integrating generation tests for VLMs before the refactoring is done; last time it forced us to add many if/elses, so we gave up until everything is standardized.
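As a rough illustration, the fast test could look something like the sketch below, written as a method for the existing model test class in tests/models/llava_next_video/test_modeling_llava_next_video.py; the `model_tester` helper, input names, and assertion are assumptions about the test suite layout, not the exact test added here:

```python
# Hypothetical sketch of a fast generation test, assuming the existing test class
# provides a standard `model_tester` with `prepare_config_and_inputs_for_common()`
# (as other transformers model test classes do).
def test_generate_with_cache(self):
    config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    model = LlavaNextVideoForConditionalGeneration(config).to(torch_device).eval()
    with torch.no_grad():
        # max_new_tokens > 1 forces at least one decoding step that reads from
        # the cache, which is exactly the code path this PR fixes.
        output = model.generate(**inputs_dict, max_new_tokens=3, do_sample=False)
    # One output sequence per batch element, longer than the prompt.
    self.assertEqual(output.shape[0], inputs_dict["input_ids"].shape[0])
```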

@ArthurZucker (Collaborator) left a comment

LGTM, any reason why the check is different from #32836?

@zucchini-nlp (Member, Author)

Nah, both are equally valid since current VLMs don't support speculative decoding. Let's use the PR from yesterday, which has this and another fix; I'll close this one. The current state of main already has this fixed thanks to the recent refactor.
