LLaVA-NeXT-Video: fix generation with cache #32527
Conversation
Thanks! Do you mind adding a fast test that would catch this? 🤗
Actually, we should have VLM generation tests soon (not very soon), but for now maybe I'll try to make a very basic dummy test for all VLMs. It is annoying that the fast tests for VLMs don't catch the most basic bugs.
Added one test for generation. I will see if we can start integrating tests for VLMs before the refactoring is done. Last time it forced us to add many if/elses, so we gave up until it is all standardized.
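For context, a minimal fast generation test along these lines could look roughly like the sketch below (the test name, mixin fixtures, and token count are assumptions for illustration, not the exact test added in this PR):

```python
import torch

from transformers import LlavaNextVideoForConditionalGeneration
from transformers.testing_utils import torch_device


def test_generate_with_cache(self):
    # `self.model_tester` is assumed to come from the usual model tester mixin setup
    config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
    model = LlavaNextVideoForConditionalGeneration(config).to(torch_device).eval()
    with torch.no_grad():
        # max_new_tokens > 1 forces at least one decoding step with a non-empty cache,
        # which is the path that previously crashed
        model.generate(**inputs_dict, max_new_tokens=3, do_sample=False)
```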
LGTM, any reason why the check is different from #32836?
Nah, both are equally valid since current VLMs don't support speculative decoding. Let's use the PR from yesterday, which has this and another fix; I'll close this one. The current state of main already has it fixed due to the recent refactor.
What does this PR do?
Fixes generation for llava-next-video. Generation apparently started failing after we moved to the cache class, because some parts of the code were not updated. I checked all llava models; the others are working since their check is done on a different condition.
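To illustrate the kind of condition involved (a hedged sketch, not the actual diff; the helper name is made up): checks that infer the cached length from the legacy tuple format silently break once `past_key_values` is a `Cache` object, so the length has to be read differently depending on the representation:

```python
from transformers.cache_utils import Cache


def past_length(past_key_values) -> int:
    """Illustrative helper: number of tokens already stored in the cache."""
    if past_key_values is None:
        return 0
    if isinstance(past_key_values, Cache):
        # Cache classes expose the cached sequence length directly
        return past_key_values.get_seq_length()
    # legacy tuple-of-tuples format: key tensors have shape (batch, heads, seq_len, head_dim)
    return past_key_values[0][0].shape[2]
```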
Yes, we can start using `cache_position` and rely on that, but we should note that `cache_position` for VLMs will not be correct: it will contain positions only for the text tokens. Adding support for cache position will come in the next PR, which is in progress. We'll have to deprecate many things before we can get rid of the current checks to "merge or expand".