
Passing labels to text_decoder to compute loss. #65

Merged · 1 commit merged into unum-cloud:main-dev on Feb 23, 2024

Conversation

kapulkin (Contributor)

I noticed that the `labels` variable is not passed to `text_decoder` in `VLMForCausalLM.forward()`, so `text_decoder` returns only logits and never computes the loss. This makes it impossible to use `VLMForCausalLM` with `transformers.Trainer` and forces you to write a custom training loop or wrap `VLMForCausalLM`.

This PR fixes that incompatibility.
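
For illustration, here is a minimal sketch of the mechanism, with a plain Hugging Face decoder (`gpt2`) standing in for the real model internals rather than the actual `VLMForCausalLM` code: once `labels` is forwarded, the wrapped decoder computes the cross-entropy loss itself and returns it in `outputs.loss`, which is what `transformers.Trainer` looks for.

```python
import torch
from transformers import AutoModelForCausalLM


class LabelsPassthroughSketch(torch.nn.Module):
    """Illustrative wrapper only; not the uform VLMForCausalLM implementation."""

    def __init__(self, decoder_name: str = "gpt2"):
        super().__init__()
        self.text_decoder = AutoModelForCausalLM.from_pretrained(decoder_name)

    def forward(self, input_ids, attention_mask=None, labels=None, **kwargs):
        # Forwarding `labels` lets the decoder compute the loss itself;
        # without it, only logits come back and Trainer has nothing to optimize.
        return self.text_decoder(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels,
        )
```

With that in place, `model(input_ids, labels=input_ids).loss` is populated and `transformers.Trainer`'s default `compute_loss` works without a custom training loop.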

@kimihailv (Contributor)

Passing `labels` to the text decoder is not enough. The input embeddings contain not only text token embeddings but also image features, so the logits cover the image positions as well as the text positions.
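
One way to reconcile the two (a hedged sketch of the usual approach, not necessarily the merged change; the helper name and the image-features-first layout are assumptions) is to pad the labels with the cross-entropy `ignore_index` of `-100` at the image-feature positions, so only the text positions contribute to the loss:

```python
import torch

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_labels_for_image_prefix(labels: torch.LongTensor, num_image_tokens: int) -> torch.LongTensor:
    """Prefix the text labels with -100 entries, one per image feature,
    so the image positions are skipped by the loss."""
    batch_size = labels.shape[0]
    image_pad = torch.full(
        (batch_size, num_image_tokens),
        IGNORE_INDEX,
        dtype=labels.dtype,
        device=labels.device,
    )
    return torch.cat([image_pad, labels], dim=1)
```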

@ashvardanian ashvardanian changed the base branch from main to main-dev February 23, 2024 18:10
@ashvardanian ashvardanian merged commit f445a8b into unum-cloud:main-dev Feb 23, 2024
ashvardanian pushed a commit that referenced this pull request Feb 23, 2024
## [1.1.1](v1.1.0...v1.1.1) (2024-02-23)

### Docs

* Performance observations for M2 CPUs (#56) ([8374ef6](8374ef6)), closes [#56](#56)

### Fix

* Passing labels to `text_decoder` to compute loss. (#65) ([f445a8b](f445a8b)), closes [#65](#65)

### Improve

* Larger batch benchmarks ([fdc8587](fdc8587))

### Make

* pre-commit config and linters (#62) ([0a3efac](0a3efac)), closes [#62](#62)
@ashvardanian (Contributor)

🎉 This PR is included in version 1.1.1 🎉

The release is available on the GitHub releases page.

Your semantic-release bot 📦🚀
