Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SPARKNLP-765: VisionEncoderDecoder #13997

Conversation

DevinTDHa
Copy link
Member

@DevinTDHa DevinTDHa commented Sep 20, 2023

Description

This PR introduces the VisionEncoderDecoder annotator. This annotator takes images and produces captions.

Pretrained model uploaded at #13999

Beam Search Fix

This PR also includes a potential bug fix to our implementation of the beam search algorithm. Explanation:

In this line we initialize the beams with logprob 0 or -1e-9=-0.000000001 (equivalent probabilty $\exp(\mathrm{-1e-9}) \approx 1$), depending on its position

https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/main/scala/com/johnsnowlabs/ml/ai/util/Generation/Generate.scala#L200

This differs from the transformers implementation though, where the other logprob is initialized to be -1e+9=-1000000000 (equivalent probabilty $\exp(\mathrm{-1e9}) \approx 0$).

https://github.com/huggingface/transformers/blob/v4.33.1/src/transformers/generation/tf_utils.py#L2272

So basically, in our version we add a tiny amount, which results in the almost exact scores for the initial beams. Implementing this change results in the same results for this model as in the transformers implementation.

@maziyarpanahi, @prabod is taking a look if this will affect the Bart annotator. Results are different but I am not sure how else it is affected.

How Has This Been Tested?

Local tests and new test passing. Tested it also in a colab notebook.

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@DevinTDHa DevinTDHa added new-feature Introducing a new feature DON'T MERGE Do not merge this PR labels Sep 20, 2023
@DevinTDHa DevinTDHa self-assigned this Sep 20, 2023
@DevinTDHa DevinTDHa force-pushed the feature/SPARKNLP-765-VisionEncoderDecoder branch from 3845f82 to f47c842 Compare September 20, 2023 10:33
Copy link
Contributor

@prabod prabod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bug. Thanks @DevinTDHa for identifying and fixing the bug.

@maziyarpanahi maziyarpanahi changed the base branch from master to release/512-release-candidate September 25, 2023 15:35
@maziyarpanahi maziyarpanahi merged commit da1070f into JohnSnowLabs:release/512-release-candidate Sep 25, 2023
maziyarpanahi added a commit that referenced this pull request Sep 25, 2023
* SPARKNLP-915, SPARKNLP-918:MPNet ONNX Example and missing Documentation (#13996)

* SPARKNLP-915: Fixed transformer version

* SPARKNLP-918: Missing Documentation
- missing annotators to home page
- missing doc strings
- export notebooks on transformers page

* SPARKNLP-765: VisionEncoderDecoder (#13997)

* Bump version & CHANGELOG [run doc]

* Update Scala and Python APIs

* [skip test] Add Example for VisionEncoderDecoder

* Release on Conda [skip test]

---------

Co-authored-by: Devin Ha <33089471+DevinTDHa@users.noreply.github.com>
Co-authored-by: github-actions <action@github.com>
Co-authored-by: Devin Ha <t.ha@tu-berlin.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DON'T MERGE Do not merge this PR new-feature Introducing a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants