SPARKNLP-765: VisionEncoderDecoder #13997
Merged
Description
This PR introduces the VisionEncoderDecoder annotator. This annotator takes images and produces captions.
Pretrained model uploaded at #13999
Beam Search Fix
This PR also includes a potential bug fix to our implementation of the beam search algorithm. Explanation:
In this line we initialize each beam with logprob 0 or -1e-9 = -0.000000001 (equivalent probability $\exp(\mathrm{-1e-9}) \approx 1$), depending on its position:
https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/main/scala/com/johnsnowlabs/ml/ai/util/Generation/Generate.scala#L200
This differs from the transformers implementation, though, where the other beams are initialized with logprob -1e9 = -1000000000 (equivalent probability $\exp(\mathrm{-1e9}) \approx 0$):
https://github.com/huggingface/transformers/blob/v4.33.1/src/transformers/generation/tf_utils.py#L2272
So in our version we subtract only a tiny amount, leaving all initial beams with almost identical scores (instead of effectively disabling all beams except the first one). Implementing this change makes this model produce the same results as the transformers implementation.
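A minimal sketch of the difference (the helper `init_beam_scores` is hypothetical, not the actual Spark NLP or transformers API; only the two filler constants come from the linked code):

```python
import math

def init_beam_scores(num_beams: int, filler: float) -> list[float]:
    """Beam 0 starts at logprob 0; all other beams get `filler`.

    With a large negative filler (-1e9), only continuations of beam 0
    can reach the top-k at the first step. With a tiny filler (-1e-9),
    all beams start effectively tied, so the top-k expansion can keep
    duplicate copies of the same prefix.
    """
    return [0.0 if i == 0 else filler for i in range(num_beams)]

spark_nlp_init = init_beam_scores(4, -1e-9)  # previous Spark NLP behavior
transformers_init = init_beam_scores(4, -1e9)  # transformers behavior

# The implied probability of the non-first beams shows the difference:
print(math.exp(-1e-9))  # ~1.0 -> extra beams look as likely as beam 0
print(math.exp(-1e9))   # 0.0  -> extra beams are effectively impossible
```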
@maziyarpanahi, @prabod is checking whether this change affects the Bart annotator. Its results do change, but I am not sure how else it is affected.
How Has This Been Tested?
Local tests and the new test are passing. Also tested in a Colab notebook.
Screenshots (if appropriate):
Types of changes
Checklist: