SPARKNLP-765: VisionEncoderDecoder #13997
Merged
Description
This PR introduces the VisionEncoderDecoder annotator. This annotator takes images and produces captions.
Pretrained model uploaded at #13999
Beam Search Fix
This PR also includes a potential bug fix to our implementation of the beam search algorithm. Explanation:
In this line we initialize each beam with logprob 0 or -1e-9 = -0.000000001 (equivalent probability $\exp(\mathrm{-1e-9}) \approx 1$), depending on its position:
https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/main/scala/com/johnsnowlabs/ml/ai/util/Generation/Generate.scala#L200
This differs from the transformers implementation, though, where the other beams are initialized with logprob -1e9 = -1000000000 (equivalent probability $\exp(\mathrm{-1e9}) \approx 0$):
https://github.com/huggingface/transformers/blob/v4.33.1/src/transformers/generation/tf_utils.py#L2272
So in our version we subtract only a tiny amount, leaving all initial beams with almost identical scores (instead of effectively disabling all beams except the first one). Implementing this change makes this model produce the same results as the transformers implementation.
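A minimal sketch of the difference (the helper `init_beam_scores` is hypothetical, not the actual Spark NLP or transformers API; only the two filler constants come from the linked code):

```python
import math

def init_beam_scores(num_beams: int, filler: float) -> list[float]:
    """Beam 0 starts at logprob 0; all other beams get `filler`.

    With a large negative filler (-1e9), only continuations of beam 0
    can reach the top-k at the first step. With a tiny filler (-1e-9),
    all beams start effectively tied, so the top-k expansion can keep
    duplicate copies of the same prefix.
    """
    return [0.0 if i == 0 else filler for i in range(num_beams)]

spark_nlp_init = init_beam_scores(4, -1e-9)  # previous Spark NLP behavior
transformers_init = init_beam_scores(4, -1e9)  # transformers behavior

# The implied probability of the non-first beams shows the difference:
print(math.exp(-1e-9))  # ~1.0 -> extra beams look as likely as beam 0
print(math.exp(-1e9))   # 0.0  -> extra beams are effectively impossible
```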
@maziyarpanahi, @prabod is checking whether this change affects the Bart annotator. Its results do change, but I am not sure how else it is affected.
How Has This Been Tested?
Local tests and the new test are passing. Also tested in a Colab notebook.
Screenshots (if appropriate):
Types of changes
Checklist: