TF-OPT attention mask fixes #25238

Merged: 3 commits into main from tf_opt_fixes on Sep 6, 2023

Conversation

Rocketknight1 (Member)

With apologies for the delay, this PR should hopefully resolve the issues in #24637. @abb128, can you try installing from this branch and verify that it fixes your issue? You can install with:

pip install --upgrade git+https://github.com/huggingface/transformers.git@tf_opt_fixes

Fixes #24637
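
Once installed, a quick way to exercise the failing path (a minimal sketch based on the linked issue's title, not the exact reproduction from #24637): generation with TFOPTForCausalLM reuses cached past_key_values on every step after the first, which is where the attention mask size mismatch was raised.

    from transformers import AutoTokenizer, TFOPTForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
    model = TFOPTForCausalLM.from_pretrained("facebook/opt-125m")

    inputs = tokenizer("Hello, my name is", return_tensors="tf")
    # Each generation step after the first feeds cached past_key_values
    # back in, so the decoder's mask-length bookkeeping gets exercised.
    output = model.generate(**inputs, max_new_tokens=10)
    print(tokenizer.decode(output[0], skip_special_tokens=True))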

HuggingFaceDocBuilderDev commented Aug 1, 2023

The documentation is not available anymore as the PR was closed or merged.

Rocketknight1 (Member, Author)

No response, but we should probably merge anyway. Pinging @amyeroberts for core maintainer review!

amyeroberts (Collaborator) left a comment

Thanks for fixing this!

Just a question about the checks in _prepare_decoder_attention_mask and the values its inputs can take.

src/transformers/models/opt/modeling_tf_opt.py (outdated):

    _, seq_length = input_shape
    tf.debugging.assert_equal(
        seq_length + past_key_values_length,
        shape_list(attention_mask)[1],
amyeroberts (Collaborator)

Is this check robust? From the diff it looks like attention_mask can be None

Rocketknight1 (Member, Author)

Yes! The TFOPTDecoder layer checks for None attention masks and replaces them with tf.ones. That happens before _prepare_decoder_attention_mask is called. The earlier code had an if attention_mask is not None branch that was just always taken as a result.
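
In outline, the ordering looks like this (a minimal sketch with hypothetical names, not the actual modeling_tf_opt.py source):

    import tensorflow as tf

    def prepare_decoder_attention_mask_sketch(attention_mask, input_shape, past_key_values_length):
        # By the time this runs, attention_mask is guaranteed to be a tensor.
        _, seq_length = input_shape
        tf.debugging.assert_equal(
            seq_length + past_key_values_length,
            tf.shape(attention_mask)[1],
            message="Mask should cover seq_length + past_key_values_length",
        )
        # ... build and combine the causal mask here ...

    def decoder_forward_sketch(input_ids, attention_mask=None, past_key_values_length=0):
        batch_size, seq_length = input_ids.shape
        if attention_mask is None:
            # The decoder substitutes an all-ones mask for a missing one,
            # covering both the new tokens and any cached past positions.
            attention_mask = tf.ones((batch_size, seq_length + past_key_values_length))
        prepare_decoder_attention_mask_sketch(
            attention_mask, (batch_size, seq_length), past_key_values_length
        )

    # No mask supplied and two cached positions: the assertion passes.
    decoder_forward_sketch(tf.constant([[5, 6, 7]]), past_key_values_length=2)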

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Rocketknight1 (Member, Author)

@amyeroberts Sorry for the delay; I lost track of this one!

amyeroberts (Collaborator) left a comment

Thanks for fixing!

Rocketknight1 merged commit 842e99f into main on Sep 6, 2023
3 checks passed
Rocketknight1 deleted the tf_opt_fixes branch on September 6, 2023 at 12:37
parambharat pushed a commit to parambharat/transformers that referenced this pull request on Sep 26, 2023
blbadger pushed a commit to blbadger/transformers that referenced this pull request on Nov 8, 2023
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request on Nov 18, 2023

Each carried the same squashed commit message:

    * stash commit
    * More OPT updates
    * Update src/transformers/models/opt/modeling_tf_opt.py

    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Successfully merging this pull request may close these issues.

TFOPTForCausalLM Attention mask size mismatch exception (#24637)