Expand S2S range and make pad trimming controllable #235

Merged: 3 commits, Mar 17, 2023


alextrott16
Contributor

This PR adds a couple of improvements to the Mixture-of-Denoisers collator, which is used for objectives such as span corruption and UL2.

  • It extends sequence corruption so that the extreme end of the allowable parameter range produces what are essentially Causal LM training examples.
  • It adds an allow_pad_trimming argument that controls whether the collator may trim excess padding. Trimming can speed things up, but it isn't always desirable because variable sequence lengths can make memory usage unpredictable. Trimming is now off by default.
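
For readers who want a concrete picture of the two changes, here is a minimal sketch in plain Python. It is not the code from this PR: `split_for_s2s`, `collate`, `context_fraction`, `max_len`, and `PAD_ID` are hypothetical names chosen for illustration, and only the `allow_pad_trimming` flag name and its off-by-default behavior come from the description above. It assumes right-padded batches and a prefix/suffix style of sequence corruption.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical pad token id, chosen for illustration


def split_for_s2s(tokens: List[int], context_fraction: float) -> Dict[str, List[int]]:
    """Split a sequence into a visible context (prefix) and a target (suffix).

    Hypothetical helper. With context_fraction=0.0 there is no context, so
    every token becomes a prediction target, which matches a causal LM example.
    """
    if not 0.0 <= context_fraction <= 1.0:
        raise ValueError("context_fraction must be in [0, 1]")
    n_context = int(len(tokens) * context_fraction)
    return {"context": tokens[:n_context], "target": tokens[n_context:]}


def collate(
    rows: List[List[int]],
    max_len: int,
    allow_pad_trimming: bool = False,
) -> List[List[int]]:
    """Right-pad every row to max_len, optionally trimming all-padding columns.

    With allow_pad_trimming=False (the default), every batch has the same
    width. With True, the batch is cut to its longest row, so the width can
    differ from batch to batch.
    """
    if any(len(row) > max_len for row in rows):
        raise ValueError("row longer than max_len")
    padded = [row + [PAD_ID] * (max_len - len(row)) for row in rows]
    if allow_pad_trimming:
        longest = max((len(row) for row in rows), default=0)
        padded = [row[:longest] for row in padded]
    return padded
```

In this sketch, leaving `allow_pad_trimming` off keeps every batch at `max_len` columns, which trades some wasted compute for a fixed tensor shape. Turning it on shrinks each batch to its longest row.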

Contributor

abhi-mosaic left a comment

LGTM! Sorry for the delay in reviewing.

abhi-mosaic merged commit 071af14 into main on Mar 17, 2023
abhi-mosaic deleted the alex/mod_collator_update branch on March 17, 2023 22:52
2 participants