[WIP] DataCollatorForTextInfilling #12370
Conversation
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
It's still on my agenda to brush this up.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
This is a wonderful effort. Any update on this? Also, if you could add a TF call, that would be great.
@salrowili Sadly, I didn't find time for it. I'm also not sure whether this still fits with the library; there may have been updates to the data collators in the meantime. I'm still interested in working on this, but realistically I won't have time unless I need it for an ongoing project. Would you be up for a collaboration?
@ionicsolutions Thanks for replying. What about BartForConditionalGeneration? Is it enough to train BART from scratch as in this example: https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_mlm_flax.py#L241 ? However, as you can see, it uses FlaxDataCollatorForLanguageModeling, which I am not sure implements the text infilling task.
@salrowili I'm also interested in infilling generation and was wondering if you've made any progress? I see your last post was three weeks ago, so I'm wondering if maybe you found an alternative approach?
@jbmaxwell I tried out the BART implementations in Flax, XLA with TPU, and Keras BART @ https://github.com/cosmoquester/transformers-bart-pretrain . Keras BART was my best model among those, and hence why I was looking for text infilling. I also think the implementation of BART is not optimal in the Hugging Face library, especially for BART large. I am also working with fairseq and torch XLA now, and I think this will be the best among all the variants I tried. I suggest you ask Google for TPU access at https://sites.research.google/trc/ and try fairseq XLA with BART, but fix the dynamic shape by using a pre-defined input shape as in my fork https://github.com/salrowili/fairseq . You can look at the latest commits to see what changes I made. With a TPUv3-8, BART will get a speed of ~100k wps, but you need to keep the log interval at 10 and num_bucket=5. I run BART on my 3090 and it gives me a speed of 30k wps. 100k wps translates to ~20k steps/day, which is slow compared to BERT with TF (~125k steps/day) with a batch size of 256 and a max. seq. length of 512. That means it will take you around one month to finish 500k steps with BART (:
I hadn't seen this before—thanks for the link!
What does this PR do?
A DataCollator for the BART "Text Infilling" pre-training task.
The implementation borrows ideas from fairseq's more complex DenoisingDataset.
Fixes #5428
(Addresses #5096)
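As a rough sketch of what the collator does: BART's text infilling objective samples span lengths from a Poisson distribution (λ=3 in the paper) and replaces each sampled span with a single mask token, with the uncorrupted sequence serving as the labels for the denoising loss. Below is a minimal, hypothetical illustration of that corruption step for a single sequence; the function name and parameters (`text_infilling`, `mask_ratio`, `poisson_lambda`) are assumptions for illustration, not this PR's actual API.

```python
import random

import torch


def text_infilling(input_ids, mask_token_id, mask_ratio=0.15, poisson_lambda=3.0):
    """Corrupt one 1-D sequence by replacing sampled spans with a single mask token.

    Hypothetical sketch of BART-style text infilling; not the API proposed in this PR.
    """
    num_to_mask = max(1, round(input_ids.size(0) * mask_ratio))

    # Sample span lengths from Poisson(poisson_lambda) until the mask budget is met.
    lengths = []
    while sum(lengths) < num_to_mask:
        lengths.append(int(torch.poisson(torch.tensor(poisson_lambda)).item()))

    masked = input_ids.tolist()
    for length in lengths:
        if length == 0:
            # A length-0 span corresponds to inserting a mask token.
            masked.insert(random.randrange(len(masked) + 1), mask_token_id)
        else:
            start = random.randrange(max(1, len(masked) - length))
            # The whole span is replaced by a single mask token, so the model
            # must also learn how many tokens are missing.
            masked[start:start + length] = [mask_token_id]

    # Labels for the denoising loss would be the original, uncorrupted input_ids.
    return torch.tensor(masked)


# Example usage (50264 is the <mask> id for facebook/bart-base):
# corrupted = text_infilling(torch.arange(10, 30), mask_token_id=50264)
```

A real collator would additionally apply this batch-wise with padding and attention-mask handling, which the sketch above omits.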
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.