refactor bert and gpt #1130
Conversation
re: sequence length warmup, is this something we necessarily have to manually add, or could it be inferred automagically from either the model or the dataset?
`model_inputs` could be automatically inferred from the model!
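For concreteness, here is a minimal sketch of the two inference routes being discussed; the model and tokenizer names are illustrative, and none of this is the actual composer implementation:

```python
import inspect

import transformers

# Hypothetical example: infer acceptable model inputs instead of
# specifying them manually. The checkpoint name is illustrative.
model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = transformers.AutoTokenizer.from_pretrained('bert-base-uncased')

# Option 1: read the model's forward() signature.
forward_params = set(inspect.signature(model.forward).parameters)

# Option 2: ask the tokenizer which tensors it produces,
# e.g. {'input_ids', 'token_type_ids', 'attention_mask'}.
tokenizer_inputs = set(tokenizer.model_input_names)

# Validation: every tensor the tokenizer emits should be a keyword
# the model's forward() accepts.
assert tokenizer_inputs <= forward_params
```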
`HuggingFaceModel` doesn't do this. Will this break GLUE? Is this general enough, given torchmetrics, that I should add it to the base `HuggingFaceModel`?
Yes!
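To make the torchmetrics question concrete, here is an illustrative pattern (not composer's actual code) for the kind of metric wiring under discussion, using the current torchmetrics API with placeholder task settings:

```python
import torch
import torchmetrics

# Illustrative only: hold eval metrics in a MetricCollection and update
# them per batch. task/num_classes are placeholder values.
metrics = torchmetrics.MetricCollection(
    {'accuracy': torchmetrics.Accuracy(task='multiclass', num_classes=2)})

logits = torch.randn(8, 2)            # fake eval batch
labels = torch.randint(0, 2, (8,))
metrics.update(logits.softmax(dim=-1), labels)
print(metrics.compute())              # {'accuracy': tensor(...)}
```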
I removed some of the validations on input args, mainly the one that prevents users from specifying a config and `pretrained=True`. Hugging Face does allow this behavior; they just load all the weights that are applicable and throw a warning. How do you feel about this?
If a user specifies a config, then there will likely not be a pretrained model to pull; I wanted to preempt this problem by raising an exception (see the sketch below). Do you see what I mean? Thoughts on keeping the validation?
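A minimal sketch of the validation in question, with a hypothetical function name and error message:

```python
from typing import Optional


def create_bert(pretrained: bool = False,
                model_config: Optional[dict] = None):
    """Hypothetical factory illustrating the argument-validation debate."""
    if pretrained and model_config is not None:
        # If the user hands us a custom config, there is likely no
        # matching pretrained checkpoint to pull, so fail loudly rather
        # than letting Hugging Face load partial weights with a warning.
        raise ValueError(
            'Cannot specify both pretrained=True and a custom config; '
            'pretrained weights only exist for stock configurations.')
    ...
```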
I know these tokenizers are being used for some error validation, but it's pretty odd to specify them multiple times, and preprocessing really does belong with the dataset.
The trouble is that we need to make sure that the tokenizer's outputs are compatible with the model's inputs; there is some level of co-design going on here. I agree that it is odd to specify them multiple times. Is it possible to centralize them somewhere (see the sketch below)?
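One possible way to centralize them, sketched under the assumption that the dataset code and the model factory each construct their own tokenizer today:

```python
import functools

import transformers


@functools.lru_cache(maxsize=None)
def get_tokenizer(name: str) -> transformers.PreTrainedTokenizerBase:
    """Shared constructor so that dataset preprocessing and model-side
    validation agree on exactly one tokenizer instance per name."""
    return transformers.AutoTokenizer.from_pretrained(name)
```

With a single accessor like this, the tokenizer/model compatibility check would only have to happen in one place.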
I saw `assert transformers.AutoModelForMaskedLM.from_pretrained is not None, "from_pretrained should not be None"`. Is this being used for anything other than making pyright happy?
Not that I can remember!
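For reference, the assert is a common pyright type-narrowing idiom; a cast is a runtime-free alternative (sketch only):

```python
from typing import Any, cast

import transformers

# The assert narrows the type for pyright at the cost of a cheap
# runtime check:
assert transformers.AutoModelForMaskedLM.from_pretrained is not None, \
    'from_pretrained should not be None'

# A cast achieves the same narrowing with no runtime effect; only the
# type checker sees it:
from_pretrained = cast(Any, transformers.AutoModelForMaskedLM.from_pretrained)
```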
With these changes, `bert-base` probably doesn't need multiple evaluators anymore. If that makes sense, I can remove them from the YAML. Thoughts?
Wait, why doesn't it need multiple evaluators anymore? I think I'm missing something.
Agreed, validation was put back!
Model inputs are now inferred from the model type and pulled from the tokenizer for validation; a future PR will handle adding proper tokenization and training entrypoints.
@ishanashastri Can you let me know when the notebooks have been refactored? Since 0.8 is getting cut today, we should be good to merge this soon.
LGTM! Great work!
I took a stab at converting `GPT2` and `BERT` to use `HuggingFaceModel` and deleting `transformer_shared` and `transformer_hparams`. I'd love for you, @moinnadeem, to check it out before I (or someone else) test them on the cluster, as there are a few things that will probably explode despite the passing tests.

The main changes are in `bert/model.py` and `gpt/model.py`. I converted the model creation to factory functions (sketched below) and moved all logic out of the hparams classes. I took some liberties that you may find... questionable, so I'm going to put them front and center so you don't have to dig. Please correct me if I'm very wrong in some of these assumptions.
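A rough sketch of the factory-function shape this describes; `create_bert` and its arguments are illustrative, not the exact code in `bert/model.py`:

```python
from typing import Optional

import transformers
from composer.models import HuggingFaceModel


def create_bert(pretrained: bool = False,
                model_config: Optional[dict] = None) -> HuggingFaceModel:
    """Illustrative factory: all construction logic lives here instead
    of in an hparams class."""
    if pretrained:
        model = transformers.AutoModelForMaskedLM.from_pretrained(
            'bert-base-uncased')
    else:
        config = transformers.BertConfig(**(model_config or {}))
        model = transformers.AutoModelForMaskedLM.from_config(config)
    tokenizer = transformers.AutoTokenizer.from_pretrained('bert-base-uncased')
    # The tokenizer kwarg is assumed here; it supplies the input names
    # used for validation.
    return HuggingFaceModel(model, tokenizer=tokenizer)
```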
Stuff that's probably going to break:

- `model_inputs`. I didn't implement that in `HuggingFaceModel` because it seemed like it was mostly for error checking and more related to the dataset being used. It's just not a great experience to have to manually specify these. Is this something we necessarily have to manually add, or could it be inferred automagically from either the model or the dataset?
- `BERTModel` had these lines: […] `HuggingFaceModel` doesn't do this. Will this break GLUE? Is this general enough, given torchmetrics, that I should add it to the base `HuggingFaceModel`?
Possible sus design choices:

- I removed some of the validations on input args, mainly the one that prevents users from specifying a config and `pretrained=True`. Hugging Face does allow this behavior; they just load all the weights that are applicable and throw a warning. How do you feel about this?

Other thoughts:
- I saw `assert transformers.AutoModelForMaskedLM.from_pretrained is not None, "from_pretrained should not be None"`. Is this being used for anything other than making pyright happy?
- With these changes, `bert-base` probably doesn't need multiple evaluators anymore. If that makes sense, I can remove that from the YAML. Thoughts?