
refactor bert and gpt #1130

Merged: 26 commits merged into mosaicml:dev on Jun 28, 2022

Conversation

@A-Jacobson (Contributor) commented Jun 6, 2022

I took a stab at converting GPT2 and BERT to use HuggingFaceModel and deleting transformer_shared and transformer_hparams. I'd love for you, @moinnadeem, to check it out before I (or someone else) test them on the cluster, as there are a few things that will probably explode despite the passing tests.

The main changes are in bert/model.py and gpt/model.py: I converted the model creation to factory functions and moved all logic out of the hparams classes.
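To make the shape of the change concrete, here's a rough sketch of the factory pattern (argument names and the exact HuggingFaceModel signature below are illustrative, not the final API):

```python
from typing import Optional

import transformers

from composer.models import HuggingFaceModel


def create_bert_mlm(model_name: str = 'bert-base-uncased',
                    use_pretrained: bool = False,
                    model_config: Optional[dict] = None) -> HuggingFaceModel:
    """Build a BERT masked-LM and wrap it for the Composer trainer."""
    if use_pretrained:
        # Pull pretrained weights from the Hugging Face hub.
        model = transformers.AutoModelForMaskedLM.from_pretrained(model_name)
    else:
        # Build a randomly initialized model from a (possibly customized) config.
        config = transformers.AutoConfig.from_pretrained(model_name, **(model_config or {}))
        model = transformers.AutoModelForMaskedLM.from_config(config)
    return HuggingFaceModel(model)
```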

I took some liberties that you may find... questionable, so I'm going to put them front and center so you don't have to dig. Please correct me if I'm very wrong in some of these assumptions.

Stuff that's probably going to break:

  1. sequence length warmup - For some reason this isn't making my tests fail, but I can see that the implementation relies on getting model_inputs. I didn't implement that in HuggingFaceModel because it seemed like it was mostly for error checking and more related to the dataset being used. It's just not a great experience to have to manually specify these. Is this something we necessarily have to manually add or could it be inferred automagically from either the model or dataset?
  2. default torchmetrics? - Maybe? I saw that the original BERTModel had these lines:
    # if we are in the single class case, then remove the classes dimension
    if output.shape[1] == 1:
        output = output.squeeze(dim=1)

HuggingFaceModel doesn't do this. Will this break GLUE? Is this general enough for torchmetrics that I should add it to the base HuggingFaceModel?
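For concreteness, here's a toy version of the case that snippet handles (a single-logit head such as STS-B regression); the open question is whether this squeeze is general enough to live in the base HuggingFaceModel:

```python
import torch

# A single-target head (e.g. STS-B regression) emits shape (batch, 1).
output = torch.randn(8, 1)
labels = torch.randn(8)

# Without the squeeze, torchmetrics sees mismatched shapes: (8, 1) vs (8,).
if output.shape[1] == 1:
    output = output.squeeze(dim=1)  # now (8,), aligned with the labels
```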

Possible sus design choices:

  1. I removed some of the validations on input args, mainly the one that prevents users from specifying a config and pretrained=True. Hugging Face does allow this behavior, they just load all the weights that are applicable and throw a warning (see the sketch after this list). How do you feel about this?
  2. Slimmed down the args these factories take; they no longer specify tokenizers and pretrained model names. I know these tokenizers are being used for some error validation but it's pretty odd to specify them multiple times + preprocessing really does belong with the dataset.
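For point 1, this is roughly the Hugging Face behavior I'm describing (the checkpoint name and config tweak are just for illustration):

```python
import transformers

# A config that deliberately disagrees with the checkpoint (6 layers instead of 12).
config = transformers.AutoConfig.from_pretrained('bert-base-uncased', num_hidden_layers=6)

# Hugging Face still loads every pretrained weight that fits this architecture and
# warns about the checkpoint weights it had to skip.
model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased', config=config)
```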

Other thoughts:

  1. I saw `assert transformers.AutoModelForMaskedLM.from_pretrained is not None, "from_pretrained should not be None"`; is this being used for anything other than making pyright happy? (Reproduced after this list for context.)
  2. With these changes, bert-base probably doesn't need multiple eval anymore. If that makes sense, I can remove that from the yaml; thoughts?
  • Addresses CO-386
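For context, here's the assert from the first item under "Other thoughts"; as far as I can tell it only narrows the type for the static checker and is effectively a no-op at runtime:

```python
import transformers

# The assert exists so pyright accepts the call below; from_pretrained is a classmethod
# and is never actually None at runtime when transformers is installed.
assert transformers.AutoModelForMaskedLM.from_pretrained is not None, 'from_pretrained should not be None'
model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
```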

@A-Jacobson requested a review from moinnadeem June 6, 2022 22:59
@A-Jacobson requested a review from eracah June 6, 2022 23:16
@moinnadeem (Contributor) left a comment

Re: sequence length warmup:

> Is this something we necessarily have to manually add or could it be inferred automagically from either the model or dataset?

model_inputs could be automatically inferred from the model!
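Something along these lines could work as a sketch (not tied to any existing helper in the codebase):

```python
import inspect

import transformers

model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased')

# The model's forward() signature already names the inputs it accepts, e.g.
# input_ids, attention_mask, token_type_ids, labels, ...
model_inputs = set(inspect.signature(model.forward).parameters)
```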

> HuggingFaceModel doesn't do this. Will this break GLUE? Is this general enough for torchmetrics that I should add it to the base HuggingFaceModel?

Yes!

> I removed some of the validations on input args, mainly the one that prevents users from specifying a config and pretrained=True. Hugging Face does allow this behavior, they just load all the weights that are applicable and throw a warning. How do you feel about this?

If a user specifies a config, then there will likely not be a pretrained model to pull -- I wanted to preempt this problem by raising an exception, do you see what I mean? Thoughts on keeping the validation?

> I know these tokenizers are being used for some error validation but it's pretty odd to specify them multiple times + preprocessing really does belong with the dataset.

The trouble is that we need to make sure that the tokenizer's outputs are compatible with the model's inputs -- there is some level of codesign going on here. I agree that it is odd to use them multiple times; is it possible to centralize them somewhere?

> I saw `assert transformers.AutoModelForMaskedLM.from_pretrained is not None, "from_pretrained should not be None"`; is this being used for anything other than making pyright happy?

Not that I can remember!

> With these changes, bert-base probably doesn't need multiple eval anymore. If that makes sense, I can remove that from the yaml; thoughts?

Wait, why doesn't it need multiple evaluators anymore? I think I'm missing something.

Review thread on composer/models/bert/bert_hparams.py (outdated, resolved)
@ishanashastri marked this pull request as draft June 15, 2022 22:33
@ishanashastri marked this pull request as ready for review June 16, 2022 21:57
@ishanashastri (Contributor) commented Jun 16, 2022

> I removed some of the validations on input args, mainly the one that prevents users from specifying a config and pretrained=True. Hugging Face does allow this behavior, they just load all the weights that are applicable and throw a warning. How do you feel about this?

> If a user specifies a config, then there will likely not be a pretrained model to pull -- I wanted to preempt this problem by raising an exception, do you see what I mean? Thoughts on keeping the validation?

Agreed, validation was put back!

> I know these tokenizers are being used for some error validation but it's pretty odd to specify them multiple times + preprocessing really does belong with the dataset.

> The trouble is that we need to make sure that the tokenizer's outputs are compatible with the model's inputs -- there is some level of codesign going on here. I agree that it is odd to use them multiple times; is it possible to centralize them somewhere?

Model inputs are now inferred from the model type / pulled from the tokenizer for validation; a future PR will add proper tokenization and training entrypoints.
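Roughly the shape of the check (identifiers below are illustrative rather than the exact code in the PR):

```python
import inspect

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained('bert-base-uncased')
model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased')

# Every key the tokenizer emits should be an input the model's forward() accepts.
batch = tokenizer('sanity check', return_tensors='pt')
accepted = set(inspect.signature(model.forward).parameters)
unexpected = set(batch.keys()) - accepted
assert not unexpected, f'tokenizer outputs the model does not accept: {unexpected}'
```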

@ishanashastri requested a review from moinnadeem June 16, 2022 22:06
@moinnadeem (Contributor) left a comment

@ishanashastri Can you let me know when the notebooks have been refactored? Since 0.8 is getting cut today, we should be good to merge this soon.

@moinnadeem (Contributor) left a comment

LGTM! Great work!

@A-Jacobson removed the request for review from eracah June 28, 2022 19:04
@ishanashastri merged commit 99cf7d2 into mosaicml:dev Jun 28, 2022
@Landanjs mentioned this pull request Jun 28, 2022
@eracah requested review from eracah and removed request for eracah June 29, 2022 01:08