
refactor bert and gpt #1130

Merged: 26 commits merged into mosaicml:dev on Jun 28, 2022

Conversation

@A-Jacobson (Contributor) commented Jun 6, 2022

I took a stab at converting GPT2 and BERT to use HuggingFaceModel and deleting transformer_shared and transformer_hparams. I'd love for you, @moinnadeem, to check it out before I (or someone else) test them on the cluster, as there are a few things that will probably explode despite the passing tests.

The main changes are in bert/model.py and gpt/model.py: I converted the model creation to factory functions and moved all logic out of the hparams classes.
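To make the shape of the change concrete, here's a rough sketch of the factory pattern (argument names and the exact HuggingFaceModel signature below are illustrative, not the final API):

```python
from typing import Optional

import transformers

from composer.models import HuggingFaceModel


def create_bert_mlm(model_name: str = 'bert-base-uncased',
                    use_pretrained: bool = False,
                    model_config: Optional[dict] = None) -> HuggingFaceModel:
    """Build a BERT masked-LM and wrap it for the Composer trainer."""
    if use_pretrained:
        # Pull pretrained weights from the Hugging Face hub.
        model = transformers.AutoModelForMaskedLM.from_pretrained(model_name)
    else:
        # Build a randomly initialized model from a (possibly customized) config.
        config = transformers.AutoConfig.from_pretrained(model_name, **(model_config or {}))
        model = transformers.AutoModelForMaskedLM.from_config(config)
    return HuggingFaceModel(model)
```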

I took some liberties that you may find... questionable, so I'm going to put them front and center so you don't have to dig. Please correct me if I'm very wrong in some of these assumptions.

Stuff that's probably going to break:

  1. sequence length warmup - For some reason this isn't making my tests fail, but I can see that the implementation relies on getting model_inputs. I didn't implement that in HuggingFaceModel because it seemed like it was mostly for error checking and more related to the dataset being used. It's just not a great experience to have to manually specify these. Is this something we necessarily have to manually add or could it be inferred automagically from either the model or dataset?
  2. default torchmetrics? - Maybe? I saw that the original BERTModel had these lines:
    # if we are in the single class case, then remove the classes dimension
    if output.shape[1] == 1:
        output = output.squeeze(dim=1)

HuggingFaceModel doesn't do this. Will this break GLUE? Is this general enough for torchmetrics that I should add it to the base HuggingFaceModel?
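For concreteness, here's a toy version of the case that snippet handles (a single-logit head such as STS-B regression); the open question is whether this squeeze is general enough to live in the base HuggingFaceModel:

```python
import torch

# A single-target head (e.g. STS-B regression) emits shape (batch, 1).
output = torch.randn(8, 1)
labels = torch.randn(8)

# Without the squeeze, torchmetrics sees mismatched shapes: (8, 1) vs (8,).
if output.shape[1] == 1:
    output = output.squeeze(dim=1)  # now (8,), aligned with the labels
```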

Possible sus design choices:

  1. I removed some of the validations on input args, mainly the one that prevents users from specifying a config and pretrained=True. Hugging Face does allow this behavior, they just load all the weights that are applicable and throw a warning (see the sketch after this list). How do you feel about this?
  2. Slimmed down the args these factories take; they no longer specify tokenizers and pretrained model names. I know these tokenizers are being used for some error validation but it's pretty odd to specify them multiple times + preprocessing really does belong with the dataset.
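For point 1, this is roughly the Hugging Face behavior I'm describing (the checkpoint name and config tweak are just for illustration):

```python
import transformers

# A config that deliberately disagrees with the checkpoint (6 layers instead of 12).
config = transformers.AutoConfig.from_pretrained('bert-base-uncased', num_hidden_layers=6)

# Hugging Face still loads every pretrained weight that fits this architecture and
# warns about the checkpoint weights it had to skip.
model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased', config=config)
```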

Other thoughts:

  1. I saw `assert transformers.AutoModelForMaskedLM.from_pretrained is not None, "from_pretrained should not be None"`; is this being used for anything other than making pyright happy? (Reproduced after this list for context.)
  2. With these changes, bert-base probably doesn't need multiple eval anymore. If that makes sense, I can remove that from the yaml; thoughts?
  • Addresses CO-386
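For context, here's the assert from the first item under "Other thoughts"; as far as I can tell it only narrows the type for the static checker and is effectively a no-op at runtime:

```python
import transformers

# The assert exists so pyright accepts the call below; from_pretrained is a classmethod
# and is never actually None at runtime when transformers is installed.
assert transformers.AutoModelForMaskedLM.from_pretrained is not None, 'from_pretrained should not be None'
model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased')
```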

@A-Jacobson requested a review from moinnadeem June 6, 2022 22:59
@A-Jacobson requested a review from eracah June 6, 2022 23:16
@moinnadeem (Contributor) left a comment

Re: sequence length warmup:

> Is this something we necessarily have to manually add or could it be inferred automagically from either the model or dataset?

model_inputs could be automatically inferred from the model!
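Something along these lines could work as a sketch (not tied to any existing helper in the codebase):

```python
import inspect

import transformers

model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased')

# The model's forward() signature already names the inputs it accepts, e.g.
# input_ids, attention_mask, token_type_ids, labels, ...
model_inputs = set(inspect.signature(model.forward).parameters)
```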

> HuggingFaceModel doesn't do this. Will this break GLUE? Is this general enough for torchmetrics that I should add it to the base HuggingFaceModel?

Yes!

> I removed some of the validations on input args, mainly the one that prevents users from specifying a config and pretrained=True. Hugging Face does allow this behavior, they just load all the weights that are applicable and throw a warning. How do you feel about this?

If a user specifies a config, then there will likely not be a pretrained model to pull -- I wanted to preempt this problem by raising an exception, do you see what I mean? Thoughts on keeping the validation?

> I know these tokenizers are being used for some error validation but it's pretty odd to specify them multiple times + preprocessing really does belong with the dataset.

The trouble is that we need to make sure that the tokenizer's outputs are compatible with the model's inputs -- there is some level of codesign going on here. I agree that it is odd to use them multiple times; is it possible to centralize them somewhere?

> I saw `assert transformers.AutoModelForMaskedLM.from_pretrained is not None, "from_pretrained should not be None"`; is this being used for anything other than making pyright happy?

Not that I can remember!

> With these changes, bert-base probably doesn't need multiple eval anymore. If that makes sense, I can remove that from the yaml; thoughts?

Wait, why doesn't it need multiple evaluators anymore? I think I'm missing something.

Review thread on composer/models/bert/bert_hparams.py (outdated, resolved)
@ishanashastri marked this pull request as draft June 15, 2022 22:33
@ishanashastri marked this pull request as ready for review June 16, 2022 21:57
@ishanashastri (Contributor) commented Jun 16, 2022

> I removed some of the validations on input args, mainly the one that prevents users from specifying a config and pretrained=True. Hugging Face does allow this behavior, they just load all the weights that are applicable and throw a warning. How do you feel about this?

> If a user specifies a config, then there will likely not be a pretrained model to pull -- I wanted to preempt this problem by raising an exception, do you see what I mean? Thoughts on keeping the validation?

Agreed, validation was put back!

> I know these tokenizers are being used for some error validation but it's pretty odd to specify them multiple times + preprocessing really does belong with the dataset.

> The trouble is that we need to make sure that the tokenizer's outputs are compatible with the model's inputs -- there is some level of codesign going on here. I agree that it is odd to use them multiple times; is it possible to centralize them somewhere?

Model inputs are now inferred from the model type / pulled from the tokenizer for validation; a future PR will add proper tokenization and training entrypoints.
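Roughly the shape of the check (identifiers below are illustrative rather than the exact code in the PR):

```python
import inspect

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained('bert-base-uncased')
model = transformers.AutoModelForMaskedLM.from_pretrained('bert-base-uncased')

# Every key the tokenizer emits should be an input the model's forward() accepts.
batch = tokenizer('sanity check', return_tensors='pt')
accepted = set(inspect.signature(model.forward).parameters)
unexpected = set(batch.keys()) - accepted
assert not unexpected, f'tokenizer outputs the model does not accept: {unexpected}'
```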

@ishanashastri requested a review from moinnadeem June 16, 2022 22:06
@moinnadeem (Contributor) left a comment

@ishanashastri Can you let me know when the notebooks have been refactored? Since 0.8 is getting cut today, we should be good to merge this soon.

@moinnadeem (Contributor) left a comment

LGTM! Great work!

@A-Jacobson removed the request for review from eracah June 28, 2022 19:04
@ishanashastri merged commit 99cf7d2 into mosaicml:dev Jun 28, 2022
@Landanjs mentioned this pull request Jun 28, 2022
@eracah requested review from eracah and removed request for eracah June 29, 2022 01:08