Inconsistency between GPTNeo and GPT2 config classes #12183

Closed
leogao2 opened this issue Jun 15, 2021 · 6 comments
Labels: WIP

leogao2 (Contributor) commented Jun 15, 2021

The config classes for GPTNeo and GPT2 have a bunch of differences that are seemingly unnecessary. This makes it harder for downstream users to write code that depends on accessing these attributes. See below:

[image: comparison of the GPTNeo and GPT2 config attributes]
It seems that max_position_embeddings, hidden_size, num_layers, num_heads, intermediate_size, resid_dropout, embed_dropout, and attention_dropout should be renamed for consistency with the GPT2 config class.
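
For concreteness, two of the mismatched pairs (the GPT-Neo names are those listed above; the GPT-2 equivalents are n_embd and n_positions):

from transformers import GPT2Config, GPTNeoConfig

gpt2, neo = GPT2Config(), GPTNeoConfig()

gpt2.n_embd                  # model width in GPT-2
neo.hidden_size              # the same hyperparameter in GPT-Neo

gpt2.n_positions             # sequence length in GPT-2
neo.max_position_embeddings  # the same hyperparameter in GPT-Neo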

Who can help

@LysandreJik @patil-suraj

@nostalgebraist

Seconding this.

Last month I swapped out GPT-2 for GPT-Neo in a project, and these differences made it more difficult to adapt my existing code.

@LysandreJik (Member)

Hi @leogao2 and @nostalgebraist, thanks for opening an issue! You're correct that the current implementation prevents a few use cases. Namely, this works:

from transformers import GPT2Config

config = GPT2Config()
config.hidden_size  # works: the attribute can be read

But these are not:

from transformers import GPT2Config

config = GPT2Config()
config.hidden_size = 4
# Fails

config = GPT2Config(hidden_size=4)
# Fails

Unfortunately we can't just rename the arguments, as this would break both checkpoints on the Hub and local checkpoints. We're thinking of a way to enable this with a convention set across configurations: attributes covered by the convention, such as the ones you mention, could then be both read and set under their shared names.
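
A minimal sketch of how such a convention could look (hypothetical class names, not the actual implementation): each config declares a mapping from the shared attribute names to its own legacy names, and attribute access is routed through that map.

class ConfigSketch:
    # Subclasses map shared attribute names to their legacy names.
    attribute_map = {}

    def __setattr__(self, key, value):
        # Route writes through the map to the legacy attribute.
        super().__setattr__(self.attribute_map.get(key, key), value)

    def __getattr__(self, key):
        # Only reached when normal lookup fails, i.e. for shared aliases.
        if key in self.attribute_map:
            return getattr(self, self.attribute_map[key])
        raise AttributeError(key)

class GPT2ConfigSketch(ConfigSketch):
    attribute_map = {"hidden_size": "n_embd", "max_position_embeddings": "n_positions"}

    def __init__(self, n_embd=768, n_positions=1024):
        self.n_embd = n_embd
        self.n_positions = n_positions

config = GPT2ConfigSketch()
config.hidden_size      # 768, read through the alias
config.hidden_size = 4  # no longer fails: routed to n_embd
config.n_embd           # 4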

Let us explore a bit and we'll come back to you. cc @patil-suraj @patrickvonplaten @sgugger

@nostalgebraist

> This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
>
> Please note that issues that do not follow the contributing guidelines are likely to be ignored.

This still needs to be addressed.

huggingface deleted a comment from github-actions bot Jul 16, 2021
LysandreJik added the WIP label Jul 16, 2021
leogao2 (Contributor, Author) commented Aug 5, 2021

Is there any progress on this?

@LysandreJik (Member)

Hey @leogao2! Yes, a proposal is available here: nreimers@2198ee7, but there are still a few rough edges to polish. We'll try to have it merged in the next few weeks and will let you know.

@LysandreJik (Member)

This was fixed in #13026 which will be in the next release alongside GPT-J. Thank you for opening an issue!
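
With that change, the patterns that failed above should work; for example (assuming the merged convention routes the shared names as discussed earlier in the thread):

from transformers import GPT2Config

config = GPT2Config(hidden_size=4)  # previously failed
config.n_embd                       # 4, routed to the GPT-2 name

config.hidden_size = 8              # previously failed
config.n_embd                       # 8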
