
Torch size mismatch in GPT-J model (Error) #13499

Closed
MantasLukauskas opened this issue Sep 9, 2021 · 4 comments

Comments


MantasLukauskas commented Sep 9, 2021

Environment info

  • transformers version: 4.11.0.dev0
  • Platform: Linux-4.19.0-10-cloud-amd64-x86_64-with-debian-10.5
  • Python version: 3.7.8
  • PyTorch version (GPU?): 1.7.1+cu110 (True)
  • Tensorflow version (GPU?): 2.4.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help

@patrickvonplaten, @LysandreJik, @patil-suraj

Information

I trained/fine-tuned the GPT-J model (model here: https://huggingface.co/EleutherAI/gpt-j-6B) in fp16, as suggested in #13329. Now, when I try to load the model (as shown below), I get a torch size mismatch error (also shown below). What solutions can be applied here?

from transformers import AutoModelForCausalLM, AutoTokenizer
2021-09-09 13:23:45.897519: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
model = AutoModelForCausalLM.from_pretrained("GPT-J/checkpoint-20000")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.7/site-packages/transformers/models/auto/auto_factory.py", line 388, in from_pretrained
    return model_class.from_pretrained(pretrained_model_name_or_path, *model_args, config=config, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1376, in from_pretrained
    _fast_init=_fast_init,
  File "/opt/conda/lib/python3.7/site-packages/transformers/modeling_utils.py", line 1523, in _load_state_dict_into_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for GPTJForCausalLM:
    size mismatch for lm_head.weight: copying a param with shape torch.Size([50400, 4096]) from checkpoint, the shape in current model is torch.Size([50257, 4096]).
    size mismatch for lm_head.bias: copying a param with shape torch.Size([50400]) from checkpoint, the shape in current model is torch.Size([50257]).

Any ideas on how to solve it?
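
For reference, a minimal sketch of a possible workaround (not an official fix), assuming the mismatch comes from the reloaded config building a 50257-row lm_head while the saved weights have the padded 50400 rows shown in the error above; the path is the local checkpoint from this report:

# Workaround sketch (assumption, not the official fix): override the config's
# vocab_size so the freshly built model matches the checkpoint's padded lm_head.
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("GPT-J/checkpoint-20000")
config.vocab_size = 50400  # checkpoint's lm_head has 50400 rows per the error message
model = AutoModelForCausalLM.from_pretrained("GPT-J/checkpoint-20000", config=config)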

@patrickvonplaten
Contributor

Hey @MantasLukauskas - could you please provide a reproducible code snippet? :-)

@MantasLukauskas
Author

Hey @patrickvonplaten,

Fine-tuning was done with this command:
deepspeed --num_gpus 2 run_clm.py --model_name_or_path EleutherAI/gpt-j-6B --num_train_epochs 10 --per_device_train_batch_size 2 --per_device_eval_batch_size 8 --train_file train.txt --validation_file test.txt --do_train --do_eval --output_dir GPT-J --save_steps 5000 --block_size=512 --evaluation_strategy "epoch" --logging_steps 200 --logging_dir GPT-J/runs --model_revision float16 --fp16 --deepspeed zero3.json

You can use --save_steps 10 to get a checkpoint faster :)

After that, I try to load my model as in the docs, just using my checkpoint path instead of EleutherAI/gpt-j-6B. Interestingly, if I use the pretrained EleutherAI model everything works, but when I use my model fine-tuned with run_clm.py the error occurs. Maybe that will help you figure out what is wrong there?

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
prompt = (
    "In a shocking finding, scientists discovered a herd of unicorns living in a remote, "
    "previously unexplored valley, in the Andes Mountains. Even more surprising to the "
    "researchers was the fact that the unicorns spoke perfect English."
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=100)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
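
For what it's worth, a small diagnostic sketch (my addition, using the checkpoint path from the first post) to check whether the vocab size saved in the fine-tuned checkpoint's config disagrees with the one the base model reports:

# Diagnostic sketch: compare the vocab_size recorded in the fine-tuned checkpoint's
# config with the base model's config; the error suggests the reloaded model is
# built with a 50257-row head while the saved weights have the padded 50400 rows.
from transformers import AutoConfig

finetuned_config = AutoConfig.from_pretrained("GPT-J/checkpoint-20000")
base_config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6B")
print("fine-tuned config vocab_size:", finetuned_config.vocab_size)
print("base config vocab_size:", base_config.vocab_size)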

@patil-suraj
Contributor

Related: #13581

@LysandreJik
Member

Should be fixed by #13617 (comment)
