
Fix MatMul8bitLtBackward view issue #1425

Merged 1 commit into main on Feb 2, 2024
Conversation

@younesbelkada (Contributor) commented Feb 1, 2024

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@winglian (Contributor) commented Feb 1, 2024

With both plain DDP and DeepSpeed I now get:

  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1561, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1893, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2822, in training_step
    self.accelerator.backward(loss)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/accelerator.py", line 1964, in backward
    loss.backward(**kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 485, in backward
    .mul_(state.SCB.unsqueeze(1).mul(1.0 / 127.0))
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 0
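
For reference, the failure above is an ordinary broadcasting mismatch in the dequantization step of the 8-bit backward: the quantization scales stored in state.SCB no longer line up with the gradient tensor they are multiplied into. A minimal PyTorch sketch with made-up shapes (illustration only, not the bitsandbytes implementation) that reproduces the same class of error:

    import torch

    # Hypothetical shapes for illustration: a gradient with 32 rows and a scale
    # vector SCB with only 8 entries, mimicking the mismatch in the traceback.
    grad = torch.randn(32, 16)
    SCB = torch.randn(8)

    try:
        # Same pattern as the failing line above: in-place multiply by
        # SCB.unsqueeze(1) scaled by 1/127.
        grad.mul_(SCB.unsqueeze(1).mul(1.0 / 127.0))
    except RuntimeError as e:
        # RuntimeError: The size of tensor a (32) must match the size of
        # tensor b (8) at non-singleton dimension 0
        print(e)

The PR title suggests the fix keeps the tensors reaching MatMul8bitLtBackward in the shape the saved quantization state expects; the snippet only illustrates why a mismatch surfaces as this particular RuntimeError.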

@younesbelkada (Contributor, Author)

Hmm @winglian, how do you load your model with DDP? Does it work without DDP?

@winglian (Contributor) commented Feb 1, 2024

Just with accelerate launch. I limited it to a single GPU with CUDA_VISIBLE_DEVICES=0 and it raises the same error as above.

Also, swapping the model for Mistral works without any issues; it's only Mixtral that has problems.

@younesbelkada (Contributor, Author) commented Feb 1, 2024

Ah, I see. I think we can merge this first, and I'll make a patch for Mixtral in transformers; it should just be a matter of updating the modeling code!

@younesbelkada marked this pull request as ready for review on February 1, 2024 at 07:50
@younesbelkada (Contributor, Author)

cc @pacman100

@pacman100 (Contributor) left a comment


Thank you @younesbelkada for the fixes!

@younesbelkada merged commit ce925d8 into main on Feb 2, 2024
14 checks passed
@younesbelkada deleted the younesbelkada-patch-2 branch on February 2, 2024 at 07:30
BenjaminBossan pushed a commit to BenjaminBossan/peft that referenced this pull request Mar 14, 2024