
Fix MatMul8bitLtBackward view issue #1425

Merged 1 commit into main on Feb 2, 2024
Conversation

@younesbelkada (Contributor) commented Feb 1, 2024

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@winglian (Contributor) commented Feb 1, 2024

With both plain DDP and DeepSpeed I now get:

  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1561, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1893, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2822, in training_step
    self.accelerator.backward(loss)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/accelerator.py", line 1964, in backward
    loss.backward(**kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 288, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 288, in apply
    return user_fn(self, *args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 485, in backward
    .mul_(state.SCB.unsqueeze(1).mul(1.0 / 127.0))
RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 0
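
For reference, the failure above is an ordinary broadcasting mismatch in the dequantization step of the 8-bit backward: the quantization scales stored in state.SCB no longer line up with the gradient tensor they are multiplied into. A minimal PyTorch sketch with made-up shapes (illustration only, not the bitsandbytes implementation) that reproduces the same class of error:

    import torch

    # Hypothetical shapes for illustration: a gradient with 32 rows and a scale
    # vector SCB with only 8 entries, mimicking the mismatch in the traceback.
    grad = torch.randn(32, 16)
    SCB = torch.randn(8)

    try:
        # Same pattern as the failing line above: in-place multiply by
        # SCB.unsqueeze(1) scaled by 1/127.
        grad.mul_(SCB.unsqueeze(1).mul(1.0 / 127.0))
    except RuntimeError as e:
        # RuntimeError: The size of tensor a (32) must match the size of
        # tensor b (8) at non-singleton dimension 0
        print(e)

The PR title suggests the fix keeps the tensors reaching MatMul8bitLtBackward in the shape the saved quantization state expects; the snippet only illustrates why a mismatch surfaces as this particular RuntimeError.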

@younesbelkada (Contributor, Author)

Hmm @winglian, how do you load your model with DDP? Does it work without DDP?

@winglian (Contributor) commented Feb 1, 2024

Just with accelerate launch. I limited it to a single GPU with CUDA_VISIBLE_DEVICES=0 and it raises the same error as above.

Also, swapping the model for Mistral works without any issues; it's only Mixtral that has problems.

@younesbelkada (Contributor, Author) commented Feb 1, 2024

Ah, I see. I think we can merge this first, and I'll make a patch for Mixtral in transformers; it should just be a matter of updating the modeling code!

@younesbelkada marked this pull request as ready for review on February 1, 2024 at 07:50
@younesbelkada (Contributor, Author)

cc @pacman100

@pacman100 (Contributor) left a comment


Thank you @younesbelkada for the fixes!

@younesbelkada merged commit ce925d8 into main on Feb 2, 2024
14 checks passed
@younesbelkada deleted the younesbelkada-patch-2 branch on February 2, 2024 at 07:30
BenjaminBossan pushed a commit to BenjaminBossan/peft that referenced this pull request Mar 14, 2024