[Hotfix] Fix BOFT mixed precision #1925
Conversation
Thanks a lot for providing this fix for BOFT. Do you have a small example that produces the error you showed? Ideally, we can turn this into a unit test for this bug.
src/peft/tuners/boft/layer.py (Outdated)

```diff
@@ -77,9 +78,6 @@ def get_fbd_cuda():
     if _FBD_CUDA is not None:
         return _FBD_CUDA
 
-    # This import initializes cuda context and should thus be local, see issue 1877
-    from torch.utils.cpp_extension import load
```
Could you please undo this change? The import should be local. Maybe merging with/rebasing on the latest main is sufficient.
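For context, the local import implements a lazy-initialization pattern: the CUDA context is only touched the first time the extension is actually needed. A minimal sketch of the idea, with the build arguments simplified and the source paths purely illustrative:

```python
_FBD_CUDA = None


def get_fbd_cuda():
    global _FBD_CUDA
    if _FBD_CUDA is not None:
        return _FBD_CUDA

    # This import initializes the CUDA context and should thus be local
    # (see issue 1877): at module scope it would touch the GPU as soon as
    # peft itself is imported.
    from torch.utils.cpp_extension import load

    _FBD_CUDA = load(
        name="fbd_cuda",
        sources=["fbd_cuda.cpp", "fbd_cuda_kernel.cu"],  # illustrative paths
        verbose=True,
    )
    return _FBD_CUDA
```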
done
src/peft/tuners/boft/layer.py
Outdated
boft_rotation = butterfly_oft_mat @ boft_rotation | ||
boft_scale = boft_s * boft_scale | ||
|
Remove.
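For readers following along: the underlying bug is that under bf16/fp16 training the freshly built rotation and scale tensors can be fp32 while the base weight is half precision, so combining them raises a dtype-mismatch RuntimeError. A minimal sketch of the kind of cast the fix applies (the names and the exact composition are illustrative, not the literal PR diff):

```python
import torch


def compose_boft_weight(
    base_weight: torch.Tensor,    # may be bf16/fp16 under mixed precision
    boft_rotation: torch.Tensor,  # typically built in fp32
    boft_scale: torch.Tensor,
) -> torch.Tensor:
    # Cast the adapter tensors to the base weight's dtype before combining;
    # without these casts, half-precision runs fail in the matmul below.
    rot = boft_rotation.to(base_weight.dtype)
    scale = boft_scale.to(base_weight.dtype)
    return scale * (rot @ base_weight)
```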
@Edenzzzz do you still plan to work on this PR?
@BenjaminBossan Sorry for the late update. I have added the test.
Thanks for adding the tests. I ran them locally and indeed they fail without your fix. Let's still make some small changes to the tests, please check my comments. Also, could you please run `make style` so that the linter/formatter can fix a few things?
tests/test_common_gpu.py (Outdated)

```diff
@@ -1135,6 +1135,20 @@ def test_dora_ephemeral_gpu_offload(self):
         # The results should be the same
         assert torch.allclose(out_peft_model_cpu, out_peft_model_ego)
 
+    @require_torch_gpu
```
Could you please move this whole test to `tests/test_gpu_examples.py`? Please create a new test class `TestBOFT` at the very bottom of the file and place this test inside of the class. You could also split it into two tests, one for `Linear` and one for `Conv2d`.
tests/test_common_gpu.py (Outdated)

```python
layer = nn.Linear(160, 160).cuda()
layer = Linear(layer, "layer", boft_n_butterfly_factor=2).to(dtype=torch.bfloat16)
x = torch.randn(160, 160, device="cuda", dtype=torch.bfloat16)
x = layer(x)
```
Suggested change:

```diff
-x = layer(x)
+layer(x)  # does not raise
```
tests/test_common_gpu.py (Outdated)

```python
conv = nn.Conv2d(1, 1, 4).cuda()
conv = Conv2d(conv, "conv", boft_n_butterfly_factor=2).to(dtype=torch.bfloat16)
x = torch.randn(1, 160, 160, device="cuda", dtype=torch.bfloat16)
x = conv(x)
```
Suggested change:

```diff
-x = conv(x)
+conv(x)  # does not raise
```
tests/test_common_gpu.py (Outdated)

```diff
@@ -50,7 +50,7 @@
 )
 from peft.import_utils import is_bnb_4bit_available, is_bnb_available
 from peft.tuners.lora.config import LoraRuntimeConfig
+from peft.tuners.boft.layer import Linear, Conv2d
```
Let's import `from peft.tuners import boft` and below use `boft.layer.Linear` and `boft.layer.Conv2d`.
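Concretely, the requested import style would look roughly like this (constructor arguments taken from the test snippets above):

```python
from torch import nn

from peft.tuners import boft  # instead of `from peft.tuners.boft.layer import Linear, Conv2d`

base_linear = nn.Linear(160, 160)
layer = boft.layer.Linear(base_linear, "layer", boft_n_butterfly_factor=2)

base_conv = nn.Conv2d(1, 1, 4)
conv = boft.layer.Conv2d(base_conv, "conv", boft_n_butterfly_factor=2)
```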
Thanks for the feedback. Pushed changes.
Could you please run `make style`?
Could you please ensure that you have `ruff` installed?
Well, I see that in setup.py, but it didn't come with `pip install -e .`
Thanks for the update.

The new tests are failing because they are run despite the CI not having a GPU. I added a comment on how to circumvent this. There are also some other failing tests, but they appear to be unrelated; I'll investigate.

> but didn't come with pip install -e .

The reason is that `ruff` is an extra dependency (as it's only relevant for devs, not users). If you install with `pip install -e .[test]`, it should be included.
tests/test_gpu_examples.py (Outdated)

```diff
@@ -3076,3 +3077,25 @@ def test_bnb_4bit_wrap_fsdp(self):
         init_process_group(world_size=1, rank=0)
         # check that this does not raise:
         FSDP(model, auto_wrap_policy=fsdp_auto_wrap_policy(model), use_orig_params=False, sync_module_states=True)
+
+
+@require_torch_gpu
```
This decorator doesn't work as is. The reason why it works on other classes in this file but not here is that those other classes inherit from `unittest.TestCase`. But instead of doing the same, let's just move the decorator onto the two test methods directly; then it works as expected.
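In other words, the suggested structure is roughly the following (test names are illustrative, `require_torch_gpu` is assumed to come from `transformers.testing_utils`, and the method bodies are the snippets shown earlier):

```python
import torch
from torch import nn
from transformers.testing_utils import require_torch_gpu  # assumed import location

from peft.tuners import boft


class TestBOFT:
    # Plain class (not a unittest.TestCase), so a class-level
    # @require_torch_gpu does not take effect; decorate each method instead.

    @require_torch_gpu
    def test_boft_half_linear(self):
        layer = nn.Linear(160, 160).cuda()
        layer = boft.layer.Linear(layer, "layer", boft_n_butterfly_factor=2).to(dtype=torch.bfloat16)
        x = torch.randn(160, 160, device="cuda", dtype=torch.bfloat16)
        layer(x)  # does not raise

    @require_torch_gpu
    def test_boft_half_conv2d(self):
        conv = nn.Conv2d(1, 1, 4).cuda()
        conv = boft.layer.Conv2d(conv, "conv", boft_n_butterfly_factor=2).to(dtype=torch.bfloat16)
        x = torch.randn(1, 160, 160, device="cuda", dtype=torch.bfloat16)
        conv(x)  # does not raise
```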
Okay, the other failing tests were caused by an interaction with the latest transformers patch release. Please rebase on/merge with the latest `main`.
Thanks a lot for the fixes, this now LGTM.
I'll ping the BOFT authors @YuliangXiu @yfeng95 @Zeju1997 @DTennant in case they want to add something. If I don't hear back in a couple of days, I'll assume this is good to be merged.
Dear all, thanks a lot for the effort!
Thanks for checking @Zeju1997
The authors of BOFT seemingly forgot to cast some data types in the bf16/fp16 mixed-precision setting, so @srguo24 and I fixed them during our research project.
See error below (reproducible when running any model with the transformers `Trainer` in bf16).
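A minimal standalone reproduction, distilled from the tests added in this PR (requires a CUDA GPU; no `Trainer` needed):

```python
import torch
from torch import nn

from peft.tuners import boft

# Wrap a linear layer with a BOFT adapter and cast everything to bf16,
# mimicking what the Trainer does under bf16 mixed precision.
layer = nn.Linear(160, 160).cuda()
layer = boft.layer.Linear(layer, "layer", boft_n_butterfly_factor=2).to(dtype=torch.bfloat16)
x = torch.randn(160, 160, device="cuda", dtype=torch.bfloat16)

layer(x)  # RuntimeError (dtype mismatch) without the fix; passes with it
```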