[Hotfix] Fix BOFT mixed precision #1925

Merged · 7 commits · Aug 7, 2024

Conversation

@Edenzzzz (Contributor) commented on Jul 14, 2024:

The authors of BOFT seemingly forgot to cast some data types in the bf16/fp16 mixed precision setting, so @srguo24 and I fixed them during our research project.
See the error below (reproducible when running any model with the transformers Trainer in bf16).
(screenshot of the error)
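For context, a minimal sketch of the kind of dtype alignment such a fix needs. The names mirror the BOFT layer code, but the exact cast sites in the merged patch may differ:

```python
import torch

def apply_boft(weight: torch.Tensor, boft_rotation: torch.Tensor, boft_scale: torch.Tensor) -> torch.Tensor:
    # Under bf16/fp16 mixed precision the base weight may be half precision while the
    # BOFT rotation/scale tensors are still fp32; the matmul between them then raises a
    # dtype-mismatch RuntimeError. Casting to the weight's dtype avoids it.
    boft_rotation = boft_rotation.to(weight.dtype)
    boft_scale = boft_scale.to(weight.dtype)
    return (boft_rotation @ weight) * boft_scale
```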

@BenjaminBossan (Member) left a comment:

Thanks a lot for providing this fix for BOFT. Do you have a small example that produces the error you showed? Ideally, we can turn this into a unit test for this bug.

@@ -77,9 +78,6 @@ def get_fbd_cuda():
    if _FBD_CUDA is not None:
        return _FBD_CUDA

    # This import initializes cuda context and should thus be local, see issue 1877
    from torch.utils.cpp_extension import load
Member: Could you please undo this change? The import should be local. Maybe merging with/rebasing on the latest main is sufficient.
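For reference, a simplified sketch of the lazy-loading pattern the reviewer wants preserved; the real get_fbd_cuda builds the CUDA extension, and the source file names below are placeholders:

```python
_FBD_CUDA = None

def get_fbd_cuda():
    global _FBD_CUDA
    if _FBD_CUDA is not None:
        return _FBD_CUDA

    # Importing torch.utils.cpp_extension at module import time would initialize the
    # CUDA context eagerly (see issue 1877), so the import stays inside the function.
    from torch.utils.cpp_extension import load

    # Placeholder source paths; the actual extension sources live in the BOFT package.
    _FBD_CUDA = load(name="fbd_cuda", sources=["fbd_cuda.cpp", "fbd_cuda_kernel.cu"])
    return _FBD_CUDA
```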

Contributor Author: done

boft_rotation = butterfly_oft_mat @ boft_rotation
boft_scale = boft_s * boft_scale

Member: Remove.

@BenjaminBossan (Member):

@Edenzzzz do you still plan to work on this PR?

@Edenzzzz (Contributor, Author) commented on Aug 5, 2024:

@BenjaminBossan Sorry for the late update. I have added the test.

@BenjaminBossan (Member) left a comment:

Thanks for adding the tests. I ran them locally and indeed they fail without your fix. Let's still make some small changes to the tests; please check my comments. Also, could you please run make style so that the linter/formatter can fix a few things?

@@ -1135,6 +1135,20 @@ def test_dora_ephemeral_gpu_offload(self):
        # The results should be the same
        assert torch.allclose(out_peft_model_cpu, out_peft_model_ego)

    @require_torch_gpu
Member:
Could you please move this whole test to tests/test_gpu_examples.py? Please create a new test class TestBOFT at the very bottom of the file and place this test inside of the class. You could also split it into two tests, one for Linear and one for Conv2d.

        layer = nn.Linear(160, 160).cuda()
        layer = Linear(layer, "layer", boft_n_butterfly_factor=2).to(dtype=torch.bfloat16)
        x = torch.randn(160, 160, device="cuda", dtype=torch.bfloat16)
        x = layer(x)
Member (suggested change):
-        x = layer(x)
+        layer(x)  # does not raise

        conv = nn.Conv2d(1, 1, 4).cuda()
        conv = Conv2d(conv, "conv", boft_n_butterfly_factor=2).to(dtype=torch.bfloat16)
        x = torch.randn(1, 160, 160, device="cuda", dtype=torch.bfloat16)
        x = conv(x)
Member (suggested change):
-        x = conv(x)
+        conv(x)  # does not raise

@@ -50,7 +50,7 @@
)
from peft.import_utils import is_bnb_4bit_available, is_bnb_available
from peft.tuners.lora.config import LoraRuntimeConfig

from peft.tuners.boft.layer import Linear, Conv2d
Member: Let's import boft via "from peft.tuners import boft" and below use boft.layer.Linear and boft.layer.Conv2d.
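In other words, the test module would do something like the following sketch (constructor arguments follow the test code above):

```python
import torch.nn as nn

from peft.tuners import boft

base = nn.Linear(160, 160)
layer = boft.layer.Linear(base, "layer", boft_n_butterfly_factor=2)
```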

@Edenzzzz (Contributor, Author) commented on Aug 5, 2024:

Thanks for the feedback. Pushed changes

@BenjaminBossan (Member):

> Thanks for the feedback. Pushed changes

Could you please run make style once more? You probably ran it before you made the other changes.

@Edenzzzz (Contributor, Author) commented on Aug 5, 2024:

I just ran make style, but it only produced error messages without fixing them?
(screenshot of the make style output)

@BenjaminBossan (Member):
Could you please ensure that you have ruff==4.10.0 installed.

@Edenzzzz (Contributor, Author) commented on Aug 6, 2024:

Do you mean 0.4.1? Perhaps we can add that to the requirements?
(screenshot)

@Edenzzzz (Contributor, Author) commented on Aug 6, 2024:

Well, I see that in setup.py, but it didn't come with pip install -e . Anyway, I reran with ruff 0.4.10.

@HuggingFaceDocBuilderDev:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@BenjaminBossan (Member) left a comment:

Thanks for the update.

The new tests are failing because they are run despite the CI not having a GPU. I added a comment on how to circumvent this. There are also some other failing tests, but they appear to be unrelated; I'll investigate.

> but didn't come with pip install -e .

The reason is that ruff is an extra dependency (as it's only relevant for devs, not users). If you install with pip install -e .[test] it should be included.

@@ -3076,3 +3077,25 @@ def test_bnb_4bit_wrap_fsdp(self):
        init_process_group(world_size=1, rank=0)
        # check that this does not raise:
        FSDP(model, auto_wrap_policy=fsdp_auto_wrap_policy(model), use_orig_params=False, sync_module_states=True)


@require_torch_gpu
Member: This decorator doesn't work as is. The reason it works on other classes in this file but not here is that those classes inherit from unittest.TestCase. But instead of doing the same, let's just move the decorator onto the two test methods directly; then it works as expected.
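Putting the review suggestions together, a rough sketch of how the resulting TestBOFT class could look. The require_torch_gpu import location and the method names are assumptions; the merged test may differ in details:

```python
import torch
import torch.nn as nn
from transformers.testing_utils import require_torch_gpu  # assumed source of the decorator

from peft.tuners import boft


class TestBOFT:
    @require_torch_gpu
    def test_boft_half_linear(self):
        # bf16 forward through a BOFT-wrapped Linear should not raise a dtype-mismatch error
        layer = boft.layer.Linear(nn.Linear(160, 160).cuda(), "layer", boft_n_butterfly_factor=2)
        layer = layer.to(dtype=torch.bfloat16)
        x = torch.randn(160, 160, device="cuda", dtype=torch.bfloat16)
        layer(x)  # does not raise

    @require_torch_gpu
    def test_boft_half_conv2d(self):
        # same check for the Conv2d variant
        conv = boft.layer.Conv2d(nn.Conv2d(1, 1, 4).cuda(), "conv", boft_n_butterfly_factor=2)
        conv = conv.to(dtype=torch.bfloat16)
        x = torch.randn(1, 160, 160, device="cuda", dtype=torch.bfloat16)
        conv(x)  # does not raise
```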

@BenjaminBossan (Member) commented on Aug 6, 2024:

> There are also some other failing tests, but they appear to be unrelated; I'll investigate.

Okay, the other failing tests were caused by an interaction with the latest transformers patch release. Please rebase on/merge with main to remedy this.

@BenjaminBossan (Member) left a comment:

Thanks a lot for the fixes, this now LGTM.

I'll ping the BOFT authors @YuliangXiu @yfeng95 @Zeju1997 @DTennant in case they want to add something. If I don't hear back in a couple of days, I'll assume this is good to be merged.

@Zeju1997 (Contributor) commented on Aug 7, 2024:

Dear all, thanks a lot for the effort!

@BenjaminBossan merged commit c869664 into huggingface:main on Aug 7, 2024
14 checks passed
@BenjaminBossan (Member):
Thanks for checking, @Zeju1997.
