previous_dtype is now inferred from F.linear's result output type. #1010
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for providing the fix. Could you please run …
Done. Also, I tested it, successfully running my experiment with GPT-2 LoRA fine-tuning, with the original model computation running in bf16 as intended.
Thanks for providing this fix and for testing it. I ran the GPU tests locally and they also passed (but I can't test with bf16). Do you happen to have a quick test at the ready to show that it works with bf16 and that doesn't involve inspecting intermediate results (as in the original issue)?
Regarding the PR, could you please remove the empty line 243; it's not required anymore. Also, should the forward methods of Embedding and Conv2d also be changed accordingly?
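For reference, a quick bf16 check of the kind asked about above could look roughly like the following sketch; the TinyModel wrapper and the config values are illustrative, not code from this PR:

    import torch
    import torch.nn as nn
    from peft import LoraConfig, get_peft_model

    # Illustrative stand-in model, not from the PR.
    class TinyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(16, 16)

        def forward(self, x):
            return self.linear(x)

    model = get_peft_model(TinyModel().cuda(), LoraConfig(target_modules=["linear"]))
    x = torch.randn(2, 16).cuda()

    # Under bf16 autocast the LoRA-wrapped layer should return bfloat16, just like
    # the plain base layer would; before the fix, the LoRA path cast the output
    # back to the weight dtype (float32).
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        out = model(x)
    assert out.dtype == torch.bfloat16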
@MFajcik Are you still working on this?
@MFajcik Friendly reminder about this PR :) There are some merge conflicts now due to a recent change. Let me know if you need help resolving them.
@BenjaminBossan Yeah, I didn't forget about this. I just don't have time right now.
Fantastic, thanks. I don't want to put any pressure on you; take the time you need. I just wanted to ping you because sometimes people just forget :)
@BenjaminBossan I merged the changes and reapplied the fix for Conv2d, Linear and Embedding. I also ran make format and added one test covering the fix for LoRA. Is that ok?
@MFajcik Thanks for making the updates. The quality check still fails, could you please re-run make style?
Thanks for adding the test. Probably we can move it to one of the existing files, but I'll take a closer look later and make a suggestion.
Yeah, I forgot to run make style after writing the test again... Should be okay now.
@MFajcik Code quality checks are still failing:
Friendly ping @MFajcik
How about now? @BenjaminBossan
Thanks, I left a single comment with respect to the failing tests.
        return conv_output

class TestAutoCast(unittest.TestCase):
This test seems to require a GPU, can you add the require_torch_gpu decorator here? (See def require_torch_gpu(test_case): at line 25 in 3708793.)
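Applied to the test class in question, that would look something like this sketch (the relative import path for the helper is an assumption about the test suite layout):

    import unittest

    from .testing_utils import require_torch_gpu  # assumed location of the helper


    @require_torch_gpu  # skips the whole test class when no CUDA device is available
    class TestAutoCast(unittest.TestCase):
        ...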
Done.
Thanks a lot @MFajcik for making LoRA work correctly with AMP. This looks pretty good already. I have a couple of change requests and suggestions for improvement, please check them out.
In terms of the general approach, now that we use
result = self.base_layer(x, *args, **kwargs)
I think it's much safer to make this change than at the time the issue was originally discussed. This way, we should have a guarantee that LoRA does not change the dtype of the output.
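For readers following along, a simplified paraphrase of that forward pattern (not the exact code merged here) looks like this:

    def forward(self, x, *args, **kwargs):
        # Let the (possibly autocast-wrapped) base layer determine the output dtype
        # instead of relying on a previously recorded weight dtype.
        result = self.base_layer(x, *args, **kwargs)
        torch_result_dtype = result.dtype

        for active_adapter in self.active_adapters:
            lora_A = self.lora_A[active_adapter]
            lora_B = self.lora_B[active_adapter]
            dropout = self.lora_dropout[active_adapter]
            scaling = self.scaling[active_adapter]

            # Compute the LoRA update in the adapter's dtype, then add it to the
            # base layer's output.
            x = x.to(lora_A.weight.dtype)
            result = result + lora_B(lora_A(dropout(x))) * scaling

        # Guarantee that LoRA never changes the dtype of the base layer's output.
        return result.to(torch_result_dtype)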
src/peft/tuners/lora/layer.py (outdated)
@@ -251,7 +251,8 @@ def __init__(
     r: int = 0,
     lora_alpha: int = 1,
     lora_dropout: float = 0.0,
-    fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
+    fan_in_fan_out: bool = False,
+    # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
Can we undo this change, or at least move the comment above 254, where it belongs?
@@ -0,0 +1,159 @@
import unittest
Could you please add the comment header that we have in all our files? Also, please move this test to test_gpu_examples.py, otherwise it won't be run on CI (since our normal CI has no GPUs and only selected test files are run with a GPU in our nightly tests).
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
Nit:

-        super(SimpleModel, self).__init__()
+        super().__init__()

Same for other classes.
        self.embedding_layer = nn.Embedding(1000, 768)
        self.layer_norm = nn.LayerNorm(768)
        self.linear_transform_base = nn.Linear(768, 256)
        self.linear_transform = LoraLinear(
Instead of wrapping the layer explicitly, let's create a LoraConfig and call get_peft_model with SimpleModel as input. test_simple_lora_linear_model could be responsible for initializing the class and passing the instance to _test_model. This way, we can be 100% sure that this generates a model the same way that users typically do.
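A rough sketch of that suggestion, assuming SimpleModel keeps a plain nn.Linear attribute named linear_transform and using a hypothetical helper name:

    from peft import LoraConfig, get_peft_model

    def build_simple_lora_model():
        # Target the plain linear layer by name and let peft wrap it, exactly the
        # way a user would, instead of constructing lora.Linear manually.
        config = LoraConfig(
            r=8,
            lora_alpha=16,
            lora_dropout=0.1,
            target_modules=["linear_transform"],
        )
        return get_peft_model(SimpleModel(), config)

test_simple_lora_linear_model could then build the instance this way and hand it to _test_model.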
        super(SimpleLorAEmbeddingModel, self).__init__()

        self.embedding_layer_base = nn.Embedding(1000, 768)
        self.embedding_layer = LoraEmbedding(
Same argument as above.
        self.embedding_layer = nn.Embedding(1000, 768)
        self.layer_norm = nn.LayerNorm(768)
        self.conv2d_transform_base = nn.Conv2d(1, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        self.conv2d_transform = LoraConv2d(
Same argument as above.
@require_torch_gpu
class TestAutoCast(unittest.TestCase):
    def test_simple_model(self):
WDYT about parametrizing the test (using parameterize, see other tests) over the dtype? That way, we can run a single test per test case, which is usually preferable.
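As an illustration, a parametrized version could be sketched like this (it assumes the parameterized package used by other tests; the placeholder model and test name are not from this PR):

    import unittest

    import torch
    import torch.nn as nn
    from parameterized import parameterized
    from peft import LoraConfig, get_peft_model


    class TestAutoCastSketch(unittest.TestCase):
        def _get_model(self):
            # Minimal placeholder model: an embedding followed by a LoRA-wrapped linear.
            model = nn.Sequential(nn.Embedding(1000, 768), nn.Linear(768, 256))
            return get_peft_model(model, LoraConfig(target_modules=["1"]))

        @parameterized.expand([(torch.float16,), (torch.bfloat16,)])
        def test_output_dtype_matches_autocast(self, dtype):
            if dtype == torch.bfloat16 and not torch.cuda.is_bf16_supported():
                self.skipTest("bfloat16 is not supported on this GPU")
            model = self._get_model().cuda()
            input_ids = torch.randint(0, 1000, (2, 10)).cuda()
            with torch.autocast(device_type="cuda", dtype=dtype):
                output = model(input_ids)
            self.assertEqual(output.dtype, dtype)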
        # Prepare dummy inputs
        input_ids = torch.randint(0, 1000, (2, 10)).cuda()

        # Forward pass with torch.bfloat16
For the bf16 case, can we please run self.skipTest if not torch.cuda.is_bf16_supported()?
@MFajcik Do you still plan to work on this? :)
Hopefully sometime this month :)
@BenjaminBossan I think I addressed all your comments now.
Thanks a lot @MFajcik for providing the fix and adding tests for it. From my point of view, this looks good, but I'd like to have another review by @younesbelkada or @pacman100 to be 100% sure.
I saw that other methods probably need to be changed in the same way as done in this PR (LoHa, LoKr, poly, oft, IA³, lora.tp_layer.LoraParallelLinear), but that can be done in a subsequent PR.
Thank you @MFajcik for the fixes and @BenjaminBossan for all the discussions and suggestions!
@MFajcik I think we should be good to merge this PR once the merge conflict is resolved. It should be pretty straightforward; let me know if there are any questions. Please note that we now use normal …
# Conflicts:
#   tests/test_gpu_examples.py
Could you please run …
Fixes the issue from #971