previous_dtype is now inferred from F.linear's result output type. #1010
BenjaminBossan merged 16 commits into huggingface:main from
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks for providing the fix. Could you please run
Done. Also, I tested it, successfully running my experiment with GPT-2 LoRA fine-tuning, with the original model computation running in bf16 as intended.
BenjaminBossan left a comment
Thanks for providing this fix and for testing it. I ran the GPU tests locally and they also passed (but I can't test with bf16). Do you happen to have a quick test at the ready to show that it works with bf16 and that doesn't involve inspecting intermediate results (as in the original issue)?
Regarding the PR, could you please remove the empty line 243? It's not required anymore. Also, should the forward methods of Embedding and Conv2d also be changed accordingly?
@MFajcik Are you still working on this?
@MFajcik Friendly reminder about this PR :) There are some merge conflicts now due to a recent change. Let me know if you need help resolving them.
@BenjaminBossan Yeah, I didn't forget about this. I just don't have time right now.
Fantastic, thanks. I don't want to put any pressure on you, take the time you need. I just wanted to ping you because sometimes people simply forget :)
@BenjaminBossan I merged the changes and reapplied the fix for Conv2d, Linear, and Embedding. I also ran make format and added one test covering the fix for LoRA. Is that okay?
@MFajcik Thanks for making the updates. The quality check still fails; could you please re-run
Thanks for adding the test. Probably we can move it to one of the existing files, but I'll take a closer look later and make a suggestion.
Yeah, I forgot to run make style again after writing the test... Should be okay now.
@MFajcik Code quality checks are still failing:
Friendly ping @MFajcik
Now? @BenjaminBossan
younesbelkada left a comment
Thanks, I left a single comment with respect to the failing tests.
return conv_output

class TestAutoCast(unittest.TestCase):

This test seems to require a GPU; can you add the require_torch_gpu decorator here?
Line 25 in 3708793
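For illustration, here is a minimal sketch of such a decorator. This is an assumed implementation of the general pattern, not PEFT's actual testing helper; the real require_torch_gpu lives in the repository's test utilities.

```python
# Hypothetical sketch of a require_torch_gpu decorator: it skips the decorated
# test (or whole test class) when no CUDA device is available.
import unittest

import torch


def require_torch_gpu(test_case):
    """Skip the decorated test case or class when CUDA is unavailable."""
    return unittest.skipUnless(torch.cuda.is_available(), "test requires a GPU")(test_case)


@require_torch_gpu
class GpuOnlyTest(unittest.TestCase):
    def test_cuda_tensor(self):
        # Only runs on machines with a CUDA device; skipped everywhere else.
        x = torch.ones(2, device="cuda")
        self.assertEqual(x.sum().item(), 2.0)


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(GpuOnlyTest).run(result)
```

Applied at the class level, the decorator marks every test in the class as skipped on CPU-only machines, which keeps the normal (GPU-less) CI green.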
BenjaminBossan left a comment
Thanks a lot @MFajcik for making LoRA work correctly with AMP. This looks pretty good already. I have a couple of change requests and suggestions for improvement, please check them out.
In terms of the general approach, now that we use result = self.base_layer(x, *args, **kwargs), I think it's much safer to make this change than at the time the issue was originally discussed. This way, we should have a guarantee that LoRA does not change the dtype of the output.
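As a minimal illustration of that guarantee, the forward pass can capture the base layer's output dtype and cast the LoRA contribution back to it. Names and structure below are a simplified sketch of the pattern under discussion, not PEFT's actual layer code.

```python
# Illustrative sketch (not PEFT's real implementation): the output dtype is
# inferred from the base layer's result, so under autocast the LoRA delta
# cannot change the dtype of the adapted layer's output.
import torch
import torch.nn as nn


class LoraLinearSketch(nn.Module):
    def __init__(self, base_layer: nn.Linear, r: int = 8, scaling: float = 1.0):
        super().__init__()
        self.base_layer = base_layer
        self.lora_A = nn.Linear(base_layer.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base_layer.out_features, bias=False)
        # Real LoRA zero-initializes B so the adapter starts as a no-op.
        nn.init.zeros_(self.lora_B.weight)
        self.scaling = scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        result = self.base_layer(x)
        # Infer the target dtype from the base layer's output instead of a
        # stored previous_dtype attribute.
        torch_result_dtype = result.dtype
        x = x.to(self.lora_A.weight.dtype)
        result = result + self.lora_B(self.lora_A(x)) * self.scaling
        return result.to(torch_result_dtype)


layer = LoraLinearSketch(nn.Linear(4, 4))
out = layer(torch.randn(2, 4))
```

Under torch.autocast, result comes out of the base layer in the autocast dtype (e.g. bf16), and the final cast keeps the adapted output in exactly that dtype.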
src/peft/tuners/lora/layer.py (outdated)

lora_dropout: float = 0.0,
fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
fan_in_fan_out: bool = False,
# Set this to True if the layer to replace stores weight like (fan_in, fan_out)

Can we undo this change, or at least move the comment above line 254, where it belongs?
@@ -0,0 +1,159 @@
import unittest

Could you please add the comment header that we have in all our files? Also, please move this test to test_gpu_examples.py, otherwise it won't be run on CI (since our normal CI has no GPUs and only selected test files are run with GPU in our nightly tests).
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()

Nit:
- super(SimpleModel, self).__init__()
+ super().__init__()

Same for other classes.
self.embedding_layer = nn.Embedding(1000, 768)
self.layer_norm = nn.LayerNorm(768)
self.linear_transform_base = nn.Linear(768, 256)
self.linear_transform = LoraLinear(

Instead of wrapping the layer explicitly, let's create a LoraConfig and call get_peft_model with SimpleModel as input. test_simple_lora_linear_model could be responsible for initializing the class and passing the instance to _test_model. This way, we can be 100% sure that this generates a model the same way that users typically do.
super(SimpleLorAEmbeddingModel, self).__init__()

self.embedding_layer_base = nn.Embedding(1000, 768)
self.embedding_layer = LoraEmbedding(

self.embedding_layer = nn.Embedding(1000, 768)
self.layer_norm = nn.LayerNorm(768)
self.conv2d_transform_base = nn.Conv2d(1, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
self.conv2d_transform = LoraConv2d(
@require_torch_gpu
class TestAutoCast(unittest.TestCase):
    def test_simple_model(self):

WDYT about parametrizing the test over the dtype (using parameterized, see other tests)? That way, we can run a single test per test case, which is usually preferable.
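To illustrate the suggestion: the repository's tests use the parameterized package to expand one test method per dtype. The self-contained sketch below shows the same idea with unittest's built-in subTest so it runs without extra dependencies; the dtypes are limited to ones that work on CPU.

```python
# Sketch of running one check per dtype. The project itself would use
# @parameterized.expand to generate a separate test method per dtype;
# subTest is used here only to keep the example dependency-free.
import unittest

import torch


class TestDtypePerCase(unittest.TestCase):
    def test_linear_output_dtype(self):
        for dtype in (torch.float32, torch.bfloat16):
            with self.subTest(dtype=dtype):
                linear = torch.nn.Linear(4, 4).to(dtype)
                x = torch.randn(2, 4).to(dtype)
                out = linear(x)
                # The layer's output dtype should match its parameter dtype.
                self.assertEqual(out.dtype, dtype)


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestDtypePerCase).run(result)
```

With parameterized.expand, a failure in one dtype shows up as its own failing test case instead of aborting the loop, which is why one test per case is usually preferable.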
# Prepare dummy inputs
input_ids = torch.randint(0, 1000, (2, 10)).cuda()

# Forward pass with torch.bfloat16

For the bf16 case, can we please run self.skipTest if not torch.cuda.is_bf16_supported()?
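A sketch of that guard is below; the test and layer names are illustrative. The bf16 case skips itself unless a CUDA device with bfloat16 support is available (the extra is_available() check avoids calling is_bf16_supported() on CPU-only machines).

```python
# Sketch: skip the bf16 variant when the environment cannot run it, as the
# reviewer requests. Model and test names here are hypothetical.
import unittest

import torch


class TestBf16Forward(unittest.TestCase):
    def test_forward_bf16(self):
        if not (torch.cuda.is_available() and torch.cuda.is_bf16_supported()):
            self.skipTest("bfloat16 on CUDA is not supported in this environment")
        emb = torch.nn.Embedding(1000, 768).cuda()
        linear = torch.nn.Linear(768, 256).cuda()
        input_ids = torch.randint(0, 1000, (2, 10)).cuda()
        with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
            out = linear(emb(input_ids))
        # Under autocast, the linear layer should produce bf16 output.
        self.assertEqual(out.dtype, torch.bfloat16)


result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestBf16Forward).run(result)
```

A skipped test still counts as a successful run, so older GPUs without bf16 support don't turn the nightly CI red.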
@MFajcik Do you still plan to work on this? :)
Hopefully sometime this month :)
@BenjaminBossan I think I addressed all your comments now.
BenjaminBossan left a comment
Thanks a lot @MFajcik for providing the fix and adding tests for it. From my point of view, this looks good, but I'd like to have another review by @younesbelkada or @pacman100 to be 100% sure.
I saw that other methods probably need to be changed in the same way as done in this PR (LoHa, LoKr, poly, oft, IA³, lora.tp_layer.LoraParallelLinear), but that can be done in a subsequent PR.
pacman100 left a comment
Thank you @MFajcik for the fixes and @BenjaminBossan for all the discussions and suggestions!
@MFajcik I think we should be good to merge this PR once the merge conflict is resolved. It should be pretty straightforward; let me know if there are any questions. Please note that we now use normal
# Conflicts:
#   tests/test_gpu_examples.py
Could you please run
Fixes the issue from #971