Use of FP16 in backward with create_graph = True? #22
Hi,

I have a quick question. For your transformer, or any other application, have you used FP16 when computing gradients via a backward call? In the model I'm working with, for every loss-scale factor I've tried, backward gives reasonable gradients when create_graph is left as False. But when I set create_graph=True, some gradients match the create_graph=False case while many others come out as NaN. Everything is fine in FP32, but I'd like the GPU memory and speed advantages of FP16.

Any suggestions would be appreciated!
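For reference, a minimal sketch of the pattern being described: scale the loss, call backward with create_graph=True so the gradients are themselves differentiable, then unscale. The model, data, and scale factor here are illustrative, not from the repo, and it is shown in FP32 for portability; in the setting above the model and inputs would be `.half()` on GPU, which is where the NaNs appear.

```python
import torch

# Illustrative sketch (hypothetical model and scale factor, not the
# repo's code). Shown in FP32 so it runs anywhere; the issue concerns
# the same pattern with .half() tensors on GPU.
torch.manual_seed(0)
model = torch.nn.Linear(16, 1)
x = torch.randn(8, 16)
loss = model(x).pow(2).mean()

scale = 1024.0  # illustrative loss-scale factor
# create_graph=True keeps the backward graph so second-order
# quantities can be computed from the gradients.
grads = torch.autograd.grad(scale * loss, model.parameters(),
                            create_graph=True)
grads = [g / scale for g in grads]  # unscale back to true gradients

# Because create_graph=True, the grads are differentiable, e.g. for a
# gradient-penalty term that needs a second backward pass:
penalty = sum(g.pow(2).sum() for g in grads)
penalty.backward()
```

In FP32 both the unscaled gradients and the second backward pass stay finite; the question is why the same double-backward path produces NaNs under FP16.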