Use of FP16 in backward with create_graph = True? #22

@sjscotti


Hi,
A quick question: for your transformer (or any other application), have you computed gradients from a backward call in FP16? In the model I am working with, backward gives reasonable gradients for every loss-scale factor I have tried, as long as I leave create_graph=False. With create_graph=True, some gradients match the create_graph=False values, but many come back as NaNs. Everything works in FP32, but I would like FP16's advantages in GPU memory and speed.
Any suggestions you can provide would be appreciated!
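To make the symptom concrete, here is a minimal diagnostic sketch of the pattern described above: scale the loss, take first-order gradients with create_graph=True so a second backward is possible, unscale, and check for non-finite values. The tensors, shapes, and scale factor are all assumptions for illustration, not taken from the actual model.

```python
import torch

# Assumed tiny inputs and loss-scale factor, chosen only for illustration
x = torch.tensor([0.5, -0.25, 1.0, 2.0], dtype=torch.float16)
w = torch.tensor([1.0, 0.5, -1.0, 0.25], dtype=torch.float16,
                 requires_grad=True)
loss_scale = 2.0 ** 10

y = w * x
loss = (y * y).sum()  # grad depends on w, so the retained graph is non-trivial

# First-order grads, keeping the graph for a later second backward
(grad_w,) = torch.autograd.grad(loss * loss_scale, w, create_graph=True)
grad_w = grad_w / loss_scale  # unscale before any further use

# If NaNs/infs appear only with create_graph=True, the overflow is in the
# retained higher-order graph rather than the plain first-order pass
finite_first = torch.isfinite(grad_w).all().item()
print(finite_first)

# Example second-order use: a gradient-norm penalty via double backward
penalty = (grad_w * grad_w).sum()
(grad2_w,) = torch.autograd.grad(penalty, w)
finite_second = torch.isfinite(grad2_w).all().item()
print(finite_second)
```

Comparing grad_w against the same computation with create_graph=False (and against an FP32 run) at several loss scales can localize which gradients go NaN only when the graph is retained.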
