
Paged optimizer vs gradient checkpointing? #293

@LeoPerelli


Hello, I am somewhat confused about what the paged optimizer achieves compared with gradient checkpointing. Specifically, I expected either one on its own to be sufficient to avoid OOM errors. If the paged optimizer is paging gradients, we shouldn't need gradient checkpointing anymore, should we? However, that does not appear to be the case, and the authors suggest using both together. Thanks!
