
CUDA tried allocating an enormous amount of memory (1936 GiB) #10528

@BilboBaguette

Description


Search before asking

  • I have searched the YOLOv5 issues and found no similar bug report.

YOLOv5 Component

Training

Bug

Hi, I'm new to YOLO and I am getting this error message when training YOLOv5x6:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1936.00 GiB (GPU 1; 11.17 GiB total capacity; 2.15 GiB already allocated; 7.62 GiB free; 3.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
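
The message points at the max_split_size_mb option of PyTorch's CUDA caching allocator. For reference, this is roughly how that option can be set; the 128 MiB value below is just an illustrative number, not something I have tuned, and the variable has to be set before the first CUDA allocation:

```python
# Sketch only: set the allocator option before torch touches the GPU
# (alternatively, export PYTORCH_CUDA_ALLOC_CONF in the shell before launching).
import os
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")  # illustrative value

import torch

# Confirm CUDA is up and show the device the allocator will manage.
print(torch.cuda.get_device_properties(0))
print(f"{torch.cuda.memory_reserved(0) / 2**30:.2f} GiB currently reserved by PyTorch")
```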

For context, I am trying to train YOLOv5x6 on the PubLayNet dataset (article here, GitHub here) to compare the results with the DocLayNet dataset, which has already been tested on YOLOv5x6 (article here, GitHub here).

I am doing this with a base image size of 640, a batch size of 8, and distributed data parallel (DDP) mode on 2 K80 GPUs.
During training, memory usage is normal as can be seen below:

[Two screenshots showing GPU memory usage on both K80s during training]
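
For reference, a quick check like the snippet below (just an illustration using the standard torch.cuda counters, not part of the YOLOv5 training code) is one way to read the per-GPU allocated/reserved figures from inside the process:

```python
# Illustrative per-GPU memory check; uses only standard torch.cuda counters.
import torch

GIB = 2 ** 30
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / GIB
    reserved = torch.cuda.memory_reserved(i) / GIB
    total = torch.cuda.get_device_properties(i).total_memory / GIB
    print(f"GPU {i}: {allocated:.2f} GiB allocated, "
          f"{reserved:.2f} GiB reserved, {total:.2f} GiB total")
```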

But about 80% of the way through the first epoch, I get the above error message. Any clues as to why the model would try to allocate such an enormous amount of memory, and how to fix it?

Environment

  • YOLO: YOLOv5x6
  • OS: Ubuntu 20.04
  • Python: 3.9.12

Minimal Reproducible Example

No response

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
