Description
Search before asking
- I have searched the YOLOv5 issues and found no similar bug report.
YOLOv5 Component
Training
Bug
Hi, I'm new to YOLO and I am getting this error message when training YOLOv5x6:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1936.00 GiB (GPU 1; 11.17 GiB total capacity; 2.15 GiB already allocated; 7.62 GiB free; 3.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
For context, I am trying to train YOLOv5x6 on the PubLayNet dataset (article here, github here) to compare the results with the DocLayNet dataset, which has already been tested with YOLOv5x6 (article here, github here).
I am doing this with a base image size of 640 and a batch size of 8, running in Distributed Data Parallel (DDP) mode on 2 K80 GPUs, roughly as shown below.
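For reference, the launch command looks roughly like this (the dataset yaml name publaynet.yaml and the master port are placeholders for my local setup):

# DDP launch across 2 GPUs with YOLOv5's train.py
python -m torch.distributed.run --nproc_per_node 2 --master_port 1234 train.py \
    --weights yolov5x6.pt \
    --data publaynet.yaml \
    --img 640 \
    --batch 8 \
    --device 0,1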
During training, GPU memory usage looks normal.
But about 80% of the way through the first epoch I get the above error message. Any clues as to why the model would try to allocate such an enormous amount of memory, and how to fix it?
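If it matters, the only workaround I can see is the one hinted at in the error message itself, i.e. setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF before launching. A sketch of what that would look like (128 is an arbitrary starting value; I have not confirmed whether this actually helps):

# Ask the PyTorch CUDA caching allocator to split large blocks, then relaunch training
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python -m torch.distributed.run --nproc_per_node 2 train.py --weights yolov5x6.pt --img 640 --batch 8 --device 0,1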
Environment
- YOLO: YOLOv5x6
- OS: Ubuntu 20.04
- Python: 3.9.12
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?
- Yes I'd like to help by submitting a PR!

