[OTX] Enable multi-GPU training #1392
Conversation
jaegukhyun
left a comment
Generally it looks good to me. Frankly speaking, I'm not an expert in multiprocessing, so let's check the test results. I left some comments; most of them are questions. I also have two questions about the overall PR.
- Does using multiprocessing give more benefit than using nn.parallel.DistributedDataParallel? I have read this article.
- Don't we need a stress test for multiprocessing? Multiprocessing always looks fine when the workload is small (short training schedules, few jobs), but when the volume of work grows, it can fail out of the blue.
Thanks for the comments! I think I can answer your questions. Answer 1: You're right, so I implemented it using nn.parallel.DistributedDataParallel.
As for question 2, I'm not sure, but I think we should check whether multi-GPU training can run multiple training jobs.
I understand. I'll check the situation in my local environment and update the result.
@jaegukhyun I checked that four multi-GPU training jobs ran well in parallel.
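The approach settled on above (nn.parallel.DistributedDataParallel instead of plain multiprocessing) can be sketched as follows. This is a minimal illustration, not the PR's actual code: the model, data, and spawn setup here are hypothetical placeholders, and the CPU "gloo" fallback is included only so the sketch runs without GPUs.

```python
# Minimal single-node DistributedDataParallel sketch (hypothetical example,
# not the PR's implementation). One process is spawned per GPU; DDP
# all-reduces gradients across ranks during backward().
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def train_worker(rank: int, world_size: int) -> None:
    # Each spawned process joins the same process group.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    backend = "nccl" if torch.cuda.is_available() else "gloo"
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(10, 2).to(device)  # placeholder model
    ddp_model = DDP(model, device_ids=[rank] if device.type == "cuda" else None)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(8, 10, device=device)          # placeholder batch
    targets = torch.randint(0, 2, (8,), device=device)  # placeholder labels

    loss = torch.nn.functional.cross_entropy(ddp_model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()  # gradients are synchronized across ranks here
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    # One process per GPU, or two CPU processes as a fallback.
    world_size = torch.cuda.device_count() or 2
    mp.spawn(train_worker, args=(world_size,), nprocs=world_size)
```

In a real training pipeline each rank would also wrap its dataloader with a DistributedSampler so that every process sees a distinct shard of the dataset.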
harimkang
left a comment
I left a few comments.
sungmanc
left a comment
Please replace the print statements with the logger. I also left some comments.
8b39ba2 to
5587d32
Compare
There are two failing cases in the Pre-Merge Check, as below. These are due to the gap between the exported model's performance and the trained model's performance.
JihwanEom
left a comment
Can we use also 3 GPUs or 4 GPUs?
Yes.
Summary
Enable multi-GPU training for the classification, detection, and segmentation tasks.
This PR includes