Distributed training problem #2
Open
Labels
question (Further information is requested)
Description
You use NCCL for distributed training. My question is: do you use the NCCL that ships with PyTorch, or do you install NCCL separately? And how do you set your environment variables? I am quite confused about it. Thanks very much! I run into the following problems when I use two machines to run the code:
1. INFO NET/Plugin : No plugin found (libnccl-net.so)
2. NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:400, unhandled cuda error.
3. NCCL INFO NET/IB : No device found
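
For context on the environment-variable part of the question, here is a minimal sketch of how NCCL settings are often applied from Python before the process group is initialized. It assumes a two-machine setup without InfiniBand, so NCCL communicates over TCP sockets; the interface name `eth0` and the helper names `nccl_socket_env` and `apply_env` are hypothetical and not taken from this repository. `NCCL_DEBUG`, `NCCL_IB_DISABLE`, and `NCCL_SOCKET_IFNAME` are standard NCCL environment variables. The official PyTorch CUDA binaries bundle their own NCCL, and `torch.cuda.nccl.version()` reports which version that is.

```python
import os


def nccl_socket_env(interface, debug="INFO"):
    """Build NCCL settings for a socket-only (no InfiniBand) setup.

    `interface` is the network interface that connects the machines;
    the right value depends on the host and is an assumption here.
    """
    return {
        "NCCL_DEBUG": debug,              # print NCCL INFO/WARN log lines
        "NCCL_IB_DISABLE": "1",           # skip the InfiniBand transport
        "NCCL_SOCKET_IFNAME": interface,  # interface used for socket traffic
    }


def apply_env(env):
    """Copy the settings into this process's environment."""
    os.environ.update(env)


if __name__ == "__main__":
    # Must run before torch.distributed.init_process_group is called.
    apply_env(nccl_socket_env("eth0"))  # "eth0" is a placeholder
    try:
        import torch
        # Version of the NCCL library bundled with this PyTorch build.
        print("bundled NCCL:", torch.cuda.nccl.version())
    except Exception as exc:  # torch missing or built without CUDA/NCCL
        print("could not query NCCL version:", exc)
```

The same variables can be exported in the shell on each machine instead. Setting them in the process only affects that process, so every rank on every machine needs them.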