port improvements from vchuravy/NCCL.jl #5
Conversation
```julia
end
uid = MPI.bcast(uid, 0, comm)::NCCL.UniqueID

dev = CuDevice(parse(Int, first(split(ENV["CUDA_VISIBLE_DEVICES"], ","))))
```
The "right" way of doing this is:
lcomm = MPI.Comm_split_type(mpicomm, MPI.MPI_COMM_TYPE_SHARED,
MPI.Comm_rank(mpicomm))
CUDAnative.device!(MPI.Comm_rank(lcomm))
For NCCL to work best all devices need to be visible to all MPI processes and there should only be a maxium of n local ranks per n devices
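For context, a minimal end-to-end sketch of this initialization, tying the node-local device assignment to the UniqueID broadcast quoted above. It assumes current MPI.jl/CUDA.jl names (`MPI.COMM_TYPE_SHARED`, `CUDA.device!`) rather than the CUDAnative-era API in this PR, and assumes an NCCL.jl `Communicator` constructor mirroring `ncclCommInitRank`; check the signatures against the package versions you actually use.

```julia
using MPI, CUDA, NCCL

MPI.Init()
comm = MPI.COMM_WORLD

# Ranks sharing a node get consecutive ranks in lcomm; using the local
# rank to pick a GPU maps at most n local ranks onto n visible devices.
lcomm = MPI.Comm_split_type(comm, MPI.COMM_TYPE_SHARED, MPI.Comm_rank(comm))
CUDA.device!(MPI.Comm_rank(lcomm))

# Rank 0 creates the NCCL unique id; every rank receives it and joins
# the same NCCL communicator (argument order assumed to mirror
# ncclCommInitRank: nranks, unique id, rank).
uid = MPI.Comm_rank(comm) == 0 ? NCCL.UniqueID() : nothing
uid = MPI.bcast(uid, 0, comm)::NCCL.UniqueID
nccl_comm = NCCL.Communicator(MPI.Comm_size(comm), uid, MPI.Comm_rank(comm))
```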
```diff
@@ -2,6 +2,12 @@
export Allreduce!, Broadcast!, Reduce!, Allgather!, ReduceScatter!

function allReduce!(::Op, sendbuf, recvbuf, comm::Communicator; stream=CUDAdrv.CuDefaultStream()) where Op
```
Following the MPI.jl convention of using lowercase names for Julia-specific implementations.
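A hypothetical sketch of how the two names might relate under that convention (not the actual NCCL.jl source): the exported, capitalized `Allreduce!` is a thin entry point that delegates to the lowercase `allReduce!` implementation from the diff above.

```julia
# Hypothetical sketch of the naming split discussed above.
# Lowercase: Julia-specific implementation, dispatching on the op's type.
function allReduce!(op::Op, sendbuf, recvbuf, comm::Communicator;
                    stream=CUDAdrv.CuDefaultStream()) where Op
    # ... wrap ncclAllReduce over sendbuf/recvbuf with op on stream ...
end

# Exported, MPI.jl-style capitalized entry point.
Allreduce!(sendbuf, recvbuf, op, comm::Communicator; stream=CUDAdrv.CuDefaultStream()) =
    allReduce!(op, sendbuf, recvbuf, comm; stream=stream)
```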
Interesting
Force-pushed from 859e000 to 68aebf8.
Most of the functionality here has been included already, so I think we can close this.