These are a few small changes I made when I started running large GPU simulations. I will try to merge the rest back in reasonably sized chunks, although some of the future PRs will be more unpleasant than this one.
The changes to the NCCL communicator were needed so that only data that resides on the GPU is communicated using NCCL; MPI is used otherwise. Keep in mind that we still haven't figured out how to test in parallel on GPUs in the pipeline.
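
For reviewers, the dispatch rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: `Buffer`, `Residency`, and `choose_backend` are hypothetical names, and the fallback to MPI when NCCL is unavailable is an assumption added for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Residency(Enum):
    """Where a buffer's memory lives (hypothetical, for illustration only)."""
    HOST = "host"
    DEVICE = "device"


@dataclass
class Buffer:
    """Hypothetical buffer descriptor; not a type from this code base."""
    name: str
    residency: Residency


def choose_backend(buf: Buffer, nccl_available: bool = True) -> str:
    """Pick NCCL only for GPU-resident data; use MPI for everything else.

    The nccl_available flag is an assumption of this sketch: if NCCL
    cannot be used, device data also falls back to MPI.
    """
    if buf.residency is Residency.DEVICE and nccl_available:
        return "nccl"
    return "mpi"


if __name__ == "__main__":
    for buf in (Buffer("halo", Residency.DEVICE), Buffer("flags", Residency.HOST)):
        print(buf.name, "->", choose_backend(buf))
```

A real implementation would have to query the actual memory location of each buffer instead of carrying a tag, but the branching would look much the same.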