The test passes with UCC_TLS=ucp, i.e., without TL_CUDA. The test creates multiple communicators in a tight loop:
```c
for (i = 0; i < nLoop; i++) {
    randval = rand();
    if (randval % (rank + 2) == 0) {
        MPI_Comm_split(MPI_COMM_WORLD, 1, rank, &newcomm);
        MPI_Comm_free(&newcomm);
    } else {
        MPI_Comm_split(MPI_COMM_WORLD, MPI_UNDEFINED, rank, &newcomm);
        if (newcomm != MPI_COMM_NULL) {
            errs++;
            printf("Created a non-null communicator with MPI_UNDEFINED\n");
        }
    }
}
```
Each time an MPI communicator is created, a corresponding UCC team is created. If the team size is <= 8 and the team lies entirely within a single node, a TL/CUDA team is created as well. The corresponding tl_cuda_team_destroy is correctly called for each MPI_Comm_free, but it looks like some CUDA memory is not released.
If I change ENABLE_RCACHE from 1 to 0 in tl_cuda_cache.c, the problem goes away.
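For reference, the workaround is the following one-line change (assuming the macro is a compile-time define at the top of tl_cuda_cache.c, as the report implies):

```c
/* tl_cuda_cache.c -- workaround: disable the remote-handle cache */
#define ENABLE_RCACHE 0   /* was: #define ENABLE_RCACHE 1 */
```

With the rcache disabled the leak disappears, which suggests the cached CUDA IPC mappings are not being evicted when the TL/CUDA team is destroyed.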
Repro 2 nodes (vulcan):
Output:
@Sergei-Lebedev could you please have a look?