-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CL/HIER allgatherv #1016
CL/HIER allgatherv #1016
Conversation
e0ac473
to
338ce5d
Compare
Mpi test results: $ mpirun \
-x UCC_CL_HIER_ALLGATHERV_TUNE=allgatherv:@node_split:inf \
-x UCC_CLS=basic,hier \
-x UCC_CL_HIER_NODE_LEADERS_SBGP_TLS=ucp \
-x UCC_CL_HIER_NODE_SBGP_TLS=ucp \
-x LD_PRELOAD=./lib/libucc.so \
-x PATH \
-x LD_LIBRARY_PATH \
-np 6 --map-by ppr:3:node \
./bin/ucc_test_mpi -c allgatherv -t world -d uint32 -v -m 8 -v
===== UCC MPI TEST INFO =======
seed: 6222
collectives: Allgatherv
data types: uint32
memory types: Host
teams: world
tc=Allgatherv team=world mtype=host msgsize=8 persistent=0 inplace=0 dt=uint32
===== UCC MPI TEST REPORT =====
collective tests passed failed skipped
Allgatherv 1 1 0 0
===== UCC MPI TEST SUMMARY =====
total tests: 1
passed: 1
skipped: 0
failed: 0
elapsed: 9s
|
@@ -2,6 +2,10 @@ | |||
# Copyright (c) 2020-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update copyright year
@@ -0,0 +1,381 @@ | |||
/** | |||
* Copyright (c) 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |||
* Copyright (c) Meta Platforms, Inc. and affiliates. 2022. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rem
|
||
UCC_CL_HIER_PROFILE_REQUEST_EVENT(task, "cl_hier_allgatherv_finalize", 0); | ||
|
||
ucc_assert(cl_schedule->super.super.n_tasks <= 3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<= MAX_ALLGATHERV_TASKS
} | ||
|
||
// Question: Do i need this function ? | ||
ucc_status_t ucc_cl_hier_allgatherv_triggered_post_setup(ucc_coll_task_t *task) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed
ucc_coll_task_t *tasks[MAX_ALLGATHERV_TASKS] = {NULL}; | ||
int n_tasks = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initialized vars go first
frag_p, n_frags); | ||
} | ||
|
||
static ucc_status_t ucc_cl_hier_allgatherv_node_split_frag_setup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed, no pipelining
|
||
if (c64 ^ d64) { | ||
cl_debug(team->context->lib, | ||
"mixed 64 bit count/displ mode is not supported\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rem \n
if (n_frags > 1) { | ||
args.max_frag_count = | ||
ucc_buffer_block_count(args.args.src.info.count, n_frags, 0); | ||
args.mask |= UCC_BASE_CARGS_MAX_FRAG_COUNT; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed, no pipelining
args.mask |= UCC_BASE_CARGS_MAX_FRAG_COUNT; | ||
} | ||
|
||
full_size = cl_team->sbgps[UCC_HIER_SBGP_FULL].sbgp->group_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think UCC_CL_TEAM_SIZE is better, no need to create FULL sbgp
|
||
// Gatherv in the node | ||
// src.info.buffer -> dst.info_v.buffer | ||
if (node_size > 1){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing space before {
((uint32_t *)gv_rc)[_i] = _scount; | ||
((uint32_t *)gv_displ)[_i] = _displ; | ||
|
||
_displ += _scount; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This first gatherv call is problematic if the displacements are not continuous (i.e if displ[i] != sum(counts[:i])
), for example say we have those three src buffers:
rank1: [1,2]
rank2: [3]
rank3: [4,5]
And displacements=[2,5,6]
Say that for every rank dst buffer points to this array: [10, 10, 0, 0, 10, 0, 0, 0]
Thus, in the end it should be: [10, 10, 1, 2, 10, 3, 4, 5]
But here because we are writing directly to the dst buffer, and because we are "stacking" each of the three buffers to the beginning of the buf, we are going to erase data of dst buf.
We can either decide not to support it or allocate a separate buffer specifically for this gatherv call.
@janjust do you know if/how this is handled in the hcoll alg ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add gtest
} | ||
} while (0); | ||
|
||
args.args.coll_type = UCC_COLL_TYPE_GATHERV; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case of inplace allgatherv args.src.info is ignored, but for gatherv args.src.info is ignored only at root. Need to set args.src.info for non root ranks
status); | ||
|
||
// The result of this collective is in dst.info_v.buffer, this should be the source buffer of the next collective | ||
args.args.src.info.buffer = args.args.dst.info_v.buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to use inplace allgatherv on second step if node size > 1
} | ||
} while (0); | ||
|
||
args.args.coll_type = UCC_COLL_TYPE_GATHERV; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check node sbgp enabled
n_tasks++; | ||
} | ||
|
||
// Allgatherv in the full net |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check node leaders group is enabled
|
||
int _hid, _count, _displ; | ||
/* For every rank in the communicator, | ||
* - find his host id (which is also the index of the leader -- Sergey ?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general case host_id != node_leader_rank
|
||
// BCAST in the node | ||
// src.info.buffer -> src.info.buffer | ||
if (node_size > 1){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is sbgp_enabled
status); | ||
|
||
// The result of this collective is in dst.info_v.buffer, this should be the source buffer of the next collective | ||
args.args.src.info.buffer = args.args.dst.info_v.buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in case of allgatherv need to set src.info.datatype and src.info.mem_type
} | ||
|
||
// This collective writes to src.info.buffer, this should be the output buffer (dst.info_v.buffer) | ||
args.args.dst.info_v.buffer = args.args.src.info.buffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not needed, src.info.buffer == dst.info_v.buffer
|
||
// This collective writes to src.info.buffer, this should be the output buffer (dst.info_v.buffer) | ||
args.args.dst.info_v.buffer = args.args.src.info.buffer; | ||
n_tasks++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used
@x41lakazam is this still draft? |
Moved to #1050 |
allgatherv algorithm split into three parts:
Still under development. Currently supports only int32, I will add support for int64 once I'm sure of the logic.
Situation where the ranks are not in a continuous order across nodes is not handled yet.