TL/CUDA: fix linear algorithms #751
cudaGraphGetNodes(ee_task->graph, nodes, &num_nodes);
for (i = 0; i < task_args->copy_multi.num_vectors; i++) {
    status = CUDA_FUNC(
        cudaGraphExecMemcpyNodeSetParams1D(ee_task->graph_exec, nodes[i],
nit: zero-length operations are not supported, so maybe skip the node when counts[i] == 0
Can't skip it, since all nodes of the graph must be valid.
@@ -19,8 +20,18 @@ ucc_pt_op_memcpy::ucc_pt_op_memcpy(ucc_datatype_t dt, ucc_memory_type mt,
    has_range_ = true;
    has_bw_ = true;

    if (nbufs == UCC_PT_DEFAULT_N_BUFS) {
        nbufs = 1;
Maybe define "1" as a macro?
Same for the constants used in reduce and reduce_strided.
It's used in one place, and the default differs between tests. Can you please elaborate on what you want to change?
What I suggest is something along these lines, in tools/perf/ucc_pt_config.h:
#define UCC_PT_DEFAULT_N_BUFS 0
#define UCC_PT_DEFAULT_N_BUFS_MEMCPY 1
#define UCC_PT_DEFAULT_N_BUFS_REDUCE 1
#define UCC_PT_DEFAULT_N_BUFS_REDUCE_STRIDED 2
and then use those macros in the different files instead of the hardcoded numbers.
If you think this is irrelevant, please discard this suggestion.
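For illustration, a minimal sketch of how the suggested macros could replace the hardcoded defaults. The macro names and values are taken from the suggestion above; `resolve_nbufs` is a hypothetical helper that does not exist in UCC and is only meant to show the pattern.

```cpp
// Sentinel meaning "the user did not request a specific number of buffers".
#define UCC_PT_DEFAULT_N_BUFS                0
// Per-operation defaults, as proposed in the review comment.
#define UCC_PT_DEFAULT_N_BUFS_MEMCPY         1
#define UCC_PT_DEFAULT_N_BUFS_REDUCE         1
#define UCC_PT_DEFAULT_N_BUFS_REDUCE_STRIDED 2

// Hypothetical helper (not part of UCC): returns the per-operation default
// when nbufs carries the sentinel, otherwise the user-supplied value.
static inline int resolve_nbufs(int nbufs, int op_default)
{
    return (nbufs == UCC_PT_DEFAULT_N_BUFS) ? op_default : nbufs;
}
```

With something like this, the memcpy constructor could call `resolve_nbufs(nbufs, UCC_PT_DEFAULT_N_BUFS_MEMCPY)` instead of assigning the literal `1`, and the reduce tests would pass their own default.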
* TL/CUDA: fix linear algorithms
* REVIEW: fix review comments
What
Why?
Fixes crashes on some NVLink topologies and improves the performance of the linear algorithms.