Skip to content

Commit

Permalink
DOCS: Added descr. of ucc.conf and UCC_COLL_TRACE
Browse files Browse the repository at this point in the history
  • Loading branch information
jirikraus committed Mar 13, 2023
1 parent d9b1937 commit 5a45c08
Showing 1 changed file with 32 additions and 0 deletions.
32 changes: 32 additions & 0 deletions docs/user_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -193,6 +193,8 @@ in the [UCC Wiki](https://github.com/openucx/ucc/wiki) for more information and
If for a given combination, multiple TLs have the same highest score, it is implementation-defined
which of those TLs with the highest score is selected.

Tuning UCC heuristics is also possible with the UCC configuration file (`ucc.conf`). This file provides a unified way of tailoring the behavior of UCC components - CLs, TLs, and ECs. It can contain any UCC variables of the format `VAR = VALUE`, e.g. `UCC_TL_NCCL_TUNE=allreduce:cuda:inf#alltoall:0` to force NCCL allreduce for "cuda" buffers and disable NCCL for alltoall. See [`contrib/ucc.conf`](../contrib/ucc.conf) for an example and the [FAQ](https://github.com/openucx/ucc/wiki/FAQ#13-ucc-configuration-file-and-priority) for further details.

## Logging

To detect if Open MPI leverages UCC for a given collective one can set `OMPI_MCA_coll_ucc_verbose=3` checking for output like
Expand Down Expand Up @@ -306,6 +308,36 @@ ucc_coll_score_map.c:185 UCC INFO RocmManaged: {0..4095}:TL_UCP:10 {4K..
[...] snip
```

The next release of UCC add the `UCC_COLL_TRACE` environment variable:

```
$ ucc_info -caf | grep -B6 UCC_COLL_TRACE
#
# UCC collective logging level. Higher level will result in more verbose collective info.
# Possible values are: fatal, error, warn, info, debug, trace, data, func, poll.
#
# syntax: [FATAL|ERROR|WARN|DIAG|INFO|DEBUG|TRACE|REQ|DATA|ASYNC|FUNC|POLL]
#
UCC_COLL_TRACE=WARN
```

With `UCC_COLL_TRACE=INFO` UCC reports for every collective which CL and TL has been selected:

```
$ UCC_COLL_TRACE=INFO srun ./c/mpi/collective/osu_allreduce -i 1 -x 0 -d cuda -m 1048576:1048576
# OSU MPI-CUDA Allreduce Latency Test v7.0
# Size Avg Latency(us)
[1678205653.808236] [node_name:903 :0] ucc_coll.c:255 UCC_COLL INFO coll_init: Barrier; CL_BASIC {TL_UCP}, team_id 32768
[1678205653.809882] [node_name:903 :0] ucc_coll.c:255 UCC_COLL INFO coll_init: Allreduce sum: src={0x7fc1f3a03800, 262144, float32, Cuda}, dst={0x7fc195800000, 262144, float32, Cuda}; CL_BASIC {TL_NCCL}, team_id 32768
[1678205653.810344] [node_name:903 :0] ucc_coll.c:255 UCC_COLL INFO coll_init: Barrier; CL_BASIC {TL_UCP}, team_id 32768
[1678205653.810582] [node_name:903 :0] ucc_coll.c:255 UCC_COLL INFO coll_init: Reduce min root 0: src={0x7ffef34d5898, 1, float64, Host}, dst={0x7ffef34d58b0, 1, float64, Host}; CL_BASIC {TL_UCP}, team_id 32768
[1678205653.810641] [node_name:903 :0] ucc_coll.c:255 UCC_COLL INFO coll_init: Reduce max root 0: src={0x7ffef34d5898, 1, float64, Host}, dst={0x7ffef34d58a8, 1, float64, Host}; CL_BASIC {TL_UCP}, team_id 32768
[1678205653.810651] [node_name:903 :0] ucc_coll.c:255 UCC_COLL INFO coll_init: Reduce sum root 0: src={0x7ffef34d5898, 1, float64, Host}, dst={0x7ffef34d58a0, 1, float64, Host}; CL_BASIC {TL_UCP}, team_id 32768
1048576 582.07
[1678205653.810705] [node_name:903 :0] ucc_coll.c:255 UCC_COLL INFO coll_init: Barrier; CL_BASIC {TL_UCP}, team_id 32768
```

## Known Issues

- For the CUDA and NCCL TL CUDA device dependent data structures are created when UCC
Expand Down

0 comments on commit 5a45c08

Please sign in to comment.