Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CORE: improve logging #721

Merged
merged 3 commits into from
Mar 6, 2023

Conversation

Sergei-Lebedev
Copy link
Contributor

What

Improve logging in different components of UCC

Why ?

Currently there is no unified rules between TLs, CLs, CORE about what to print and how to print.

How ?

  1. Make LOG_LEVE=info less verbose, print only essential info about UCC runtime including
    1.1. UCC version
    1.2. UCC runtime path (path to the libucc.so)
    1.3. global config path if used or n/a otherwise
    1.4. team create log with team id and team size
    1.5. score map i.e. what collectives available, what memory types available, what TLs/CLs are used
    1.6. team destroy log
  2. Added UCC_COLL_TRACE (default is no)
    2.1. Leader rank prints collective info on each ucc_collective_init if UCC log level is info
    2.2. All ranks from team print collective info on ucc_collective_init if UCC log level is debug or above
    2.3. For LOG_LEVEL=info collective info includes, src and dst buffer address, count, datatype, memory type if available; inplace flag if set; root for rooted collectives; operation for reduction collectives; chosen TL/CL; team id. For V collectives (allgatherv, alltoallv, scatterv ...) count is sum of corresponding values in counts buffer.
    2.4. For LOG_LEVEL=debug collective info output additionally includes team rank number, context rank number, collective sequence number, pointer to request

All other prints moved from info log level to debug log level.

UCC_LOG_LEVEL=info example
ucc_constructor.c:183  UCC  INFO  version: 1.2.0, loaded from: /home/ucc/build-debug/install/lib/libucc.so.1, cfg file: n/a
ucc_constructor.c:183  UCC  INFO  version: 1.2.0, loaded from: /home/ucc/build-debug/install/lib/libucc.so.1, cfg file: n/a
       ucc_team.c:471  UCC  INFO  ===== COLL_SCORE_MAP (team_id 32768, size 2) =====
ucc_coll_score_map.c:201  UCC  INFO  Allgather:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Allgatherv:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Allreduce:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..4095}:TL_UCP:10 {4K..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Alltoall:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Alltoallv:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Barrier:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Bcast:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..32767}:TL_UCP:10 {32K..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..32767}:TL_UCP:10 {32K..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..32767}:TL_UCP:10 {32K..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Fanin:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Fanout:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Gather:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Gatherv:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Reduce:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Reduce_scatter:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Reduce_scatterv:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO  Scatterv:
ucc_coll_score_map.c:201  UCC  INFO     Host: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     Cuda: {0..inf}:TL_UCP:10
ucc_coll_score_map.c:201  UCC  INFO     CudaManaged: {0..inf}:TL_UCP:10
       ucc_team.c:474  UCC  INFO  ================================================

OSU MPI All-to-All Personalized Exchange Latency Test v5.8
Size Avg Latency(us)
....
ucc_team.c:528 UCC INFO team destroyed, team_id 32768

UCC_COLL_TRACE=y example
ucc_coll.c:257  UCC  INFO  coll_init: Barrier; TL_UCP, team_id 32768
ucc_coll.c:257  UCC  INFO  coll_init: Alltoall: src={0x7f7f2d789000, 2097152, int8, Host}, dst={0x7f7f2d587000, 2097152, int8, Host}; TL_UCP, team_id 32768
ucc_coll.c:257  UCC  INFO  coll_init: Barrier; TL_UCP, team_id 32768
ucc_coll.c:257  UCC  INFO  coll_init: Reduce min root 0: src={0x7ffd97e4ebd0, 1, float64, Host}, dst={0x7ffd97e4ebe8, 1, float64, Host}; TL_UCP, team_id 32768
ucc_coll.c:257  UCC  INFO  coll_init: Reduce max root 0: src={0x7ffd97e4ebd0, 1, float64, Host}, dst={0x7ffd97e4ebe0, 1, float64, Host}; TL_UCP, team_id 32768
ucc_coll.c:257  UCC  INFO  coll_init: Reduce sum root 0: src={0x7ffd97e4ebd0, 1, float64, Host}, dst={0x7ffd97e4ebd8, 1, float64, Host}; TL_UCP, team_id 32768

@Sergei-Lebedev
Copy link
Contributor Author

@jirikraus please take a look

@Sergei-Lebedev
Copy link
Contributor Author

bot:retest

@shimmybalsam
Copy link
Collaborator

General: in the print you show from UCC_LOG_LEVEL=info example there is no CL information, only TL score map. perhaps chosen CL info should be added as well.

@@ -30,6 +30,13 @@ ucc_config_field_t ucc_global_config_table[] = {
ucc_offsetof(ucc_global_config_t, log_component),
UCC_CONFIG_TYPE_LOG_COMP},

{"COLL_TRACE", "n",
"If UCC logging level is INFO or higher UCC will print info message"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit confusing since you're explaining what would happen in case of "y" but defining as "n"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it confusing? by default coll trace is disabled because it produces a lot of output. If you want to see it just set UCC_COLL_TRACE=y.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion it would be more clear like this:
"If enabled and UCC logging level is INFO or higher UCC will print info message per collective.\n"

@jirikraus
Copy link
Contributor

Thanks @Sergei-Lebedev looking at the example output this looks great. I can't comment on the source change but assume this is not requested. As @shimmybalsam points out I think it makes sense to also cover the choosen CL.

Copy link
Collaborator

@vspetrov vspetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change is awesome! I have just minor comments.
Why not keeping coll seq_num on the trace with INFO level. the line will be almost same length? It gives a lot of info on how heavily the team is used (if seq_num is large or not), etc.

Looks like we don't have trace for triggered post, lets add.

A couple of next steps we could as a follow up:

  • Add UCC_TL_COLL_TRACE
    basically same tracing but at TL level. This var should of course NOT inherit from UCC_COLL_TRACE. It would be convenient to trace in detail actual algorithms start/finalize at TL level.

  • With existing tracing, what else do we need to add to logging to potentially "replay" the trace ? This could be a very handy tool for perf analysis

}
break;
case UCC_COLL_TYPE_ALLGATHERV:
case UCC_COLL_TYPE_REDUCE_SCATTERV:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe move thise entire blob into separate util function:

static inline void ucc_coll_args_get_info(const ucc_coll_args_t *args, ucc_coll_buffer_info_t *s_info, 
ucc_coll_buffer_info_t *d_info) {
...
}

it can set s_info.buffer or r_info.buffer to NULL that would indicate the same as has_src/has_dst flags.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s_info.buffer == NULL is actually valid buffer, for example we did alltoallv with 64bit displacement in pytorch

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right. still worth moving to separate fn imho, even with 2 more flags. current function is very large. just from readability perspective.

@Sergei-Lebedev
Copy link
Contributor Author

Thanks @Sergei-Lebedev looking at the example output this looks great. I can't comment on the source change but assume this is not requested. As @shimmybalsam points out I think it makes sense to also cover the choosen CL.

It prints CL HIER if it's used and some TL if CL BASIC is used. The logic is that we print top level task info which is TL task for CL BASIC and CL task for CL HIER.

@Sergei-Lebedev
Copy link
Contributor Author

Why not keeping coll seq_num on the trace with INFO level. the line will be almost same length? It gives a lot of info on how heavily the team is used (if seq_num is large or not), etc.

Sure, no problem, will add.

Looks like we don't have trace for triggered post, lets add.

We print this info in coll_init so we will have it for triggered post as well. But maybe move it to collective post and triggered post instead? I'm not sure which option is better.

  • Add UCC_TL_COLL_TRACE
    basically same tracing but at TL level. This var should of course NOT inherit from UCC_COLL_TRACE. It would be convenient to trace in detail actual algorithms start/finalize at TL level.

Agree, but we have much more option where to add in TL. For example in every init, post, frag start ...

  • With existing tracing, what else do we need to add to logging to potentially "replay" the trace ? This could be a very handy tool for perf analysis

Yes, i wanted to start with some script that can collect different statistic, but replay is also an option.

@jirikraus
Copy link
Contributor

Thanks @Sergei-Lebedev looking at the example output this looks great. I can't comment on the source change but assume this is not requested. As @shimmybalsam points out I think it makes sense to also cover the choosen CL.

It prints CL HIER if it's used and some TL if CL BASIC is used. The logic is that we print top level task info which is TL task for CL BASIC and CL task for CL HIER.

I think that would address my point. However I am lacking background to figure how this might look like in practice. Can you provide an example how that would look? The example you provide above only cover TL_UCP not the CL used.

@Sergei-Lebedev
Copy link
Contributor Author

Thanks @Sergei-Lebedev looking at the example output this looks great. I can't comment on the source change but assume this is not requested. As @shimmybalsam points out I think it makes sense to also cover the choosen CL.

It prints CL HIER if it's used and some TL if CL BASIC is used. The logic is that we print top level task info which is TL task for CL BASIC and CL task for CL HIER.

I think that would address my point. However I am lacking background to figure how this might look like in practice. Can you provide an example how that would look? The example you provide above only cover TL_UCP not the CL used.

sure, it's similar:

ucc_coll.c:257  UCC  INFO  coll_init: Barrier; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce min: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8228, 1, float64, Host}; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce max: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8220, 1, float64, Host}; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce sum: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8230, 1, float64, Host}; CL_HIER, team_id 1

@shimmybalsam
Copy link
Collaborator

Thanks @Sergei-Lebedev looking at the example output this looks great. I can't comment on the source change but assume this is not requested. As @shimmybalsam points out I think it makes sense to also cover the choosen CL.

It prints CL HIER if it's used and some TL if CL BASIC is used. The logic is that we print top level task info which is TL task for CL BASIC and CL task for CL HIER.

I think that would address my point. However I am lacking background to figure how this might look like in practice. Can you provide an example how that would look? The example you provide above only cover TL_UCP not the CL used.

sure, it's similar:

ucc_coll.c:257  UCC  INFO  coll_init: Barrier; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce min: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8228, 1, float64, Host}; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce max: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8220, 1, float64, Host}; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce sum: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8230, 1, float64, Host}; CL_HIER, team_id 1

Looks good but I still think that also for CL BASIC we should output that we are using CL_BASIC and then which TLs

@jirikraus
Copy link
Contributor

Thanks @Sergei-Lebedev looking at the example output this looks great. I can't comment on the source change but assume this is not requested. As @shimmybalsam points out I think it makes sense to also cover the choosen CL.

It prints CL HIER if it's used and some TL if CL BASIC is used. The logic is that we print top level task info which is TL task for CL BASIC and CL task for CL HIER.

I think that would address my point. However I am lacking background to figure how this might look like in practice. Can you provide an example how that would look? The example you provide above only cover TL_UCP not the CL used.

sure, it's similar:

ucc_coll.c:257  UCC  INFO  coll_init: Barrier; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce min: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8228, 1, float64, Host}; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce max: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8220, 1, float64, Host}; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce sum: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8230, 1, float64, Host}; CL_HIER, team_id 1

Looks good but I still think that also for CL BASIC we should output that we are using CL_BASIC and then which TLs

And as @vspetrov suggested add UCC_TL_COLL_TRACE to provide additional details. To make sure I understand this right when CL hier is used we won't see the TLs it leverages? Could we show that without getting too verbose, E.g.

ucc_coll.c:257  UCC  INFO  coll_init: Barrier; CL_HIER, team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce min: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8228, 1, float64, Host}; CL_HIER ( comma seperated list of TLs levearge), team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce max: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8220, 1, float64, Host}; CL_HIER ( comma seperated list of TLs levearge), team_id 1
ucc_coll.c:257  UCC  INFO  coll_init: Allreduce sum: src={0x7ffd443b8238, 1, float64, Host}, dst={0x7ffd443b8230, 1, float64, Host}; CL_HIER ( comma seperated list of TLs levearge, e.g. (TL_UCP, TL_SHARP)) , team_id 1

and something similar for CL Basic?

@vspetrov
Copy link
Collaborator

vspetrov commented Feb 1, 2023

We print this info in coll_init so we will have it for triggered post as well. But maybe move it to collective post and triggered post instead? I'm not sure which option is better.

right, i'm starting to forget stuff... i think, we should keep as is - print detailed info on collective_init and on post/finalize we have a verbose messages that will be printed under DEBUG. BTW, in post/finalize we print pointer of the request which makes it easy to match initialized request with the posted/finalized. I saw you no longer print the request pointer in coll_init. Maybe lets add it under DEBUG.

@Sergei-Lebedev
Copy link
Contributor Author

@jirikraus @vspetrov @shimmybalsam
I slightly changed the log for collective trace. Now UCC_COLL_TRACE is enum instead of binary value, which allows to finer control over verbosity level. Coll trace doesn't depend on main LOG_LEVEL anymore, examples

UCC_COLL_TRACE=diag: leader ranks prints info from coll_args structure (coll type, buffer pointer, size, datatype, memory type)

ucc_coll.c:255  UCC_COLL DIAG  coll_init: Barrier
ucc_coll.c:255  UCC_COLL DIAG  coll_init: Alltoall: src={0x8ae9310, 1024, float32, Host}, dst={0x88772d0, 1024, float32, Host}
ucc_coll.c:255  UCC_COLL DIAG  coll_init: Barrier
ucc_coll.c:255  UCC_COLL DIAG  coll_init: Alltoall: src={0x8ae9310, 1024, float32, Host}, dst={0x88772d0, 1024, float32, Host}
ucc_coll.c:255  UCC_COLL DIAG  coll_init: Barrier
ucc_coll.c:255  UCC_COLL DIAG  coll_init: Allreduce min: src={0x7ffe38c79368, 1, float64, Host}, dst={0x7ffe38c79358, 1, float64, Host}
ucc_coll.c:255  UCC_COLL DIAG  coll_init: Allreduce max: src={0x7ffe38c79368, 1, float64, Host}, dst={0x7ffe38c79350, 1, float64, Host}
ucc_coll.c:255  UCC_COLL DIAG  coll_init: Allreduce sum: src={0x7ffe38c79368, 1, float64, Host}, dst={0x7ffe38c79360, 1, float64, Host}

UCC_COLL_TRACE=info: same as log level diag but adds team_id and CL/TL being used

ucc_coll.c:255  UCC_COLL INFO  coll_init: Barrier; CL_HIER {TL_UCP, TL_SHARP}, team_id 1
ucc_coll.c:255  UCC_COLL INFO  coll_init: Alltoall: src={0x776d2d0, 1024, float32, Host}, dst={0x75012d0, 1024, float32, Host}; CL_BASIC {TL_UCP}, team_id 1
ucc_coll.c:255  UCC_COLL INFO  coll_init: Barrier; CL_HIER {TL_UCP, TL_SHARP}, team_id 1
ucc_coll.c:255  UCC_COLL INFO  coll_init: Alltoall: src={0x776d2d0, 1024, float32, Host}, dst={0x75012d0, 1024, float32, Host}; CL_BASIC {TL_UCP}, team_id 1
ucc_coll.c:255  UCC_COLL INFO  coll_init: Barrier; CL_HIER {TL_UCP, TL_SHARP}, team_id 1
ucc_coll.c:255  UCC_COLL INFO  coll_init: Allreduce min: src={0x7ffd253387f8, 1, float64, Host}, dst={0x7ffd253387e8, 1, float64, Host}; CL_HIER {TL_UCP, TL_SHARP}, team_id 1
ucc_coll.c:255  UCC_COLL INFO  coll_init: Allreduce max: src={0x7ffd253387f8, 1, float64, Host}, dst={0x7ffd253387e0, 1, float64, Host}; CL_HIER {TL_UCP, TL_SHARP}, team_id 1
ucc_coll.c:255  UCC_COLL INFO  coll_init: Allreduce sum: src={0x7ffd253387f8, 1, float64, Host}, dst={0x7ffd253387f0, 1, float64, Host}; CL_HIER {TL_UCP, TL_SHARP}, team_id 1

UCC_COLL_TRACE=debug: same as log level info but adds rank, ctx_rank, sequence number and reuqest pointer

ucc_coll.c:255  UCC_COLL DEBUG coll_init: Barrier; CL_HIER {TL_UCP, TL_SHARP}, team_id 1 rank 0, ctx_rank 0, seq_num 0, req 0x8a34440
ucc_coll.c:255  UCC_COLL DEBUG coll_init: Alltoall: src={0x8e632d0, 1024, float32, Host}, dst={0x8bf72d0, 1024, float32, Host}; CL_BASIC {TL_UCP}, team_id 1 rank 0, ctx_rank 0, seq_num 1, req 0x8bf6e00
ucc_coll.c:255  UCC_COLL DEBUG coll_init: Barrier; CL_HIER {TL_UCP, TL_SHARP}, team_id 1 rank 0, ctx_rank 0, seq_num 2, req 0x8a34440
ucc_coll.c:255  UCC_COLL DEBUG coll_init: Alltoall: src={0x8e632d0, 1024, float32, Host}, dst={0x8bf72d0, 1024, float32, Host}; CL_BASIC {TL_UCP}, team_id 1 rank 0, ctx_rank 0, seq_num 3, req 0x8bf7040
ucc_coll.c:255  UCC_COLL DEBUG coll_init: Barrier; CL_HIER {TL_UCP, TL_SHARP}, team_id 1 rank 0, ctx_rank 0, seq_num 4, req 0x8a34440
ucc_coll.c:255  UCC_COLL DEBUG coll_init: Allreduce min: src={0x7ffe3c4af258, 1, float64, Host}, dst={0x7ffe3c4af248, 1, float64, Host}; CL_HIER {TL_UCP, TL_SHARP}, team_id 1 rank 0, ctx_rank 0, seq_num 5, req 0x8a34440
ucc_coll.c:255  UCC_COLL DEBUG coll_init: Allreduce max: src={0x7ffe3c4af258, 1, float64, Host}, dst={0x7ffe3c4af240, 1, float64, Host}; CL_HIER {TL_UCP, TL_SHARP}, team_id 1 rank 0, ctx_rank 0, seq_num 6, req 0x8a34440
ucc_coll.c:255  UCC_COLL DEBUG coll_init: Allreduce sum: src={0x7ffe3c4af258, 1, float64, Host}, dst={0x7ffe3c4af250, 1, float64, Host}; CL_HIER {TL_UCP, TL_SHARP}, team_id 1 rank 0, ctx_rank 0, seq_num 7, req 0x8a34440

UCC_COLL_TRACE=trace is identical to UCC_COLL_TRACE=debug but all ranks print collective info
Note that list of TLs inside CL HIER depends on what rank prints it. If rank belongs to multiple topology subgroups it will use multiple TLs, but if rank is only part of one subgroup it will print single TL

@jirikraus
Copy link
Contributor

@jirikraus @vspetrov @shimmybalsam I slightly changed the log for collective trace. Now UCC_COLL_TRACE is enum instead of binary value, which allows to finer control over verbosity level. Coll trace doesn't depend on main LOG_LEVEL anymore, examples

Thanks this looks great. @Sergei-Lebedev when is this expected to land in a UCC release? I wonder how we should track adding a description of this to the user_guide.md discussed in PR #720 ?

@Sergei-Lebedev
Copy link
Contributor Author

@jirikraus @vspetrov @shimmybalsam I slightly changed the log for collective trace. Now UCC_COLL_TRACE is enum instead of binary value, which allows to finer control over verbosity level. Coll trace doesn't depend on main LOG_LEVEL anymore, examples

Thanks this looks great. @Sergei-Lebedev when is this expected to land in a UCC release? I wonder how we should track adding a description of this to the user_guide.md discussed in PR #720 ?

If everybody agrees on logging format, then this PR will be merged pretty soon. After it's merged you can update User Guide accordingly and I don't think it depends on release date. But answering your question we are going to release new UCC version in April.

@Sergei-Lebedev
Copy link
Contributor Author

@vspetrov @bureddy kind remainder, pls approve if you are ok with the changes

@Sergei-Lebedev Sergei-Lebedev merged commit 751c80c into openucx:master Mar 6, 2023
@Sergei-Lebedev Sergei-Lebedev deleted the topic/better_log branch March 6, 2023 06:03
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
* CORE: improve logging

* CORE: add coll_trace loggger

* CORE: add post and finalize to debug log
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants