Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EC/CPU: fix error message #844

Conversation

Sergei-Lebedev
Copy link
Contributor

What

Fix typo in error message

@Sergei-Lebedev
Copy link
Contributor Author

@edgargabriel could you please check why Linter-ROCM is failing? This is not specific for this PR, but happened multiple time in different PRs

@edgargabriel
Copy link
Contributor

@Sergei-Lebedev it seems to me that we are running into the same issue with UCX that we had with UCC and was fixed in PR #804 . The hip header file generates a warning when compiled with clang, which in itself is harmless, but because all warnings are treated as errors, the compilation fails. I will see what we can do about it. One option would be to compile UCX with gcc instead of clang (since we are not truly interested in running clang-tidy on ucx in this CI, but just on UCC)

@edgargabriel
Copy link
Contributor

@Sergei-Lebedev it is more complicated: compiling UCX with gcc instead of clang fixes the first issue. There is however another issue in that we need clang-14 to compile UCC for the rocm/rccl components. With clang-12 we ran into another issue in the hip header files where a flag to disable some warnings is not recognized (in the hip header files) and generates a warning again:

In file included from /opt/rocm/include/hip/hip_runtime_api.h:483:
/opt/rocm/include/hip/surface_types.h:30:34: error: unknown warning group '-Wreserved-identifier', ignored [-Werror,-Wunknown-warning-option]
#pragma clang diagnostic ignored "-Wreserved-identifier"

With clang-14 this error is not observed, and UCC compiles correctly. However clang-tidy with clang-14 fails, see

https://github.com/openucx/ucc/actions/runs/6264142779/job/17010173739?pr=846

Not sure what our options are. If you want I can convert Clang-tidy-rocm into a rocm compilation only (without the clang-tidy part), which would also still have some value to ensure that a pr is compiled also against the rocm components. Let me know what you suggest.

@edgargabriel
Copy link
Contributor

edgargabriel commented Sep 21, 2023

the problem is of course triggered by the release of rocm-5.7 a few days ago which contain the changes to the hip header files that are causing the issue. The script fetches the most recent rocm version. I can also explore whether I could fetch a fixed rocm version instead, hence go back to rocm-5.6 for example.

@edgargabriel
Copy link
Contributor

#846 fixes the ROCM-linter compilation issue, by using rocm 5.6.1 (instead of the newest version), and compiling ucx with gcc instead of clang.

@Sergei-Lebedev Sergei-Lebedev force-pushed the topic/fix_ec_cpu_reduce_error_msg branch from 5f4c8f6 to a4db69c Compare September 28, 2023 07:13
@Sergei-Lebedev Sergei-Lebedev enabled auto-merge (squash) September 28, 2023 07:13
@Sergei-Lebedev Sergei-Lebedev merged commit 34ad3ef into openucx:master Sep 28, 2023
7 of 9 checks passed
nsarka pushed a commit to nsarka/ucc that referenced this pull request Oct 24, 2023
nsarka pushed a commit to nsarka/ucc that referenced this pull request Oct 24, 2023
janjust pushed a commit to janjust/ucc that referenced this pull request Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants