NvMon segfault on stop_counters #34
Comments
FWIW, gdb backtrace
|
Perhaps related: When I try to use the GPU Marker API instead (see code below) I get the following error.
All output (with
Command:
Julia file:
# perfctr_gpu.jl
using LIKWID
using LinearAlgebra
using CUDA
@assert CUDA.functional()
const N = 10_000
const a = 3.141
# Note: CUDA defaults to Float32
const x = CUDA.rand(N)
const y = CUDA.rand(N)
const z = CUDA.zeros(N)
GPUMarker.init()
GPUMarker.startregion("saxpy")
for _ in 1:100
    z .= a .* x .* y
end
GPUMarker.stopregion("saxpy")
GPUMarker.close() |
I get the same segfault (see first post) on an entirely different machine (our DGX-A100). To me, that seems to suggest that this is not system / installation related but actually a LIKWID bug. @TomTheBear what do you think? |
I also don't understand the
|
@TomTheBear With your PR the segfault is gone but, I believe, things still aren't working as they should.
|
Oh, regarding point 2, it shows up under "FLOPS_DP", but why?! Update: Because
julia> a = 3.141f0;
julia> events = @nvmon "FLOPS_SP" saxpy!(z, a, x, y);
Group: FLOPS_SP
┌────────────────────────────────────────────────────┬─────────┐
│ Event │ GPU 1 │
├────────────────────────────────────────────────────┼─────────┤
│ SMSP_SASS_THREAD_INST_EXECUTED_OP_FADD_PRED_ON_SUM │ 0.0 │
│ SMSP_SASS_THREAD_INST_EXECUTED_OP_FMUL_PRED_ON_SUM │ 0.0 │
│ SMSP_SASS_THREAD_INST_EXECUTED_OP_FFMA_PRED_ON_SUM │ 10000.0 │
└────────────────────────────────────────────────────┴─────────┘ |
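For anyone hitting the same FLOPS_DP vs. FLOPS_SP confusion: my reading of the above is that a `Float64` scalar such as `a = 3.141` promotes the whole fused broadcast to double precision even when the arrays are `Float32`, while `3.141f0` keeps it in single precision. Below is a minimal CPU-only sketch of that promotion rule. It uses plain `Array`s rather than `CuArray`s and does not call LIKWID at all; the variable names are illustrative, and the operator is the usual saxpy form `a .* x .+ y`.

```julia
# CPU-only sketch of scalar/array type promotion in a fused broadcast.
# Assumption: the same promotion applies to the GPU kernel that CUDA.jl generates.
x = rand(Float32, 4)
y = rand(Float32, 4)

a64 = 3.141      # Float64 literal, as in `const a = 3.141`
a32 = 3.141f0    # Float32 literal, as in the FLOPS_SP run

# Element type produced by the fused broadcast expression
t64 = eltype(a64 .* x .+ y)
t32 = eltype(a32 .* x .+ y)

println("Float64 scalar: ", t64)
println("Float32 scalar: ", t32)
```

If that reading is right, an in-place `z .= a .* x .+ y` with a `Float64` `a` would do the arithmetic in double precision and only convert to `Float32` on the store, which would match the counts landing in the DP group.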
With the latest fixes on the likwid PR this works again. Hope we'll get a new likwid release with this soon 😀 |
To reproduce / for copy-paste:
Not sure yet whether this is an issue with likwid, LIKWID.jl, or the likwid installation on Noctua 2.
(cc @TomTheBear)