
NvMon segfault on stop_counters #34

Closed
carstenbauer opened this issue Jun 9, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@carstenbauer
Member

carstenbauer commented Jun 9, 2022

julia> using CUDA

julia> CUDA.functional()
true

julia> using LIKWID

julia> LIKWID.gpusupport()
true

julia> NvMon.init([0])
true

julia> gid = NvMon.add_event_set("FLOPS_DP")
1

julia> NvMon.setup_counters(gid)
true

julia> NvMon.start_counters()
true

julia> NvMon.stop_counters()

signal (11): Segmentation fault
in expression starting at REPL[13]:1
nvmon_perfworks_stopCounters at /upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/n2/easybuild/software/likwid-gpu/5.2.1-GCC-11.2.0/lib/liblikwid.so (unknown line)
nvmon_stopCounters at /upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/n2/easybuild/software/likwid-gpu/5.2.1-GCC-11.2.0/lib/liblikwid.so (unknown line)
nvmon_stopCounters at /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/LibLikwid.jl:1924 [inlined]
stop_counters at /scratch/pc2-mitarbeiter/bauerc/devel/LIKWID.jl/src/nvmon.jl:212
unknown function (ip: 0x1523ee1247b4)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:126
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:215
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:166 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:587
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:830
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:550
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:516
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:731
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:885
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:944
eval at ./boot.jl:373 [inlined]
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:150
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:246
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:231
#run_repl#47 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:364
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.7/REPL/src/REPL.jl:351
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#930 at ./client.jl:394
jfptr_YY.930_45169.clone_1 at /opt/software/pc2/EB-SW/software/JuliaHPC/1.7.2-fosscuda-2022a-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
jl_f__call_latest at /buildworker/worker/package_linux64/build/src/builtins.c:757
#invokelatest#2 at ./essentials.jl:716 [inlined]
invokelatest at ./essentials.jl:714 [inlined]
run_main_repl at ./client.jl:379
exec_options at ./client.jl:309
_start at ./client.jl:495
jfptr__start_38732.clone_1 at /opt/software/pc2/EB-SW/software/JuliaHPC/1.7.2-fosscuda-2022a-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
true_main at /buildworker/worker/package_linux64/build/src/jlapi.c:559
jl_repl_entrypoint at /buildworker/worker/package_linux64/build/src/jlapi.c:701
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x400808)
Allocations: 17039085 (Pool: 17034285; Big: 4800); GC: 11
Segmentation fault (core dumped)

To reproduce / for copy-paste:

using LIKWID
LIKWID.gpusupport()
NvMon.init([0])
gid = NvMon.add_event_set("FLOPS_DP")
NvMon.setup_counters(gid)
NvMon.start_counters()
NvMon.stop_counters()

Not sure yet whether this is an issue with likwid, LIKWID.jl, or the likwid installation on Noctua 2.

(cc @TomTheBear)

@carstenbauer
Member Author

FWIW, here is the gdb backtrace:

Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x0000155527c82ba4 in nvmon_perfworks_stopCounters ()
   from /upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/n2/easybuild/software/likwid-gpu/5.2.1-GCC-11.2.0/lib/liblikwid.so
(gdb) bt
#0  0x0000155527c82ba4 in nvmon_perfworks_stopCounters ()
   from /upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/n2/easybuild/software/likwid-gpu/5.2.1-GCC-11.2.0/lib/liblikwid.so
#1  0x0000155527c84062 in nvmon_stopCounters ()
   from /upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/n2/easybuild/software/likwid-gpu/5.2.1-GCC-11.2.0/lib/liblikwid.so
#2  0x0000155528fd152f in ?? ()
#3  0x0000000000007a5e in ?? ()
#4  0x0000155528fd1555 in ?? ()
#5  0x00007ffffffe80f0 in ?? ()
#6  0x0000155553f94e0a in _jl_invoke (world=140737488257063, mfunc=<optimized out>, nargs=0, args=0x7ffffffe8118, F=0x15553e2e1e40)
    at /buildworker/worker/package_linux64/build/src/gf.c:2247
#7  jl_apply_generic (F=<optimized out>, args=0x7ffffffe8118, nargs=<optimized out>)
    at /buildworker/worker/package_linux64/build/src/gf.c:2429
#8  0x0000155553fb3e96 in jl_apply (nargs=1, args=0x7ffffffe8110) at /buildworker/worker/package_linux64/build/src/julia.h:1788
#9  do_call (args=args@entry=0x15553e866138, nargs=nargs@entry=1, s=s@entry=0x7ffffffe83a0)
    at /buildworker/worker/package_linux64/build/src/interpreter.c:126
#10 0x0000155553fb390e in eval_value (e=e@entry=0x15553f01b130, s=s@entry=0x7ffffffe83a0)
    at /buildworker/worker/package_linux64/build/src/interpreter.c:215
#11 0x0000155553fb46d2 in eval_stmt_value (s=0x7ffffffe83a0, stmt=<optimized out>)
    at /buildworker/worker/package_linux64/build/src/interpreter.c:166
#12 eval_body (stmts=<optimized out>, s=s@entry=0x7ffffffe83a0, ip=1, ip@entry=0, toplevel=toplevel@entry=1)
    at /buildworker/worker/package_linux64/build/src/interpreter.c:587
#13 0x0000155553fb52f8 in jl_interpret_toplevel_thunk (m=m@entry=0x155534981760 <jl_system_image_data+443552>, src=0x155540cd5310)
    at /buildworker/worker/package_linux64/build/src/interpreter.c:731
#14 0x0000155553fd27a4 in jl_toplevel_eval_flex (m=m@entry=0x155534981760 <jl_system_image_data+443552>, e=<optimized out>,
    fast=fast@entry=1, expanded=expanded@entry=0) at /buildworker/worker/package_linux64/build/src/toplevel.c:885
#15 0x0000155553fd29e5 in jl_toplevel_eval_flex (m=m@entry=0x155534981760 <jl_system_image_data+443552>, e=e@entry=0x15553dcee2d0,
    fast=fast@entry=1, expanded=expanded@entry=0) at /buildworker/worker/package_linux64/build/src/toplevel.c:830
#16 0x0000155553fd450c in jl_toplevel_eval (m=m@entry=0x155534981760 <jl_system_image_data+443552>, v=v@entry=0x15553dcee2d0)
    at /buildworker/worker/package_linux64/build/src/toplevel.c:894
#17 0x0000155553fd462a in jl_toplevel_eval_in (m=0x155534981760 <jl_system_image_data+443552>, ex=0x15553dcee2d0)
    at /buildworker/worker/package_linux64/build/src/toplevel.c:944
#18 0x000015553428483b in eval () at boot.jl:373
#19 japi1_include_string_40536 () at loading.jl:1196
#20 0x0000155553f94e0a in _jl_invoke (world=31320, mfunc=<optimized out>, nargs=4, args=0x7ffffffe8af0,
    F=0x1555357556d0 <jl_system_image_data+14943248>) at /buildworker/worker/package_linux64/build/src/gf.c:2247
#21 jl_apply_generic (F=<optimized out>, args=0x7ffffffe8af0, nargs=<optimized out>)
    at /buildworker/worker/package_linux64/build/src/gf.c:2429
#22 0x000015553437435b in japi1__include_32082 () at loading.jl:1253
#23 0x0000155533e91c16 in japi1_include_36299 () at Base.jl:418
#24 0x0000155553f94e0a in _jl_invoke (world=31320, mfunc=<optimized out>, nargs=2, args=0x7ffffffe91a0,
    F=0x1555377780a0 <jl_system_image_data+48639456>) at /buildworker/worker/package_linux64/build/src/gf.c:2247
#25 jl_apply_generic (F=<optimized out>, args=0x7ffffffe91a0, nargs=<optimized out>)
    at /buildworker/worker/package_linux64/build/src/gf.c:2429
#26 0x00001555343fa64c in julia_exec_options_33549 () at client.jl:292
#27 0x0000155533eb40f8 in julia__start_38731 () at client.jl:495
#28 0x0000155533eb4269 in jfptr.start_38732.clone_1 () at client.jl:295
#29 0x0000155553f94e0a in _jl_invoke (world=31320, mfunc=<optimized out>, nargs=0, args=0x7ffffffea5b0,
    F=0x15553557c7c0 <jl_system_image_data+13006080>) at /buildworker/worker/package_linux64/build/src/gf.c:2247
#30 jl_apply_generic (F=<optimized out>, args=0x7ffffffea5b0, nargs=<optimized out>)
    at /buildworker/worker/package_linux64/build/src/gf.c:2429
#31 0x0000155553ff82d6 in jl_apply (nargs=1, args=0x7ffffffea5a8) at /buildworker/worker/package_linux64/build/src/julia.h:1788
#32 true_main (argc=<optimized out>, argv=<optimized out>) at /buildworker/worker/package_linux64/build/src/jlapi.c:559
#33 0x0000155553ff8c7d in jl_repl_entrypoint (argc=<optimized out>, argv=<optimized out>)
    at /buildworker/worker/package_linux64/build/src/jlapi.c:701
#34 0x00000000004007d9 in main (argc=<optimized out>, argv=<optimized out>)
    at /buildworker/worker/package_linux64/build/cli/loader_exe.c:42

@carstenbauer
Member Author

Perhaps related: when I try to use the GPU Marker API instead (see the code below), I get the following error:

CUDA cannot be found and initialized (cuInit failed).
ERROR - [./src/topology_gpu.c:topology_gpu_init:232] Cannot get number of devices from CUDA library
CUDA cannot be found and initialized (cuInit failed).
ERROR - [./src/topology_gpu.c:topology_gpu_init:232] Cannot get number of devices from CUDA library
Error init GPU Marker API.

All output (with -V 2):

DEBUG - [hwloc_init_cpuInfo:355] HWLOC CpuInfo Family 25 Model 1 Stepping 1 Vendor 0x0 Part 0x0 isIntel 0 numHWThreads 128 activeHWThreads 128
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 0 Thread 0 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 1 Thread 0 Core 1 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 2 Thread 0 Core 2 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 3 Thread 0 Core 3 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 4 Thread 0 Core 4 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 5 Thread 0 Core 5 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 6 Thread 0 Core 6 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 7 Thread 0 Core 7 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 8 Thread 0 Core 8 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 9 Thread 0 Core 9 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 10 Thread 0 Core 10 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 11 Thread 0 Core 11 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 12 Thread 0 Core 12 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 13 Thread 0 Core 13 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 14 Thread 0 Core 14 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 15 Thread 0 Core 15 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 16 Thread 0 Core 16 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 17 Thread 0 Core 17 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 18 Thread 0 Core 18 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 19 Thread 0 Core 19 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 20 Thread 0 Core 20 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 21 Thread 0 Core 21 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 22 Thread 0 Core 22 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 23 Thread 0 Core 23 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 24 Thread 0 Core 24 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 25 Thread 0 Core 25 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 26 Thread 0 Core 26 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 27 Thread 0 Core 27 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 28 Thread 0 Core 28 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 29 Thread 0 Core 29 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 30 Thread 0 Core 30 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 31 Thread 0 Core 31 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 32 Thread 0 Core 32 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 33 Thread 0 Core 33 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 34 Thread 0 Core 34 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 35 Thread 0 Core 35 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 36 Thread 0 Core 36 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 37 Thread 0 Core 37 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 38 Thread 0 Core 38 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 39 Thread 0 Core 39 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 40 Thread 0 Core 40 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 41 Thread 0 Core 41 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 42 Thread 0 Core 42 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 43 Thread 0 Core 43 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 44 Thread 0 Core 44 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 45 Thread 0 Core 45 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 46 Thread 0 Core 46 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 47 Thread 0 Core 47 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 48 Thread 0 Core 48 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 49 Thread 0 Core 49 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 50 Thread 0 Core 50 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 51 Thread 0 Core 51 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 52 Thread 0 Core 52 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 53 Thread 0 Core 53 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 54 Thread 0 Core 54 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 55 Thread 0 Core 55 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 56 Thread 0 Core 56 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 57 Thread 0 Core 57 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 58 Thread 0 Core 58 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 59 Thread 0 Core 59 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 60 Thread 0 Core 60 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 61 Thread 0 Core 61 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 62 Thread 0 Core 62 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 63 Thread 0 Core 63 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 64 Thread 0 Core 64 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 65 Thread 0 Core 65 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 66 Thread 0 Core 66 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 67 Thread 0 Core 67 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 68 Thread 0 Core 68 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 69 Thread 0 Core 69 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 70 Thread 0 Core 70 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 71 Thread 0 Core 71 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 72 Thread 0 Core 72 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 73 Thread 0 Core 73 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 74 Thread 0 Core 74 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 75 Thread 0 Core 75 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 76 Thread 0 Core 76 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 77 Thread 0 Core 77 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 78 Thread 0 Core 78 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 79 Thread 0 Core 79 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 80 Thread 0 Core 80 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 81 Thread 0 Core 81 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 82 Thread 0 Core 82 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 83 Thread 0 Core 83 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 84 Thread 0 Core 84 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 85 Thread 0 Core 85 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 86 Thread 0 Core 86 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 87 Thread 0 Core 87 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 88 Thread 0 Core 88 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 89 Thread 0 Core 89 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 90 Thread 0 Core 90 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 91 Thread 0 Core 91 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 92 Thread 0 Core 92 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 93 Thread 0 Core 93 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 94 Thread 0 Core 94 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 95 Thread 0 Core 95 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 96 Thread 0 Core 96 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 97 Thread 0 Core 97 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 98 Thread 0 Core 98 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 99 Thread 0 Core 99 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 100 Thread 0 Core 100 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 101 Thread 0 Core 101 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 102 Thread 0 Core 102 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 103 Thread 0 Core 103 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 104 Thread 0 Core 104 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 105 Thread 0 Core 105 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 106 Thread 0 Core 106 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 107 Thread 0 Core 107 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 108 Thread 0 Core 108 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 109 Thread 0 Core 109 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 110 Thread 0 Core 110 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 111 Thread 0 Core 111 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 112 Thread 0 Core 112 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 113 Thread 0 Core 113 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 114 Thread 0 Core 114 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 115 Thread 0 Core 115 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 116 Thread 0 Core 116 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 117 Thread 0 Core 117 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 118 Thread 0 Core 118 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 119 Thread 0 Core 119 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 120 Thread 0 Core 120 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 121 Thread 0 Core 121 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 122 Thread 0 Core 122 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 123 Thread 0 Core 123 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 124 Thread 0 Core 124 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 125 Thread 0 Core 125 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 126 Thread 0 Core 126 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:564] HWLOC Thread Pool PU 127 Thread 0 Core 127 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_cacheTopology:785] HWLOC Cache Pool ID 0 Level 1 Size 32768 Threads 1
DEBUG - [hwloc_init_cacheTopology:785] HWLOC Cache Pool ID 1 Level 2 Size 524288 Threads 1
DEBUG - [hwloc_init_cacheTopology:785] HWLOC Cache Pool ID 2 Level 3 Size 33554432 Threads 8
--------------------------------------------------------------------------------
CPU name:	AMD EPYC 7763 64-Core Processor                
CPU type:	AMD K19 (Zen3) architecture
CPU clock:	2.45 GHz
CPU family:	25
CPU model:	1
CPU short:	zen3
CPU stepping:	1
CPU features:	FP MMX SSE SSE2 HTT MMX RDTSCP MONITOR SSSE FMA SSE4.1 SSE4.2 AES AVX RDRAND AVX2 RDSEED SSE3 
CPU arch:	x86_64
--------------------------------------------------------------------------------
NVMON GPU 0 compute capability:	8.0
NVMON GPU 0 short:		nvidia_gpu_cc_ge_7
NVMON GPU 1 compute capability:	8.0
NVMON GPU 1 short:		nvidia_gpu_cc_ge_7
NVMON GPU 2 compute capability:	8.0
NVMON GPU 2 short:		nvidia_gpu_cc_ge_7
NVMON GPU 3 compute capability:	8.0
NVMON GPU 3 short:		nvidia_gpu_cc_ge_7
--------------------------------------------------------------------------------
DEBUG - [nvmon_init:182] Device 0 runs with CUPTI Profiling API backend
DEBUG - [nvmon_perfworks_createDevice:787] link_perfworks_libraries in createDevice
DEBUG - [link_perfworks_libraries:375] LD_LIBRARY_PATH=/upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/n2/easybuild/software/likwid-gpu/5.2.1-GCC-11.2.0/lib:/upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/n2/easybuild/software/hwloc/2.7.1-GCC-11.2.0/lib:/opt/software/pc2/EB-SW/software/libpciaccess/0.16-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/libxml2/2.9.10-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/XZ/5.2.5-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/JuliaHPC/1.7.2-fosscuda-2022a-linux-x86_64/lib:/opt/software/pc2/EB-SW/software/ScaLAPACK/2.1.0-gompic-2022a/lib:/opt/software/pc2/EB-SW/software/FFTW/3.3.10-gompic-2022a/lib:/opt/software/pc2/EB-SW/software/OpenBLAS/0.3.18-GCC-11.2.0/lib:/opt/software/pc2/EB-SW/software/OpenMPI/4.1.1-gcccuda-2022a/lib:/opt/software/pc2/EB-SW/software/PMIx/4.1.0-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/libfabric/1.13.2-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/UCX/1.11.2-gcccuda-2022a/lib:/opt/software/pc2/EB-SW/software/GDRCopy/2.1-gcccuda-2022a/lib:/opt/software/pc2/EB-SW/software/Check/0.15.2-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/libevent/2.1.12-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/OpenSSL/1.1/lib:/opt/software/pc2/EB-SW/software/numactl/2.0.14-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/CUDA/11.6.0/nvvm/lib64:/opt/software/pc2/EB-SW/software/CUDA/11.6.0/extras/CUPTI/lib64:/opt/software/pc2/EB-SW/software/CUDA/11.6.0/lib:/opt/software/pc2/EB-SW/software/binutils/2.37-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/zlib/1.2.11-GCCcore-11.2.0/lib:/opt/software/pc2/EB-SW/software/GCCcore/11.2.0/lib64:/opt/software/slurm/21.08.6/lib:/upb/departments/pc2/users/b/bauerc/.local/lib
DEBUG - [link_perfworks_libraries:376] CUDA_HOME=/opt/software/pc2/EB-SW/software/CUDA/11.6.0
DEBUG - [link_perfworks_libraries:491] Run cuInit
DEBUG - [link_perfworks_libraries:493] Run cuDeviceGetCount
DEBUG - [link_perfworks_libraries:498] Run cuDeviceGet
DEBUG - [link_perfworks_libraries:500] Run cuDeviceGetAttribute for major CC
DEBUG - [link_perfworks_libraries:502] Run cuDeviceGetAttribute for minor CC
DEBUG - [nvmon_perfworks_createDevice:806] Found 4 GPUs
DEBUG - [nvmon_perfworks_createDevice:814] Current GPU 0
DEBUG - [nvmon_perfworks_createDevice:836] Current GPU chip GA100
DEBUG - [nvmon_perfworks_createDevice:851] Create metric context for chip 'GA100'
DEBUG - [nvmon_perfworks_createDevice:853] Create metric context done
DEBUG - [nvmon_perfworks_createDevice:864] Create metric context getMetricNames
DEBUG - [nvmon_perfworks_createDevice:932] Destroy metric context getMetricNames
DEBUG - [nvmon_perfworks_createDevice:934] Destroy metric context
Executing: julia --project=. perfctr_gpu.jl
DEBUG - [nvmon_addEventSet:378] Allocating new group structure for group.
DEBUG - [nvmon_addEventSet:380] NVMON: Currently 1 groups of 2 active
DEBUG - [nvmon_addEventSet:424] Performance group for PerfWorks backend
DEBUG - [perfgroup_readGroup:871] Reading group FLOPS_SP from /upb/departments/pc2/groups/pc2-mitarbeiter/bauerc/n2/easybuild/software/likwid-gpu/5.2.1-GCC-11.2.0/share/likwid/perfgroups/nvidia_gpu_cc_ge_7/FLOPS_SP.txt
DEBUG - [nvmon_addEventSet:448] EventStr SMSP_SASS_THREAD_INST_EXECUTED_OP_FADD_PRED_ON_SUM:GPU0,SMSP_SASS_THREAD_INST_EXECUTED_OP_FMUL_PRED_ON_SUM:GPU1,SMSP_SASS_THREAD_INST_EXECUTED_OP_FFMA_PRED_ON_SUM:GPU2
DEBUG - [nvmon_addEventSet:463] Calling addevents
DEBUG - [nvmon_perfworks_addEventSet:1462] Add events to GPU device 0 with context 5427360
DEBUG - [perfworks_check_nv_context:552] Current context 5427360 DevContext 0
DEBUG - [perfworks_check_nv_context:568] Reuse context 5427360 for device 0
DEBUG - [nvmon_perfworks_addEventSet:1486] SMSP_SASS_THREAD_INST_EXECUTED_OP_FADD_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1493] Adding real event smsp__sass_thread_inst_executed_op_fadd_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1486] SMSP_SASS_THREAD_INST_EXECUTED_OP_FMUL_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1493] Adding real event smsp__sass_thread_inst_executed_op_fmul_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1486] SMSP_SASS_THREAD_INST_EXECUTED_OP_FFMA_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1493] Adding real event smsp__sass_thread_inst_executed_op_ffma_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1515] Increase size of eventSet space on device 0
DEBUG - [nvmon_perfworks_addEventSet:1524] Filling eventset 0 on device 0
DEBUG - [nvmon_perfworks_createConfigImage:1259] Create config image for chip GA100
DEBUG - [nvmon_perfworks_getMetricRequests114:1010] Create scratch buffer for GA100 and 0x4223f80
DEBUG - [nvmon_perfworks_getMetricRequests114:1021] Init Metric evaluator
DEBUG - [nvmon_perfworks_getMetricRequests114:1099] Destroy Metric evaluator
DEBUG - [nvmon_perfworks_createConfigImage:1261] Create config image for chip GA100 with 3 metric requests
DEBUG - [nvmon_perfworks_createConfigImage:1324] Allocated 296 byte for configImage
DEBUG - [nvmon_perfworks_createConfigImage:1333] nvmon_perfworks_createConfigImage_out enter 0
DEBUG - [nvmon_perfworks_createConfigImage:1334] NVPW_RawMetricsConfig_Destroy
DEBUG - [nvmon_perfworks_createConfigImage:1336] NVPW_MetricsContext_Destroy
DEBUG - [nvmon_perfworks_createConfigImage:1353] nvmon_perfworks_createConfigImage returns 296
DEBUG - [nvmon_perfworks_getMetricRequests114:1010] Create scratch buffer for GA100 and (nil)
DEBUG - [nvmon_perfworks_getMetricRequests114:1021] Init Metric evaluator
DEBUG - [nvmon_perfworks_getMetricRequests114:1099] Destroy Metric evaluator
DEBUG - [nvmon_perfworks_createCounterDataPrefixImage:1408] Allocated 172 byte for configPrefixImage
DEBUG - [nvmon_perfworks_createCounterDataPrefixImage:1417] nvmon_perfworks_createCounterDataPrefixImage_out enter 0
DEBUG - [nvmon_perfworks_createCounterDataPrefixImage:1439] nvmon_perfworks_createCounterDataPrefixImage returns 172
DEBUG - [nvmon_perfworks_addEventSet:1548] Filling eventset 0 on device 0
DEBUG - [nvmon_perfworks_addEventSet:1573] Adding eventset 0
--------------------------------------------------------------------------------
CUDA cannot be found and initialized (cuInit failed).
ERROR - [./src/topology_gpu.c:topology_gpu_init:232] Cannot get number of devices from CUDA library
CUDA cannot be found and initialized (cuInit failed).
ERROR - [./src/topology_gpu.c:topology_gpu_init:232] Cannot get number of devices from CUDA library
Error init GPU Marker API.
--------------------------------------------------------------------------------
GPU Marker API result file does not exist. This may happen if the application has not called LIKWID_GPUMARKER_CLOSE.

Command:
likwid-perfctr -V 3 -G 0 -W FLOPS_SP -m julia --project=. perfctr_gpu.jl

Julia File:

# perfctr_gpu.jl
using LIKWID
using LinearAlgebra
using CUDA

@assert CUDA.functional()

const N = 10_000
const a = 3.141  # Float64 literal; 3.141f0 would keep the arithmetic in Float32
# Note: CUDA.rand and CUDA.zeros default to Float32
const x = CUDA.rand(N)
const y = CUDA.rand(N)
const z = CUDA.zeros(N)

GPUMarker.init()

GPUMarker.startregion("saxpy")
for _ in 1:100
    z .= a .* x .* y
end
GPUMarker.stopregion("saxpy")

GPUMarker.close()

@carstenbauer carstenbauer added the bug Something isn't working label Jun 15, 2022
@carstenbauer
Member Author

carstenbauer commented Jun 27, 2022

I get the same segfault (see first post) on an entirely different machine (our DGX-A100). To me, that suggests this is not system/installation related but an actual LIKWID bug. @TomTheBear what do you think?

@carstenbauer
Member Author

I also don't understand the "CUDA cannot be found and initialized (cuInit failed)." / "ERROR - [./src/topology_gpu.c:topology_gpu_init:232] Cannot get number of devices from CUDA library" error in the likwid-perfctr example above, because this works just fine:

julia> using LIKWID

julia> LIKWID.get_gpu_topology()
LIKWID.GpuTopology
├ numDevices: 8
└ devices: ... (8 elements)

@carstenbauer
Member Author

carstenbauer commented Jun 28, 2022

@TomTheBear With your PR the segfault is gone, but I believe things still aren't working as they should.

1. nvmon_getLastMetric and nvmon_getMetric aren't available (anymore?):
➜  bauerc@ln-0002 .local  nm -D lib/liblikwid.so | grep nvmon | grep Metric
00000000001e0560 T nvmon_getMetricName
00000000001e1b00 T nvmon_getMetricOfRegionGpu
00000000001e1810 T nvmon_getMetricsOfRegion
00000000001e0710 T nvmon_getNumberOfMetrics

2. Only considering raw events, I always get only zeros. Never mind: it shows up under "FLOPS_DP" because a is a Float64, see below. My bad.

@carstenbauer
Member Author

carstenbauer commented Jun 28, 2022

Oh, regarding point 2: it shows up under "FLOPS_DP", but why?! Update: because a is a Float64. If we make it a Float32, it works:

julia> a = 3.141f0;

julia> events = @nvmon "FLOPS_SP" saxpy!(z, a, x, y);

Group: FLOPS_SP
┌────────────────────────────────────────────────────┬─────────┐
│                                              Event │   GPU 1 │
├────────────────────────────────────────────────────┼─────────┤
│ SMSP_SASS_THREAD_INST_EXECUTED_OP_FADD_PRED_ON_SUM │     0.0 │
│ SMSP_SASS_THREAD_INST_EXECUTED_OP_FMUL_PRED_ON_SUM │     0.0 │
│ SMSP_SASS_THREAD_INST_EXECUTED_OP_FFMA_PRED_ON_SUM │ 10000.0 │
└────────────────────────────────────────────────────┴─────────┘
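The FLOPS_SP/FLOPS_DP mix-up comes from Julia's type promotion: broadcasting a Float64 scalar against Float32 arrays promotes the element-wise arithmetic to Float64, which is why the counts showed up under "FLOPS_DP" in this thread. A minimal CPU-only sketch of the rule, assuming plain `rand(Float32, N)` as a stand-in for `CUDA.rand(N)` so that no GPU or LIKWID is needed:

```julia
# CPU-only illustration of Julia's promotion rules; rand(Float32, N) stands in
# for CUDA.rand(N), which likewise returns Float32 values.
N = 8
x = rand(Float32, N)
y = rand(Float32, N)

a64 = 3.141     # Float64 literal, as in the original perfctr_gpu.jl script
a32 = 3.141f0   # Float32 literal, as in the working @nvmon example

# With a Float64 scalar, the element-wise products are computed in Float64
# (double precision, i.e. what ended up under FLOPS_DP on the GPU).
println(eltype(a64 .* x .* y))   # Float64

# With a Float32 scalar, everything stays in single precision (FLOPS_SP).
println(eltype(a32 .* x .* y))   # Float32

# A Float32 destination does not change this: each product is computed in
# Float64 first and only converted to Float32 when it is stored into z.
z = zeros(Float32, N)
z .= a64 .* x .* y
println(eltype(z))               # Float32
```

So writing the scalar as `3.141f0` (or `Float32(3.141)`) is enough to keep the whole kernel in single precision.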

@carstenbauer
Member Author

carstenbauer commented Jun 29, 2022

With the latest fixes on the likwid PR, this works again. Hope we'll get a new likwid release with this soon 😀
