[WIP] Big overhaul #31

Merged
merged 35 commits into from
Jun 29, 2022

Conversation

carstenbauer
Member

@carstenbauer carstenbauer commented May 14, 2022

This is a WIP PR to improve the user-facing interface (and documentation) of, primarily, LIKWID.PerfMon.

Code

  • one-based indexing
  • provide simplified interface
    • perfmon
      • multiple perf groups
    • @perfmon
      • keyword args for autopin etc.?
      • multiple perf groups
  • CI
    • check / overhaul / fix examples
      • CPU
      • GPU (and CPU+GPU)
    • setup CI on N2

Docs

  • clarify examples wrt indices (not cpu ids, one-based)
  • multiple group example
  • multithreading example

TODO before merge

  • tests for @perfmon_marker / perfmon_marker
  • docs
    • howto perfmon
  • @nvmon / nvmon
    • tests
  • @nvmon_marker / nvmon_marker
  • Update example in README.md

@carstenbauer
Member Author

carstenbauer commented May 21, 2022

Preview of the new simplified API

Simple single-threaded example:

using LIKWID
using LinearAlgebra

const N = 10_000
const a = 3.141
const x = rand(N)
const y = rand(N)
const z = zeros(N)

metrics, events = @perfmon "FLOPS_DP" begin
    for i in eachindex(x, y)
        z[i] = a * x[i] * y[i]
    end
end

@show events["RETIRED_SSE_AVX_FLOPS_ALL"];
@show metrics["DP [MFLOP/s]"];

Output

events["RETIRED_SSE_AVX_FLOPS_ALL"] = 20000.0                                                                       
metrics["DP [MFLOP/s]"] = 1363.3126039898723
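As a quick sanity check on the event count above: the loop body performs two double-precision multiplications per element, so one would expect 2N floating-point operations. This assumes each scalar multiply is counted as one FLOP by this event, which is an assumption about the counter rather than something stated in this thread. A minimal sketch in plain Julia (no LIKWID needed):

```julia
# Expected FLOP count for the kernel z[i] = a * x[i] * y[i].
# Assumption: every scalar multiplication counts as exactly one FLOP.
const N = 10_000
flops_per_iteration = 2                  # a * x[i], then (...) * y[i]
expected_flops = flops_per_iteration * N
@show expected_flops                     # prints: expected_flops = 20000
```

This matches the 20000.0 reported for RETIRED_SSE_AVX_FLOPS_ALL.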

Simple multi-threaded example:

using LIKWID
using LinearAlgebra

const N = 10_000
const a = 3.141
const x = rand(N)
const y = rand(N)
const z = zeros(N)

metrics, events = @perfmon "FLOPS_DP" begin
    Threads.@threads for i in eachindex(x, y)
        z[i] = a * x[i] * y[i]
    end
end

@show getindex.(events, "RETIRED_SSE_AVX_FLOPS_ALL"); # vector of events corresponding to Julia threads
@show getindex.(metrics, "DP [MFLOP/s]");

Output:

getindex.(events, "RETIRED_SSE_AVX_FLOPS_ALL") = [5954.0, 5000.0, 5000.0, 5000.0]
getindex.(metrics, "DP [MFLOP/s]") = [0.3990529460355565, 0.3351133238457814, 0.3351133238457814, 0.3351133238457814]

Note that Julia threads are automatically pinned to the CPU threads they are currently running on, so that the results are reliable. The function equivalent, perfmon(f, groupname; cpuids, autopin), provides more options.
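A minimal sketch of how the function form might be called with Julia's do-block syntax. It assumes the function is named perfmon (as in the checklist at the top of this PR), that it returns the same (metrics, events) pair as the macro, and that autopin takes a Bool. Only the keyword names come from the signature quoted above; the rest is an assumption and may not match the final API.

```julia
using LIKWID  # requires a working LIKWID installation

const N = 10_000
const a = 3.141
const x = rand(N)
const y = rand(N)
const z = zeros(N)

# Assumption: perfmon(f, groupname; cpuids, autopin) returns (metrics, events),
# like @perfmon does. `autopin=true` is assumed to request the automatic
# pinning of Julia threads described above.
metrics, events = perfmon("FLOPS_DP"; autopin=true) do
    Threads.@threads for i in eachindex(x, y)
        z[i] = a * x[i] * y[i]
    end
end
```

The do-block is passed as the first argument f, which is why the function form lists f before the group name.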

(cc @antoine-levitt, @TomTheBear, @vchuravy)

@antoine-levitt

That's very nice, thanks for doing this, will test once merged!

@TomTheBear
Collaborator

There is a driver mismatch on the one CI node:

xxxxx@medusa ~ $ nvidia-smi 
Failed to initialize NVML: Driver/library version mismatch

This needs to be fixed by our admins.

@carstenbauer
Member Author

Thanks for the info. I'll also set up CI on Noctua 2 soon so that we test on two different systems, one with the daemon and one with the perf_events backend.

@carstenbauer carstenbauer changed the title from "[WIP] PerfMon improvements + switching to one-based indexing for indices" to "[WIP] PerfMon improvements + switching to one-based indexing for indices + CI on N2" May 23, 2022
perfmon / @perfmon (cont'd)
separate gitlab ci yml files for FAU and PC2

try avoid multiple CI runs on N2

ci n2

fix typo

drop N2 CI on tags

n2 ci test

n2 ci again

n2 ci again

ci

ci again..

cici

cicici

ci (almost done)

ci (almost done2)
@carstenbauer
Member Author

carstenbauer commented May 25, 2022

@carstenbauer carstenbauer changed the title from "[WIP] PerfMon improvements + switching to one-based indexing for indices + CI on N2" to "[WIP] Big overhaul" Jun 15, 2022
@vchuravy
Member

This is looking great!

@carstenbauer
Member Author

Will try to finish things next week before I record my JuliaCon talk 😊

@carstenbauer
Member Author

Let's merge this (I'll fix CI / doc build on main because it's easier). Remaining things like @nvmon_marker will have to wait.

@carstenbauer carstenbauer merged commit 20a454c into main Jun 29, 2022
@carstenbauer carstenbauer deleted the cb/perfmonrev branch June 29, 2022 14:42
@carstenbauer
Member Author

@antoine-levitt Feel free to try out the main branch now. Would be great to have someone test it :)

@antoine-levitt

Hm, I wanted to try this for a course I was giving but it's over now and I don't want to go back to it :-p but I'll definitely have a use for it at some point, it looks great!

4 participants