feat(kepler): enable pprof #1383

Merged (1 commit) on Apr 29, 2024


sthaha (Collaborator) commented Apr 25, 2024

Verified by running: go tool pprof http://localhost:8888/debug/pprof/heap

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
sthaha requested a review from rootfs April 25, 2024 01:46
sthaha mentioned this pull request Apr 25, 2024
sthaha enabled auto-merge April 25, 2024 01:54
marceloamaral (Collaborator) left a comment

jtaleric commented Apr 25, 2024

@sthaha Have you seen that we already have pprof support? https://github.com/sustainable-computing-io/kepler/blob/main/cmd/exporter/exporter.go#L137

Hey @marceloamaral - while we can get a pprof with this, it isn't ideal at all. I tried to enumerate this in a document, but I will add it here:

Kepler does not expose a pprof endpoint for collection; however, Kepler does have a mechanism to capture a pprof at the start of the daemon.

To enable it:

Disable the controller:

$ oc edit deployment.apps/kepler-operator-controller
Set replicas to 0.

Update the Kepler daemon set:
$ oc edit daemonset.apps/kepler -n openshift-power-monitoring
    spec:
      containers:
      - command:
        - /usr/bin/kepler
        - -address
        - 0.0.0.0:9103
        - -enable-cgroup-id=true
        - -enable-gpu=$(ENABLE_GPU)
        - -v=$(KEPLER_LOG_LEVEL)
        - -kernel-source-dir=/usr/share/kepler/kernel_sources
        - -redfish-cred-file-path=/etc/redfish/redfish.csv
        - -cpuprofile=/profile/test


You will also need to create an emptyDir volume:

      volumes:
      - emptyDir: {}
        name: profile


Then add the volumeMount

        volumeMounts:
        - mountPath: /profile
          name: profile


Once you apply this config, the Kepler daemons will be deployed; however, each one will only capture a 60-second profile from startup (you can make the capture longer, but it still happens only at the start of the daemon). This means you need to create the “load,” leave it running, and then restart the Kepler container.

Once the capture finishes, the pprof will be in /profile. Since we do not have tar in the Kepler pod (and kubectl cp depends on it), we need to copy the binary profile off a bit differently; see the example below:

$ kubectl exec -i kepler-fld8k -n openshift-power-monitoring -- cat /profile/test > profile

Once you have your collection, you can generate the raw file and build the flamegraph or whatever you like for visualization.

sthaha (Collaborator, Author) commented Apr 25, 2024

@marceloamaral same arguments as @jtaleric.
AFAIK, the standard Go way to expose pprof is to just import _ .. /pprof and let the package's init() function add HTTP endpoints to the default mux.
This allows a pprof to be captured while Kepler is in a particular state of interest, which may be hard to reproduce (and may not fall within the first 60 seconds after startup).
IMHO, we should remove the existing one (in a different PR) and use the standard way of getting profiling data.

https://pkg.go.dev/runtime/pprof#hdr-Profiling_a_Go_program says

There is also a standard HTTP interface to profiling data. Adding the following line will install handlers under the /debug/pprof/ URL to download live profiles:

import _ "net/http/pprof"

NOTE: I didn't remove the existing one, since it may have been added for a particular reason that I couldn't figure out from reading the git commits. That is a separate discussion, which can be handled in a different PR.

rootfs (Contributor) commented Apr 25, 2024

Thanks @sthaha, that makes sense.

@marceloamaral would you take a look? Remote profiling is helpful for online debugging.

rootfs disabled auto-merge April 26, 2024 12:53
rootfs merged commit f5910e2 into sustainable-computing-io:main Apr 29, 2024
20 checks passed
marceloamaral (Collaborator) commented

Let's try to clean the code and have only one approach.
/lgtm

sthaha added a commit to sthaha/kepler that referenced this pull request Apr 30, 2024
The commit - f5910e2 adds pprof HTTP endpoints to kepler thus
removing the need to support -cpu-profile or -memory-profile explicitly.

See also: sustainable-computing-io#1383

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
rootfs pushed a commit that referenced this pull request Apr 30, 2024
The commit - f5910e2 adds pprof HTTP endpoints to kepler thus
removing the need to support -cpu-profile or -memory-profile explicitly.

See also: #1383

Signed-off-by: Sunil Thaha <sthaha@redhat.com>
5 participants