gpu docs update #1156
base: main
First sentence isn't exactly true. Second one belongs on the Efficiency page.
Thanks @mmurph3 for starting this! This is an effort to address some confusion raised by users in SUP-6416. I agree with Chip that the first bullet point isn't necessarily true; it's probably safe to remove it. The second bullet point is good. How about we modify it slightly to the following?

## Troubleshooting

### Kubecost dashboards not showing GPU Efficiency or GPU Savings

In order for Kubecost to begin displaying GPU features, it must first detect that **at least one** of your clusters has a nonzero amount of GPU usage. Please validate that DCGM-Exporter is running in the clusters that have GPUs and that Kubecost is scraping nonzero GPU metrics from the exporter.

It may be good to add this "Troubleshooting" section to this doc, as well as to the Efficiency doc we have: https://docs.kubecost.com/using-kubecost/navigating-the-kubecost-ui/efficiency-dashboard
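As a rough illustration of the second check in the proposed text (that nonzero GPU metrics are actually being collected), here is a minimal Python sketch. It is not an official Kubecost tool. It assumes the Prometheus instance Kubecost reads from is reachable at `http://localhost:9090` (for example, through a port-forward) and that DCGM-Exporter exposes its default `DCGM_FI_DEV_GPU_UTIL` utilization gauge. The URL, the choice of metric, and the helper names `query_prometheus` and `check_gpu_metrics` are all assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple

# Assumption (hypothetical): the Prometheus that Kubecost reads from is
# reachable here, e.g. via a port-forward. Adjust for your environment.
PROMETHEUS_URL = "http://localhost:9090"

# Assumption: DCGM-Exporter's default metric set includes this GPU
# utilization gauge. Any DCGM metric you scrape could be used instead.
GPU_METRIC = "DCGM_FI_DEV_GPU_UTIL"


def query_prometheus(base_url: str, promql: str) -> dict:
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def check_gpu_metrics(response: dict) -> Tuple[int, List[Tuple[Dict[str, str], float]]]:
    """Return (number of series found, [(labels, value)] for nonzero series)."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error', 'unknown error')}")
    series_list = response["data"]["result"]
    nonzero = []
    for series in series_list:
        # Instant-vector samples are [<unix timestamp>, "<value as a string>"].
        value = float(series["value"][1])
        if value > 0:
            nonzero.append((series["metric"], value))
    return len(series_list), nonzero
```

Calling `check_gpu_metrics(query_prometheus(PROMETHEUS_URL, GPU_METRIC))` and getting zero series suggests the exporter's metrics aren't reaching Prometheus at all. Series that are present but all read zero suggest the exporter is scraped but the GPUs showed no usage at query time, which is the situation the proposed troubleshooting text describes.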
If we're going to create a public Troubleshooting section specific to GPU, we may want to take this opportunity to build it out more completely, à la what I have put together here (internal resource).
Made some changes before seeing the most recent comments. If we don't agree with what I wrote, I'm OK with changing or moving it. I agree that a built-out troubleshooting doc would be good. @chipzoller, for some reason I'm getting a 403 on that internal link you gave: https://app.gitbook.com/o/MQuX6uFwV0j7vIHtR15E/s/xLM07kCOoiNtRubOhU77/customer-nvidia-gpu-troubleshooting#no-gpu-column-in-efficiency-page I can see about getting access through Cliff.
@mmurph3 These are good changes, but are you sure they're enough? I'm concerned these small additions may be missed by some users; it's hard to catch a single sentence in a long document. What do you think about adding a "Troubleshooting" section to both of these docs and filling in more detail about what to do when users aren't seeing GPU Efficiency features?
If you guys don't mind, I'd like to take this over and propose some changes here. It will just have to be next week.
@chipzoller Good with me!
Related Issue
Proposed Changes