
GPU MIG Right sizing recommendations by kruize #1312

Closed
bharathappali opened this issue Oct 3, 2024 · 8 comments · Fixed by #1314, #1318, #1320 or #1324
Labels
enhancement New feature or request

Comments

@bharathappali
Member

Describe the feature

Kruize reads CPU and memory usage data from the provided data source and generates CPU and memory right-sizing recommendations. Similarly, it would be useful to have GPU MIG partition sizing recommendations for containers that use GPUs.

Examples or references

Most ML workloads need GPU compute, and advanced NVIDIA GPUs support MIG (Multi-Instance GPU), which lets a single physical GPU be partitioned into multiple isolated GPU instances that can be configured and shared across multiple containers. Ampere-series (A30 and above) and Hopper-series GPUs provide this feature.

Suggest a solution

  • Record GPU-related metrics
  • Process them together with the CPU and memory metrics
  • Provide MIG partition recommendations
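As a rough illustration of the last two steps, here is a minimal sketch that picks the smallest MIG profile covering a container's observed peak GPU memory and compute usage plus some headroom. It assumes the MIG profile table of an NVIDIA A100 40GB (other GPUs such as the A30 or H100 expose different profiles), a hypothetical 10% headroom, and that peak GPU utilization is measured as a fraction of the whole GPU. The names `MigProfile` and `recommend_mig_profile` are hypothetical and not part of Kruize.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MigProfile:
    name: str            # NVIDIA MIG profile name, e.g. "1g.5gb"
    compute_slices: int  # number of compute slices (the "Ng" part of the name)
    memory_gb: int       # memory available to the instance, in GB


# Assumption: profile table for an A100 40GB; other GPU models differ.
A100_40GB_PROFILES = [
    MigProfile("1g.5gb", 1, 5),
    MigProfile("2g.10gb", 2, 10),
    MigProfile("3g.20gb", 3, 20),
    MigProfile("4g.20gb", 4, 20),
    MigProfile("7g.40gb", 7, 40),
]


def recommend_mig_profile(peak_mem_gb: float,
                          peak_util_fraction: float,
                          profiles: List[MigProfile],
                          headroom: float = 0.1) -> Optional[MigProfile]:
    """Return the smallest profile that fits peak memory and compute usage.

    peak_util_fraction is the observed peak utilization of the *whole* GPU
    (0.0 to 1.0); it is mapped onto compute slices proportionally, which is
    only an approximation. Returns None if no single profile fits.
    """
    total_slices = max(p.compute_slices for p in profiles)
    needed_mem = peak_mem_gb * (1 + headroom)
    needed_slices = peak_util_fraction * total_slices * (1 + headroom)

    candidates = [
        p for p in profiles
        if p.memory_gb >= needed_mem and p.compute_slices >= needed_slices
    ]
    if not candidates:
        return None
    # Prefer the fewest compute slices, then the least memory.
    return min(candidates, key=lambda p: (p.compute_slices, p.memory_gb))


if __name__ == "__main__":
    # 12 GB peak memory and 20% peak GPU utilization
    print(recommend_mig_profile(12.0, 0.2, A100_40GB_PROFILES).name)  # prints 3g.20gb
```

Ordering candidates by compute slices first reflects that compute slices are the scarcer resource when packing several MIG instances onto one GPU. A real recommendation would also need to respect which profile combinations the GPU can place side by side.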

Additional Context

None

@bharathappali added the enhancement New feature or request label Oct 3, 2024
@bharathappali self-assigned this Oct 3, 2024
@bharathappali added this to the Kruize 0.0.26 Release milestone Oct 3, 2024
@bharathappali
Member Author

bharathappali commented Oct 3, 2024

This new feature can be implemented in the following steps:

@dinogun
Contributor

dinogun commented Oct 15, 2024

@bharathappali Can this be closed now?

@bharathappali
Member Author

Yes @dinogun, all PRs are merged.

@dinogun
Contributor

dinogun commented Oct 15, 2024

Please update the test PR details and close this

@dinogun
Contributor

dinogun commented Oct 15, 2024

@bharathappali I asked for the test PR details to be added in the description

@bharathappali
Member Author

Sorry for the oversight, I will add it now @dinogun.

@bharathappali reopened this Oct 15, 2024
@bharathappali
Member Author

Reopened to update the description with the test PR.

@bharathappali
Member Author

Closing this issue as all the PRs are merged.
