Add nvidia-bug-report to eks-logs-collector #1864

Merged: 4 commits into awslabs:main on Jun 26, 2024

suket22
Member

@suket22 suket22 commented Jun 24, 2024

Issue #, if available:
N/A

Description of changes:
This PR makes the eks-logs-collector run nvidia-bug-report.sh. This executable ships with the Nvidia drivers and is useful for debugging. The script is also mentioned in https://docs.nvidia.com/deploy/gpu-debug-guidelines/index.html
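
For readers who want a picture of what such a collection step could look like, here is a minimal sketch. It is not the code merged in this PR: the function name `collect_nvidia_bug_report` and the output directory argument are hypothetical. It assumes that nvidia-bug-report.sh, when run without arguments, writes nvidia-bug-report.log.gz into the current working directory, and that a missing executable means no Nvidia drivers are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the name and the directory layout are illustrative
# and are not taken from the merged change.
collect_nvidia_bug_report() {
  local out_dir="$1"
  printf 'Trying to Collect Nvidia Bug report... '

  # nvidia-bug-report.sh ships with the Nvidia drivers, so its absence is
  # treated as "no drivers installed" rather than as an error.
  if ! command -v nvidia-bug-report.sh > /dev/null 2>&1; then
    echo "No Nvidia drivers found, nothing to do."
    return 0
  fi

  mkdir -p "${out_dir}"
  # Assumption: the tool writes its report into the current working
  # directory, so run it from inside out_dir in a subshell.
  if ! (cd "${out_dir}" && nvidia-bug-report.sh > /dev/null 2>&1); then
    echo "nvidia-bug-report.sh failed, continuing."
    return 0
  fi
  echo
}
```

Returning 0 on every path is a deliberate choice in this sketch: a log collector should keep gathering the remaining logs on instances without a GPU or when the Nvidia tool fails.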

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Testing Done

I tested this script on a g4dn instance which has an Nvidia GPU, and verified that the log.gz file created by nvidia-bug-report.sh is included in the log collector archive.

Trying to Collect CPU Throttled Process Information...
Trying to Collect IO Throttled Process Information...
Trying to Collect Nvidia Bug report...
Trying to archive gathered information...

	Done... your bundled logs are located in /var/log/eks_i-...tar.gz
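
The check described above (confirming that the log.gz file ended up in the archive) can be repeated with a short helper. This is a sketch only: the function name `archive_has_nvidia_report` is hypothetical, the archive path is passed in by the caller, and the report is assumed to keep its default file name nvidia-bug-report.log.gz inside the archive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given .tar.gz archive lists a file
# named nvidia-bug-report.log.gz anywhere in its contents.
archive_has_nvidia_report() {
  local archive="$1"
  # tar -t lists member names without extracting; -z handles gzip.
  tar -tzf "${archive}" | grep -q 'nvidia-bug-report\.log\.gz$'
}
```

Call it with the path printed by the collector, for example `archive_has_nvidia_report "$path" && echo "report included"`, where `$path` is whatever archive location the run reported.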

I also ran the script on a t3.large, which has no Nvidia GPU, to make sure it doesn't break:

Trying to Collect CPU Throttled Process Information...
Trying to Collect IO Throttled Process Information...
Trying to Collect Nvidia Bug report... No Nvidia drivers found, nothing to do.

Trying to archive gathered information...

	Done... your bundled logs are located in /var/log/eks_i-....tar.gz

See this guide for recommended testing for PRs. Some tests may not apply. Completing tests and providing additional validation steps are not required, but it is recommended and may reduce review time and time to merge.

@cartermckinnon cartermckinnon merged commit bfca904 into awslabs:main Jun 26, 2024
10 checks passed
mebays pushed a commit to mebays/amazon-eks-ami that referenced this pull request Jul 26, 2024
* Add nvidia-bug-report to eks-logs-collector

* Fixing linter error

* Fixing linter again oof

* Addressing comments