-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't use '-A' in network.py on non-numa systems (BugFix) #711
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #711 +/- ##
==========================================
- Coverage 36.58% 35.46% -1.12%
==========================================
Files 310 301 -9
Lines 34639 34254 -385
Branches 5965 5934 -31
==========================================
- Hits 12671 12149 -522
- Misses 21399 21557 +158
+ Partials 569 548 -21
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me... thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to ask a few changes here:
- codecov is complaining because we need a new test here,
sumarize_cpu
is untested - it is kind of weird to return
-1
instead to throwing an exception. Would you consider changing this to raise an exception both if the file is not found or if it is found and contains a-1
? That way anyone calling the function will not have to check if-1
is a valid numa number or what is going on, if something is returned, it is valid, else an exception is thrown - Minor: Change the title of the PR after rebasing on main, this is a bugfix so add a (bugfix) postfix to the title!
Ty
I added tests and fixed the PR title. @hook please take another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, if this causes a problem we can change in the future I guess
3cda9ca
to
7ec6662
Compare
Description
This PR fixes bug #570, which can cause the CPU load to spike if the CPU affinity is set incorrectly on systems that don't support NUMA features.
The fix involves identifying systems that lack NUMA support and not passing the -A {nodes} option to iperf3 on such systems.
Resolved issues
Resolves bug #570
Jira card: https://warthogs.atlassian.net/browse/SERVCERT-1183
Documentation
No documentation changes required.
Tests
Tested on lloobee (my own ARM64-based development system), which exhibited the bug. The new version drops the CPU load from 90%+ to about 35-40%, and increases the throughput by about 5% (from 88% to 93%, give or take a percent or two).
I've also tested on two servers in the Server Certification pool, torchtusk and whomp, to be sure the script doesn't create problems on systems that DO support NUMA. No unexpected problems arose. The change should result in exactly the same iperf3 command being used on such systems as the previous version of the script. Such systems were unaffected by the bug to begin with.