Don't use '-A' in network.py on non-numa systems (BugFix)#711
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #711 +/- ##
==========================================
- Coverage 36.58% 35.46% -1.12%
==========================================
Files 310 301 -9
Lines 34639 34254 -385
Branches 5965 5934 -31
==========================================
- Hits 12671 12149 -522
- Misses 21399 21557 +158
+ Partials 569 548 -21
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
bladernr
left a comment
There was a problem hiding this comment.
Looks good to me... thanks
Hook25
left a comment
There was a problem hiding this comment.
I have to ask a few changes here:
- codecov is complaining because we need a new test here,
sumarize_cpuis untested - it is kind of weird to return
-1instead to throwing an exception. Would you consider changing this to raise an exception both if the file is not found or if it is found and contains a-1? That way anyone calling the function will not have to check if-1is a valid numa number or what is going on, if something is returned, it is valid, else an exception is thrown - Minor: Change the title of the PR after rebasing on main, this is a bugfix so add a (bugfix) postfix to the title!
Ty
|
I added tests and fixed the PR title. @hook please take another look |
Hook25
left a comment
There was a problem hiding this comment.
+1, if this causes a problem we can change in the future I guess
3cda9ca to
7ec6662
Compare
Description
This PR fixes bug #570, which can cause the CPU load to spike if the CPU affinity is set incorrectly on systems that don't support NUMA features.
The fix involves identifying systems that lack NUMA support and not passing the -A {nodes} option to iperf3 on such systems.
Resolved issues
Resolves bug #570
Jira card: https://warthogs.atlassian.net/browse/SERVCERT-1183
Documentation
No documentation changes required.
Tests
Tested on lloobee (my own ARM64-based development system), which exhibited the bug. The new version drops the CPU load from 90%+ to about 35-40%, and increases the throughput by about 5% (from 88% to 93%, give or take a percent or two).
I've also tested on two servers in the Server Certification pool, torchtusk and whomp, to be sure the script doesn't create problems on systems that DO support NUMA. No unexpected problems arose. The change should result in exactly the same iperf3 command being used on such systems as the previous version of the script. Such systems were unaffected by the bug to begin with.