Fix: Bug #443 -- better optimize network.py high-speed network #479

Merged
pieqq merged 1 commit into main from improve-iperf3-high-speed
May 30, 2023

Conversation

rodwsmith
Collaborator

@rodwsmith rodwsmith commented May 19, 2023

Fix: Bug #443 -- better optimize network.py high-speed network performance.

Description

This pull request fixes sub-optimal network performance in the network.py network speed tests on high-speed (>10 Gbps) network devices. It makes three changes:

  • When the script automatically runs more than two threads, it runs a quick (1-minute) optimization test with 0.5x, 1x, 1.5x, and 2x the computed number of threads to find the optimal number of threads. This is not done when the number of threads is specified on the command line.
  • The script limits the maximum per-thread throughput (via the iperf3 -b option) to 1 Mbps over the theoretical optimal amount, based on the interface speed and number of threads.
  • The script uses the -A parameter to iperf3 to set the CPU affinity. This requires computing the affinity, using code lifted from the start-iperf3 script in the maas-cert-server package.
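
As a rough illustration of the three changes listed above, the following Python sketch computes the candidate thread counts, the per-thread bitrate cap, and an iperf3 command line that uses -b and -A. The function names, the integer rounding, and the argument layout are assumptions made for illustration; they are not taken from network.py.

```python
import math


def candidate_thread_counts(computed_threads):
    """Return 0.5x, 1x, 1.5x, and 2x the computed thread count.

    Integer arithmetic (halves) is an assumption; each count is at least 1.
    """
    return [max(1, computed_threads * k // 2) for k in (1, 2, 3, 4)]


def per_thread_bitrate_mbps(link_speed_mbps, num_threads):
    """Cap each thread at its even share of the link plus 1 Mbps."""
    return math.ceil(link_speed_mbps / num_threads) + 1


def iperf3_command(target, port, runtime, bitrate_mbps, core):
    """Build a hypothetical iperf3 client command for one thread.

    -b limits the bitrate and -A sets the CPU affinity.
    """
    return [
        "iperf3", "-c", target, "-p", str(port), "-t", str(runtime),
        "-b", "{}M".format(bitrate_mbps), "-A", str(core),
    ]
```

With a computed count of 10 threads on a 100000 Mb/s link, this gives candidates of 5, 10, 15, and 20 threads and a cap of 5001 Mbps per thread when 20 threads are used.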

The patch has no effect on slow (10 Gbps and slower) network devices, but on faster ones, the thread optimization tests add about four minutes to the run time (four 1-minute runs) and add some output (see below).

Resolved issues

Fixes: #443

Documentation

Neither Checkbox nor the network.py script changes its calling conventions or configuration; however, the output is expanded slightly, as in this example:

sudo ./network.py test -i ens1 -t iperf --iperf3 --scan-timeout 36 --fail-threshold 80 --cpu-load-fail-threshold 90 --runtime 30 --num_runs 1 --target 10.1.16.15
INFO:root:Testing ens1 against 10.1.16.15
INFO:root:Have successfully pinged 10.1.16.15 on ens1
INFO:root:--------------- Optimizing Number of Threads ---------------
INFO:root:Testing optimization with 5 threads
INFO:root:Found throughput of 39526 with 5 threads
INFO:root:Testing optimization with 10 threads
INFO:root:Found throughput of 85962 with 10 threads
INFO:root:Testing optimization with 15 threads
INFO:root:Found throughput of 75820 with 15 threads
INFO:root:Testing optimization with 20 threads
INFO:root:Found throughput of 98198 with 20 threads
INFO:root:Setting number of threads to 20.
INFO:root:-------------------- Test Run Number 1 --------------------
INFO:root:Using 20 threads.
INFO:root:NUMA node of ens1 is 0....
INFO:root:Connecting to port 5201 on server....
INFO:root:Connecting to port 5202 on server....
INFO:root:Connecting to port 5203 on server....
INFO:root:Connecting to port 5204 on server....
INFO:root:Connecting to port 5205 on server....
INFO:root:Connecting to port 5206 on server....
INFO:root:Connecting to port 5207 on server....
INFO:root:Connecting to port 5208 on server....
INFO:root:Connecting to port 5209 on server....
INFO:root:Connecting to port 5210 on server....
INFO:root:Connecting to port 5211 on server....
INFO:root:Connecting to port 5212 on server....
INFO:root:Connecting to port 5213 on server....
INFO:root:Connecting to port 5214 on server....
INFO:root:Connecting to port 5215 on server....
INFO:root:Connecting to port 5216 on server....
INFO:root:Connecting to port 5217 on server....
INFO:root:Connecting to port 5218 on server....
INFO:root:Connecting to port 5219 on server....
INFO:root:Connecting to port 5220 on server....
INFO:root:Avg Transfer speed: 89510.53333333335 Mb/s
INFO:root:89.51% of theoretical max 100000 Mb/s
INFO:root:Average CPU utilization: 16.0%
INFO:root:

The "Optimizing Number of Threads" section is new, and summarizes the optimization testing. This section is omitted when testing slow (10 Gbps and slower) devices.
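
The selection step can be pictured as picking the thread count with the highest measured throughput. The sketch below uses the figures from the sample output above; the function names are hypothetical, and the real script may break ties or compute percentages differently.

```python
def pick_best_thread_count(results):
    """results maps a thread count to its measured throughput in Mb/s."""
    return max(results, key=results.get)


def percent_of_max(avg_speed_mbps, max_speed_mbps):
    """Express the average speed as a percentage of the theoretical maximum."""
    return round(avg_speed_mbps / max_speed_mbps * 100, 2)


# Throughput figures copied from the sample optimization output.
sample = {5: 39526, 10: 85962, 15: 75820, 20: 98198}
best = pick_best_thread_count(sample)
print("Setting number of threads to {}.".format(best))
```

Here 20 threads gives the highest throughput, so 20 is chosen, and percent_of_max(89510.53333333335, 100000) gives the 89.51% figure shown in the log.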

To take advantage of these changes, the iperf3 server may need to run more threads. This will require changes to the Server Certification documentation (the Self-Test Guide).

Tests

Testing was done manually on several servers in Needham. Performance improved by about 5-10% with these changes applied.

filename = "/sys/class/net/" + device + "/device/numa_node"
try:
    file = open(filename, "r")
    node_num = int(file.read())
Collaborator

@bladernr bladernr May 23, 2023

This would be a bit better using with:

try:
    with open(filename, "r") as file:
        node_num = int(file.read())
except FileNotFoundError:

so that the file is closed once it's been read.
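
A self-contained version of this suggestion might look like the sketch below. The function name, the sysfs_root parameter (added so the function can be exercised against a temporary directory), the ValueError handling, and the treatment of negative values are all assumptions for illustration, not the code in the PR. The fallback to node 0 mirrors the warning the script prints when no NUMA node is found.

```python
import logging
import os


def read_numa_node(device, sysfs_root="/sys/class/net"):
    """Return the NUMA node of a network device, or 0 if it is unknown."""
    filename = os.path.join(sysfs_root, device, "device", "numa_node")
    try:
        # The with block closes the file as soon as it has been read.
        with open(filename, "r") as file:
            node_num = int(file.read())
    except (FileNotFoundError, ValueError):
        logging.warning(
            "Could not find the NUMA node associated with %s!", device
        )
        node_num = 0
    # Assumption: a negative value means no NUMA affinity; treat it as node 0.
    if node_num < 0:
        node_num = 0
    return node_num
```

The with statement guarantees the file is closed even if int() raises an exception, which the bare open() call does not.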

Collaborator

@bladernr bladernr left a comment

Minor point about using with when opening filename so that the file is properly closed; otherwise LGTM. I think this is going to be a bit noisy (I haven't run it yet to see), so we will likely want to revisit this next cycle and clean up the output in general.

@rodwsmith
Collaborator Author

I've made Jeff's suggested changes and squashed the results.

Also, I've changed the description (shown near the top of this page, at least to me). I accidentally left the whole thing commented out on the original submission; I've fixed that mistake. Note that the description includes a sample output from running the revised script. The PR includes code to disable logging when doing the test runs, but adds summaries for each test run. This minimizes the added clutter, but there IS some extra output, which may help when diagnosing problems.

Finally, I've also created a PR for minor changes to the STG so that users know to launch more instances of iperf3 on the Target system:

https://github.com/canonical/certification-docs/pull/17

@bladernr bladernr self-requested a review May 24, 2023 18:08
Collaborator

@bladernr bladernr left a comment

Cool, Thanks!

@pieqq pieqq merged commit 11c047b into main May 30, 2023
@pieqq pieqq deleted the improve-iperf3-high-speed branch May 30, 2023 06:15
@baconYao
Contributor

baconYao commented Jun 7, 2023

Hi @pieqq and @rodwsmith

Test environment:

  • Device: Raspberry PI 4
  • OS: 22.04 Desktop image (I downloaded it from the Ubuntu Official Website)
  • Snap:
     u@u-desktop:~$ snap list
     Name                       Version           Rev    Tracking         Publisher            Notes
     checkbox                   2.6               2418   latest/stable    ce-certification-qa  classic
     checkbox22                 2.6               372    latest/stable    ce-certification-qa  -
    

I ran into a problem that produces the error below:

-----------------------[ Iperf3 stress testing for eth0 ]-----------------------
ID: com.canonical.certification::ethernet/iperf3_eth0
Category: Ethernet Device tests
--------------------------------------------------------------------------------
INFO:root:Testing eth0 against 10.102.88.220
INFO:root:Have successfully pinged 10.102.88.220 on eth0
INFO:root:-------------------- Test Run Number 1 --------------------
INFO:root:Using 1 thread.
WARNING:root:WARNING: Could not find the NUMA node associated with eth0!
WARNING:root:Setting the association to NUMA node 0, which may not be optimal!
INFO:root:NUMA node of eth0 is 0....
Traceback (most recent call last):
  File "/tmp/nest-pmj_1dlm.733188759a536d454588eb1b832197fe0ac6141c07b5ee7f5b36d7837164df28/network.py", line 1023, in <module>
    sys.exit(main())
  File "/tmp/nest-pmj_1dlm.733188759a536d454588eb1b832197fe0ac6141c07b5ee7f5b36d7837164df28/network.py", line 1019, in main
    return args.func(args)
  File "/tmp/nest-pmj_1dlm.733188759a536d454588eb1b832197fe0ac6141c07b5ee7f5b36d7837164df28/network.py", line 783, in interface_test
    error_number = run_test(args, test_target)
  File "/tmp/nest-pmj_1dlm.733188759a536d454588eb1b832197fe0ac6141c07b5ee7f5b36d7837164df28/network.py", line 614, in run_test
    error_number = iperf_benchmark.run()
  File "/tmp/nest-pmj_1dlm.733188759a536d454588eb1b832197fe0ac6141c07b5ee7f5b36d7837164df28/network.py", line 292, in run
    core = core_list[thread_num % len(core_list)]
ZeroDivisionError: integer division or modulo by zero
--------------------------------------------------------------------------------
Outcome: job failed

What happened?

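According to this PR, the script tries to find the NUMA node via the lscpu command; however, there is no NUMA node on the RPi 4. As a result, core_list is empty, so len(core_list) is 0 and the modulo operation in the traceback above raises ZeroDivisionError.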

u@u-desktop:/sys/class/net/eth0/device$ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               ARM
  Model name:            Cortex-A72
    Model:               3
    Thread(s) per core:  1
    Core(s) per cluster: 4
    Socket(s):           -
    Cluster(s):          1
    Stepping:            r0p3
    CPU max MHz:         1800.0000
    CPU min MHz:         600.0000
    BogoMIPS:            108.00
    Flags:               fp asimd evtstrm crc32 cpuid
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   192 KiB (4 instances)
  L2:                    1 MiB (1 instance)
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected
u@u-desktop:/sys/class/net/eth0/device$ ls -al
total 0
drwxr-xr-x  5 root root    0 Jan  1  1970 .
drwxr-xr-x 10 root root    0 Jan  1  1970 ..
lrwxrwxrwx  1 root root    0 Jan  1  1970 driver -> ../../../../bus/platform/drivers/bcmgenet
-rw-r--r--  1 root root 4096 Jun  7 09:45 driver_override
-r--r--r--  1 root root 4096 Jun  7 09:45 modalias
drwxr-xr-x  3 root root    0 Jan  1  1970 net
lrwxrwxrwx  1 root root    0 Jun  7 09:45 of_node -> ../../../../firmware/devicetree/base/scb/ethernet@7d580000
drwxr-xr-x  2 root root    0 Jun  7 09:45 power
lrwxrwxrwx  1 root root    0 Jan  1  1970 subsystem -> ../../../../bus/platform
-rw-r--r--  1 root root 4096 Jan  1  1970 uevent
drwxr-xr-x  4 root root    0 Jan  1  1970 unimac-mdio.-19
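
One possible guard against this failure is sketched below, under the assumption that the iperf3 -A option can simply be omitted when no cores are known. The helper names are hypothetical, and this is not necessarily how the script should be fixed.

```python
def pick_core(core_list, thread_num):
    """Return the CPU core for a thread, or None if no cores are known."""
    if not core_list:
        # An empty list would make the modulo below divide by zero.
        return None
    return core_list[thread_num % len(core_list)]


def affinity_args(core_list, thread_num):
    """Return iperf3 affinity arguments, or nothing when no core is chosen."""
    core = pick_core(core_list, thread_num)
    return [] if core is None else ["-A", str(core)]
```

With an empty core list, as on the RPi 4 above, affinity_args returns an empty list and iperf3 would run without CPU pinning; with cores 0-3 the threads are spread round-robin across them.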

@rodwsmith
Collaborator Author

@baconYao , could you please file a bug report? It looks like this should be easy enough to fix, but it would be helpful to have a bug report to reference in the PR. I don't have an RPi 4, but I do have an earlier version, so I can try testing on it.

@baconYao
Contributor

baconYao commented Jun 8, 2023

Hi @rodwsmith, I filed issue #539.

Development

Successfully merging this pull request may close these issues.

network.py is incompletely optimized for performance
4 participants