Ability to Dynamically Set CPU Affinities for Benchmarking #1917

piyush286 · 2020-07-25T00:37:25Z

• Added an affinity script that can dynamically set CPU affinities according to the platform and machine HW
	○ This script is well-documented. It lists basic usage info, sample commands, and common and platform-specific dependencies.
	○ If a machine doesn't have the affinity tool, the script handles it gracefully: it outputs a warning and continues without using affinity.
• Added another script that acts as a wrapper between playlist.xml and the affinity script, and redirects the affinity script's output to /dev/null.
	○ It uses 2 physical CPUs with SMT by default, which is likely the most common perf testing use case, so we don't need to pass those settings in playlist.xml for every test. If we want to run a test with a different HW config, we can specify it for that test only. Also, to run all tests with an affinity other than the default, you can easily change this file for one-off testing.
• Used the affinity command for just one benchmark to begin with
• Added a check to avoid downloading the Renaissance benchmark binary if it already exists
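The platform-dependent part of the affinity script can be pictured with a minimal sketch. It is hypothetical and is not the actual perf/affinity.sh from this PR: it only covers the two tools that appear in the sanity runs below (numactl on Linux, execrset on AIX, with the same flags as in those logs), takes the OS name and CPU list as arguments, and falls back to an empty prefix plus a warning when no tool is found. The function name get_affinity_prefix is made up for illustration.

```shell
# Hypothetical sketch, not the PR's perf/affinity.sh.
# get_affinity_prefix OS CPU_LIST
#   OS       : output of `uname -s` (e.g. Linux, AIX, Darwin)
#   CPU_LIST : logical CPUs to bind to, e.g. "0,1" (numactl) or "0-7" (execrset)
# Prints the command prefix on stdout; if no supported tool is available,
# prints nothing on stdout and a warning on stderr.
get_affinity_prefix() {
  local os="$1" cpus="$2"
  case "$os" in
    Linux)
      if command -v numactl >/dev/null 2>&1; then
        # Bind to the given CPUs and to memory node 0 (as in the ppc64le run).
        echo "numactl --physcpubind=${cpus} --membind=0"
        return 0
      fi
      ;;
    AIX)
      if command -v execrset >/dev/null 2>&1; then
        echo "execrset -c ${cpus} -e"
        return 0
      fi
      ;;
  esac
  # No supported tool: warn and let the caller run without affinity.
  echo "Warning!!! Affinity is NOT set. Affinity tool may NOT be installed/supported." >&2
  return 0
}
```

Sending the warning to stderr keeps stdout limited to the prefix itself, so a caller can capture the prefix with `$(...)` while the warning still shows up in the console log.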

Issues:
#1587
adoptium/TKG#34

Signed-off-by: Piyush Gupta <piyush286@gmail.com>
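Defaulting to "2 physical CPUs with SMT" requires knowing which logical CPUs share a physical core, which is the kind of machine info tracked in adoptium/TKG#34. On Linux, one way to get it is `lscpu --parse=CPU,CORE`. The sketch below assumes that output format and uses a hypothetical helper, first_n_cores_cpus, that collects all logical CPUs (SMT siblings included) belonging to the first N cores. It is an illustration, not code from this PR.

```shell
# Hypothetical helper, not part of the PR.
# first_n_cores_cpus N  < output of `lscpu --parse=CPU,CORE`
# Prints a comma-separated list of the logical CPUs that belong to the
# first N distinct physical cores, in order of appearance.
first_n_cores_cpus() {
  local n="$1"
  awk -F, -v n="$n" '
    /^#/ { next }                      # skip lscpu header comments
    {
      cpu = $1; core = $2
      if (!(core in seen)) {
        if (ncores >= n) next          # already have N cores; ignore new ones
        seen[core] = 1; ncores++
      }
      list = (list == "") ? cpu : list "," cpu
    }
    END { print list }
  '
}
```

Usage: `first_n_cores_cpus 2 < <(lscpu --parse=CPU,CORE)`. Grouping by core rather than taking the first few CPU numbers matters on machines where SMT siblings are numbered far apart (e.g. cores 0-3 followed by their siblings 4-7), where two cores map to CPUs 0,1,4,5 rather than 0-3.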

piyush286 · 2020-07-25T00:52:07Z

Using affinity for just one test for now to get started. If these changes look good, I can do the same for other tests in a future PR and run a proper perf evaluation with a bigger set of tests to show that we get more reliable results (a lower confidence interval) with affinity.

Baseline: no affinity
Test: affinity

Sanity Runs

ppc64_aix
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3657/console
Pass
Running EXEC_CMD_WITH_AFFINITY=execrset -c 0-7 -e /home/jenkins/workspace

x86-64_mac
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3660/console
Passed
Running EXEC_CMD_WITH_AFFINITY= /Users/jenkins

ppc64le_linux
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3661/console
Passed
Running EXEC_CMD_WITH_AFFINITY=numactl --physcpubind=0,1 --membind=0 /home/jenkins/workspace
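The "Running EXEC_CMD_WITH_AFFINITY=..." lines above show the affinity prefix (empty on macOS) placed in front of the test command. A minimal sketch of that composition follows; run_with_prefix is a hypothetical function, not the PR's run_with_affinity.sh. It preserves the command's exit status so that a check like the generated `if [ $? -eq 0 ]` still sees the real result.

```shell
# Hypothetical sketch, not the PR's run_with_affinity.sh.
# run_with_prefix PREFIX CMD [ARGS...]
# Prints the combined command in the same style as the log lines above,
# then runs it. An empty PREFIX runs CMD unchanged.
run_with_prefix() {
  local prefix="$1"; shift
  echo "Running EXEC_CMD_WITH_AFFINITY=${prefix} $*"
  if [ -n "$prefix" ]; then
    # Word-split the prefix on purpose: it is a tool name followed by flags.
    # shellcheck disable=SC2086
    $prefix "$@"
  else
    "$@"
  fi
  # The function returns the exit status of the last command run above.
}
```

For example, `run_with_prefix "numactl --physcpubind=0,1 --membind=0" java -jar renaissance-mit.jar scala-kmeans` would bind the JVM to CPUs 0 and 1, while `run_with_prefix "" java ...` runs it without affinity.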

After adding a warning for a missing affinity tool

s390x_linux
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3667/console
Pass
Warning!!! Affinity is NOT set. Affinity tool may NOT be installed/supported.
Running EXEC_CMD_WITH_AFFINITY= /home/jenkins/workspace

s390x_zos
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3668/console
There are no nodes with the label ‘ci.role.test&&hw.arch.s390x&&sw.os.zos’

aarch64_linux
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3669/console
Pass
Warning!!! Affinity is NOT set. Affinity tool may NOT be installed/supported.
Running EXEC_CMD_WITH_AFFINITY= /home/jenkins/workspace

x86-64_linux
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3666/console
Pass
Warning!!! Affinity is NOT set. Affinity tool may NOT be installed/supported.
Running EXEC_CMD_WITH_AFFINITY= /home/jenkins/workspace

Disabled Windows

x86-64_windows
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3671/console
Waiting for next available executor

piyush286 · 2020-07-25T01:10:43Z

x86-64_windows Issue

I had to disable Windows temporarily because the test fails without giving any clues. I suspect we are unable to run the bash script on Windows, even though the machine does seem to have Cygwin. The output doesn't tell us much, as shown below.

Here's a build with the changes in this PR except that Windows isn't disabled here.
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3659/console
Failed

14:12:58  C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/\\perf\\run_with_affinity.sh --exec_cmd ""C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdkbinary/j2sdk-image\\bin\\java" -jar "C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests///..//jvmtest\\perf\\renaissance\\renaissance-mit.jar" --json ""C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/\\TKG\\test_output_1595614375980\\renaissance-scala-kmeans_0"\\scala-kmeans.json" scala-kmeans" --test_root C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/; \
14:12:58  	if [ $? -eq 0 ] ; then echo ""; echo "renaissance-scala-kmeans_0""_PASSED"; echo ""; cd C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/;  else echo ""; echo "renaissance-scala-kmeans_0""_FAILED"; echo ""; fi; } 2>&1 | tee -a "C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/\\TKG\\test_output_1595614375980\\TestTargetResult";
14:12:58  
14:12:58  renaissance-scala-kmeans_0_FAILED

I looked for examples in this repo of other tests running a shell script on Windows. I noticed a couple of external tests (one example below) that use a bash script and don't seem to have a platformRequirements tag. Are they supposed to work on Windows?

https://github.com/AdoptOpenJDK/openjdk-tests/blob/6056e150e7c1f8e6aa75cf87f24f2967a19685ac/external/quarkus/playlist.xml#L29-L41

I don't see any external test pipelines for Windows on Adopt or on the internal server. I can't launch a Grinder either, since neither the internal nor the external Jenkins seems to have a Windows machine with Docker.

x86-64_windows
DOCKER_REQUIRED=true
BUILD_LIST=external/quarkus
TARGET=quarkus_java_test
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3673/console
There are no nodes with the label ‘ci.role.test&&hw.arch.x86&&sw.os.windows&&sw.tool.docker’

piyush286 · 2020-07-25T10:49:28Z

Forgot to mention: currently, we run Renaissance sanity tests daily and extended tests weekly. In the Tabular View, the Renaissance results show an extremely high confidence interval (i.e. large fluctuation in the numbers) since they aren't using CPU affinity, while the Quarkus tests, which do use affinity, show quite stable numbers, as shown in the screenshots below.

High CI for Renaissance: Major Regression or Improvements

Low CI for Renaissance: No Major Regression or Improvements
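To make the confidence-interval figure concrete, here is a sketch that computes the mean and an approximate 95% confidence interval (expressed as a percentage of the mean) over repeated benchmark scores. It assumes a normal approximation with z = 1.96 and is not necessarily how the Tabular View computes its CI; ci95_percent is a hypothetical name.

```shell
# Hypothetical helper; not necessarily the Tabular View's CI formula.
# ci95_percent < values   (one measurement per line)
# Prints the mean and the half-width of an approximate 95% confidence
# interval as a percentage of the mean. Assumes the normal approximation
# (z = 1.96); a Student-t quantile is more accurate for few runs.
ci95_percent() {
  awk '
    NF { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n < 2) { print "need at least 2 values" | "cat 1>&2"; exit 1 }
      mean = sum / n
      var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
      if (var < 0) var = 0                         # guard against rounding error
      half = 1.96 * sqrt(var / n)
      printf "mean=%.2f ci=%.2f%%\n", mean, 100 * half / mean
    }
  '
}
```

For example, `printf '10\n12\n14\n' | ci95_percent` prints `mean=12.00 ci=18.86%`, while identical scores give `ci=0.00%`. Pinning the benchmark to fixed CPUs is meant to shrink the run-to-run spread that drives this number.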

piyush286 · 2020-07-25T10:59:44Z

Documentation:

Overview of affinity script: https://github.com/piyush286/openjdk-tests/blob/3db1e3463952cc731fee503673350636d3374f5a/perf/affinity.sh#L18-L93
Commands generated for each platform: https://github.com/piyush286/openjdk-tests/blob/3db1e3463952cc731fee503673350636d3374f5a/perf/affinity.sh#L1425-L1445

@smlambert Requesting review from you! I've used affinity for just one test for now because I wanted to check with you whether this is the right approach and whether you're okay with it. Whatever you suggest, we can apply to all the tests.

Also, any suggestions regarding the Windows issue?

Also, I think it would be good to get the affinity tool dependencies installed on the perf machines if we want to improve our perf testing. Currently, the affinity script gracefully skips the affinity commands if a machine doesn't have the dependencies, as shown in the sanity runs above.

Thanks!

smlambert

Thanks @piyush286 - I'm wondering if we can temporarily create a second target in the playlist called renaissance-scala-kmeans-windows that continues to run as before (without run_with_affinity.sh), so it serves as a functional test until the Windows issue can be resolved. The new target can have a  that points to the Windows issue raised.


piyush286 · 2020-07-30T22:21:50Z

Thanks @smlambert for the review. Updated the changes as suggested.

New Testing:

x86-64_windows
renaissance-scala-kmeans-windows
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3677/console
./openjdk-tests/get.sh: line 574: /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java: No such file or directory
=> I don't think this is related to this change. Launched another one below.

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3680/console
Waiting for next available executor on ‘ci.role.test&&hw.arch.x86&&sw.os.windows

x86-64_linux
sanity.perf
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3678/console
renaissance-scala-kmeans_0_PASSED
renaissance-scala-kmeans-windows_0_SKIPPED

smlambert

Thanks @piyush286 - I'll merge once the Grinders finish running (have a great vacation)!

smlambert · 2020-07-31T02:40:01Z

The relaunch on Windows looks good: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3681/ so I will merge.

karianna added the enhancement label Jul 27, 2020

karianna added this to the July 2020 milestone Jul 27, 2020

smlambert self-requested a review July 28, 2020 20:35

smlambert reviewed Jul 29, 2020


piyush286 mentioned this pull request Jul 30, 2020

Unable to Run Bash Script from Playlist.xml on Windows #1928

Closed

piyush286 force-pushed the affinity branch from 3db1e34 to 5a0e600 July 30, 2020 20:10

smlambert approved these changes Jul 31, 2020


smlambert merged commit b75025e into adoptium:master Jul 31, 2020

This was referenced Oct 17, 2020

Get Relevant Machine Info for Setting CPU Affinities adoptium/TKG#34

Closed

Ability to Set CPU Affinities for Benchmarking #1587

Closed

sophia-guo mentioned this pull request Jul 19, 2021

Pre-release triage of weekly test runs #2754

Closed
