Ability to Dynamically Set CPU Affinities for Benchmarking #1917

Merged
merged 1 commit into from
Jul 31, 2020

piyush286

• Added an affinity script that can dynamically set CPU affinities according to the platform and machine HW
	○ This script is well-documented. It lists basic usage info, sample commands, and common and platform-specific dependencies.
	○ If some machine doesn't have the affinity tool, this script gracefully handles it. We just output a warning and then continue without using affinity.
• Added another script that acts like a wrapper between playlist.xml and affinity script and redirects output of affinity script to /dev/null.
	○ It defaults to 2 physical CPUs with SMT, which is probably the most common perf testing use case, so we can avoid passing those settings in the playlist.xml file for every test. If we do want to run a test with a different HW config, we can specify it for that test only. Also, if you want to run all tests with a different affinity than the default, you can easily change this file for one-off testing.
• Used affinity command for just one benchmark to begin with
• Added a check to avoid downloading the Renaissance benchmark binary if it already exists
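
For illustration, the graceful fallback described above could look roughly like the sketch below. It is not the code from affinity.sh: the function name `build_affinity_prefix` and the `AFFINITY_TOOL` override are hypothetical, it only covers the Linux `numactl` case, and the CPU list is passed in rather than detected from the machine HW.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not taken from affinity.sh): print a command prefix
# that pins a process to the given CPUs. If the affinity tool is missing,
# print a warning on stderr and an empty prefix so the caller can continue.
build_affinity_prefix() {
    local cpus="$1"                        # e.g. "0,1"
    local node="${2:-0}"                   # memory node, defaults to 0
    local tool="${AFFINITY_TOOL:-numactl}" # override is an assumption, for testing

    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool --physcpubind=${cpus} --membind=${node}"
    else
        echo "Warning!!! Affinity is NOT set. Affinity tool may NOT be installed/supported." >&2
    fi
}

# Usage: only prints the command that would be run.
prefix="$(build_affinity_prefix 0,1)"
echo "Running EXEC_CMD_WITH_AFFINITY=${prefix} <benchmark command>"
```

Because the warning goes to stderr and the prefix to stdout, a caller can capture the prefix with `$(...)` and still see the warning in the console log.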

Issues:
#1587
adoptium/TKG#34

Signed-off-by: Piyush Gupta piyush286@gmail.com

piyush286 commented Jul 25, 2020

Using affinity for just one test for now to get started. If these changes look good, I can do the same for other tests in a future PR and do a proper perf evaluation with a bigger set of tests to show that we do get more reliable results (narrower confidence intervals) with affinity.

Baseline: no affinity
Test: affinity
image

Sanity Runs

ppc64_aix
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3657/console
Pass
Running EXEC_CMD_WITH_AFFINITY=execrset -c 0-7 -e /home/jenkins/workspace

x86-64_mac
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3660/console
Passed
Running EXEC_CMD_WITH_AFFINITY= /Users/jenkins

ppc64le_linux
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3661/console
Passed
Running EXEC_CMD_WITH_AFFINITY=numactl --physcpubind=0,1 --membind=0 /home/jenkins/workspace

After adding a warning for missing affinity tool

s390x_linux
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3667/console
Pass
Warning!!! Affinity is NOT set. Affinity tool may NOT be installed/supported.
Running EXEC_CMD_WITH_AFFINITY= /home/jenkins/workspace

s390x_zos
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3668/console
There are no nodes with the label ‘ci.role.test&&hw.arch.s390x&&sw.os.zos’

aarch64_linux
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3669/console
Pass
Warning!!! Affinity is NOT set. Affinity tool may NOT be installed/supported.
Running EXEC_CMD_WITH_AFFINITY= /home/jenkins/workspace

x86-64_linux
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3666/console
Pass
Warning!!! Affinity is NOT set. Affinity tool may NOT be installed/supported.
Running EXEC_CMD_WITH_AFFINITY= /home/jenkins/workspace
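
The console output above shows a different affinity prefix per platform. A minimal sketch of that kind of dispatch follows. The function name `affinity_cmd_for_os` is hypothetical, and the CPU lists are copied from the Grinder logs above purely as examples; the actual affinity.sh derives them from the machine HW and also checks whether the tool is installed.

```shell
#!/usr/bin/env bash

# Hypothetical sketch: map an OS name (as printed by `uname -s`) to the
# affinity command prefix seen in the Grinder logs. CPU lists are fixed
# example values, not computed from the hardware.
affinity_cmd_for_os() {
    local os="$1"
    case "$os" in
        AIX)
            echo "execrset -c 0-7 -e"
            ;;
        Linux)
            echo "numactl --physcpubind=0,1 --membind=0"
            ;;
        *)
            # No affinity tool assumed for other platforms (e.g. macOS),
            # so the benchmark command runs with an empty prefix.
            echo ""
            ;;
    esac
}

# Usage: only prints the command that would be run on this machine.
echo "Running EXEC_CMD_WITH_AFFINITY=$(affinity_cmd_for_os "$(uname -s)") <benchmark command>"
```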

Disabled Windows

x86-64_windows
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3671/console
Waiting for next available executor

@piyush286

x86-64_windows Issue

I had to disable Windows temporarily since the test fails without giving any clues. I suspect that we aren't able to run the bash script on Windows, even though the machine does seem to have Cygwin. The output doesn't tell us much, as shown below.

Here's a build with the changes in this PR except that Windows isn't disabled here.
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3659/console
Failed

14:12:58  C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/\\perf\\run_with_affinity.sh --exec_cmd ""C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdkbinary/j2sdk-image\\bin\\java" -jar "C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests///..//jvmtest\\perf\\renaissance\\renaissance-mit.jar" --json ""C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/\\TKG\\test_output_1595614375980\\renaissance-scala-kmeans_0"\\scala-kmeans.json" scala-kmeans" --test_root C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/; \
14:12:58  	if [ $? -eq 0 ] ; then echo ""; echo "renaissance-scala-kmeans_0""_PASSED"; echo ""; cd C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/;  else echo ""; echo "renaissance-scala-kmeans_0""_FAILED"; echo ""; fi; } 2>&1 | tee -a "C:/Users/jenkins.EC2AMAZ-LDA73k6/workspace/Grinder/openjdk-tests/\\TKG\\test_output_1595614375980\\TestTargetResult";
14:12:58  
14:12:58  renaissance-scala-kmeans_0_FAILED

I tried to look for examples in this repo to see whether any other test runs a shell script on Windows. I noticed a couple of external tests (one example below) that use a bash script and don't seem to have a platformRequirements tag. Are they supposed to work on Windows?

https://github.com/AdoptOpenJDK/openjdk-tests/blob/6056e150e7c1f8e6aa75cf87f24f2967a19685ac/external/quarkus/playlist.xml#L29-L41

I don't see any external test pipelines for Windows on the Adopt or internal servers. I can't launch a Grinder either, since neither the internal nor the external Jenkins seems to have a Windows machine with Docker.

x86-64_windows
DOCKER_REQUIRED=true
BUILD_LIST=external/quarkus
TARGET=quarkus_java_test
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3673/console
There are no nodes with the label ‘ci.role.test&&hw.arch.x86&&sw.os.windows&&sw.tool.docker’

@piyush286

Forgot to mention: currently, we run Renaissance sanity tests daily and extended tests weekly. If we look at the Tabular View, we can see extremely wide confidence intervals (i.e. large fluctuations in the numbers) for Renaissance since those tests aren't using CPU affinity, while we see quite stable numbers for Quarkus tests, which are using affinity, as shown in the screenshots below.

High CI for Renaissance: Major Regression or Improvements
image

image

Low CI for Renaissance: No Major Regression or Improvements
image
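
To make the "CI" figure concrete, here is a rough sketch of how a confidence interval can be computed from repeated benchmark scores and expressed as a percentage of the mean. It is an assumption for illustration only: the function name `ci_percent` is hypothetical, it uses a normal approximation (z = 1.96 for 95%), and the Tabular View may well compute its number differently.

```shell
#!/usr/bin/env bash

# Hypothetical helper: given benchmark scores as arguments, print the mean
# and the 95% confidence-interval half-width as a percentage of the mean.
# Uses the sample standard deviation and a normal approximation (z = 1.96).
ci_percent() {
    printf '%s\n' "$@" | awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            if (n < 2) {
                print "need at least 2 samples" > "/dev/stderr"
                exit 1
            }
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0          # guard against rounding error
            half = 1.96 * sqrt(var / n)   # half-width of the interval
            printf "mean=%.2f ci=%.2f%%\n", mean, 100 * half / mean
        }'
}

# Usage: scores from two runs that differ a lot give a wide interval.
ci_percent 90 110
```

A small percentage means the runs agree closely, so a change between builds is more likely to be a real regression or improvement rather than noise.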


piyush286 commented Jul 25, 2020

Documentation:

Overview of affinity script: https://github.com/piyush286/openjdk-tests/blob/3db1e3463952cc731fee503673350636d3374f5a/perf/affinity.sh#L18-L93
Commands generated for each platform: https://github.com/piyush286/openjdk-tests/blob/3db1e3463952cc731fee503673350636d3374f5a/perf/affinity.sh#L1425-L1445

@smlambert Requesting review from you! I've just used affinity for one test for now because I wanted to check with you to see whether that's the right approach and you're okay with it. Whatever you suggest, we can do that for all the tests.

Also, any suggestions regarding the windows issue?

Also, I think it would be good to get those affinity tool dependencies added to perf machines if we want to improve our perf testing. Currently, the affinity script can gracefully avoid using affinity commands if the machine doesn't have the dependencies as shown here.

Thanks!

@karianna karianna added this to the July 2020 milestone Jul 27, 2020
@smlambert smlambert self-requested a review July 28, 2020 20:35

@smlambert smlambert left a comment

Thanks @piyush286 - wondering if we can temporarily create a second target in the playlist called renaissance-scala-kmeans-windows where we continue to run as before (no run_with_affinity.sh), it serves as a functional test... until the windows issue can be resolved. New target can have a <!-- comment --> that points to the windows issue raised.

@piyush286

Thanks @smlambert for the review. Updated the changes as suggested.

New Testing:

x86-64_windows
renaissance-scala-kmeans-windows
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3677/console
./openjdk-tests/get.sh: line 574: /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java: No such file or directory
=> Don't think it's related to this change. Launched another one below.

https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3680/console
Waiting for next available executor on ‘ci.role.test&&hw.arch.x86&&sw.os.windows’

x86-64_linux
sanity.perf
https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3678/console
renaissance-scala-kmeans_0_PASSED
renaissance-scala-kmeans-windows_0_SKIPPED


@smlambert smlambert left a comment

Thanks @piyush286 - I'll merge once the Grinders finish running (have a great vacation)!

@smlambert

the relaunch on windows looks good https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/3681/ so will merge
