Refactor signing/assemble code in openjdk_build_pipeline.groovy to support Windows/docker builds #1117

Merged (20 commits) on Nov 6, 2024

@sxa sxa commented Sep 26, 2024

There's a lot in here, so apologies in advance to the reviewers who will need a bit of time to review this properly ;-) Noting that there are some SXAEC: references in here which are things that still need to be cleaned up but the overall architecture and flow is now functional as per the builds in https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld which are now getting through all the steps and running tests.

Some things done in this PR:

  • A new batOrSh function that runs commands as Windows batch scripts when we're on Windows, instead of always using bash, as this was causing some instability (issue 3714 listed below)
  • Split out some parts of buildScripts into separate functions: buildScriptsAssemble and buildScriptsEclipseSigner
  • Each of those two are now listed as a separate "stage" in the pipelines, so we can see how long each part takes. The original sign phase has been renamed as sign tgz/zip to avoid ambiguity. I'm tempted to change assemble to assemble & SBoM (Noting that it also includes the du -k of the output from make-adopt-build-farm.sh which is slow - at least 10 minutes - on my current test machine)
  • There are a few openjdk_build_pipeline: eyecatchers added so we can more easily find the separate sections while looking at the build logs (curl -s the consoleText link and grep it for _pipeline: - similar to the build.sh: lines from the temurin-build repo)
  • The setting up of envVars has been extracted into the top level build() and passed into the buildScripts() and buildScriptsAssemble() functions that use it
  • There is an 'ls -l' of various directories which forces shortname creation. It's a bit of a hack, but shortname creation in containers is a bit hit or miss and this makes it reliable. Without it, the build won't find various bits of the compiler tools
  • I've left openjdk_build_dir_arg (obsoleted in Remove use of --user-openjdk-build-root-directory now JDK-8326685 backported #1084) commented out but I'm still tempted to just remove it now ...
  • Initial stash of files for the internal signing phase now restricts the number of files copied across as **/* was unnecessary.
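
To illustrate the batOrSh idea from the first bullet, here is a minimal sketch, not the code from this PR. It assumes the helper simply chooses between the standard Jenkins `sh` and `bat` steps based on `isUnix()`. `FakeContext` is a hypothetical stand-in for the pipeline context so the snippet runs outside Jenkins, and the `batOrSh(context, command)` signature is an assumption.

```groovy
// Hypothetical stand-in for the Jenkins pipeline context, for illustration only.
class FakeContext {
    boolean unix
    List<String> calls = []
    boolean isUnix() { unix }
    void sh(String command) { calls << ('sh: ' + command) }
    void bat(String command) { calls << ('bat: ' + command) }
}

// Assumed shape of the helper; the version in the PR may differ.
def batOrSh(context, String command) {
    if (context.isUnix()) {
        context.sh(command)   // bash on Linux, macOS, AIX, ...
    } else {
        context.bat(command)  // Windows batch instead of bash
    }
}

def linux = new FakeContext(unix: true)
def windows = new FakeContext(unix: false)
batOrSh(linux, 'ls -l')
batOrSh(windows, 'ls -l')
println linux.calls
println windows.calls
```

Keeping the platform check in one place means call sites do not need their own Windows conditionals.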

There are still some issues with the workspace cleaning options which may need to be addressed, although that can be done in subsequent PRs. e.g. rm -rf c:/workspace/openjdk-build/cyclonedx-lib c:/workspace/openjdk-build/security and rm -rf ' + context.WORKSPACE + '/workspace/target/*. Some additional work will be needed before the clean options will work. Ref: adoptium/infrastructure#3723

Fixes adoptium/infrastructure#3709
Supersedes #1103 (as this includes those changes)
Fixes adoptium/infrastructure#3714
Note: There is some potential follow-on work that could be done to tidy up this groovy script overall in #1116

Existing windows pipeline

image

New pipeline:

image

Thank you for creating a pull request!

Please check out the information below if you have not made a pull request here before (or if you need a reminder how things work).

Code Quality and Contributing Guidelines

If you have not done so already, please familiarise yourself with our Contributing Guidelines and Code Of Conduct, even if you have contributed before.

Tests

GitHub Actions will run a set of jobs against your PR that will lint and unit test your changes. Keep an eye out for the results from these on the latest commit you submitted. For more information, please see our testing documentation.

Running the advanced pipeline tests (which execute a set of mock pipelines) requires an admin to post "run tests" on this PR.
If you are not an admin, please ask for one's attention in #infrastructure on Slack or ping one here.
To run the full set of tests, use "run tests"; to run a subset of tests on specific JDK versions, use "run tests quick 11,21".

sxa commented Sep 26, 2024

@eclipse-temurin-bot

 PR TESTER RESULT 

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

sxa commented Sep 26, 2024

run tests

(Triggered https://ci.adoptium.net/job/build-scripts-pr-tester/job/openjdk-build-pr-tester/1922/ with refactored cleanWS post build)

@eclipse-temurin-bot

 PR TESTER RESULT 

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

sxa commented Sep 26, 2024

run tests

(This one is not using the cached download so has a chance of being a valid test :-) https://ci.adoptium.net/job/build-scripts-pr-tester/job/openjdk-build-pr-tester/1923/ )

@eclipse-temurin-bot

 PR TESTER RESULT 

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

sxa commented Sep 26, 2024

run tests

(enableSigner was deleted from some invocations of buildScripts - linter didn't seem to pick it up though 🤔 New run: https://ci.adoptium.net/job/build-scripts-pr-tester/job/openjdk-build-pr-tester/1924 )

@eclipse-temurin-bot

 PR TESTER RESULT 

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

sxa commented Sep 26, 2024

run tests

(Running at https://ci.adoptium.net/job/build-scripts-pr-tester/job/openjdk-build-pr-tester/1926/ after some re-runs using the mechanism described in #1057 (comment) first, to verify that it should start testing my code once it gets to that point)

@eclipse-temurin-bot

 PR TESTER RESULT 

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

@sxa sxa force-pushed the refactor_signing branch 2 times, most recently from 9d749be to 6c889b5 Compare September 27, 2024 16:23

sxa commented Sep 27, 2024

run tests

@eclipse-temurin-bot

 PR TESTER RESULT 

✅ All pipelines passed! ✅

sxa commented Sep 28, 2024

Ref: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/782/
Current build with CLEAN_WORKSPACE_BUILD_OUTPUT_ONLY_AFTER is giving:

00:28:32  ===RELEASE FILE GENERATED===
00:28:32  mv: cannot stat '/cygdrive/c/workspace/openjdk-build/workspace/target//configure.txt': No such file or directory

https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/760/ without that option ran through ok

Diff of file listings between the 21.0.5+8-ea build which we have published and the one from this PR in the windbld job
$ diff latest.ls.s windbld.ls.s
1,42d0
< jdk-21.0.5+8/bin/api-ms-win-core-console-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-console-l1-2-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-datetime-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-debug-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-errorhandling-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-fibers-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-file-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-file-l1-2-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-file-l2-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-handle-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-heap-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-interlocked-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-libraryloader-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-localization-l1-2-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-memory-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-namedpipe-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-processenvironment-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-processthreads-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-processthreads-l1-1-1.dll
< jdk-21.0.5+8/bin/api-ms-win-core-profile-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-rtlsupport-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-string-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-synch-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-synch-l1-2-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-sysinfo-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-timezone-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-core-util-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-conio-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-convert-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-environment-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-filesystem-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-heap-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-locale-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-math-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-multibyte-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-private-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-process-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-runtime-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-stdio-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-string-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-time-l1-1-0.dll
< jdk-21.0.5+8/bin/api-ms-win-crt-utility-l1-1-0.dll
44a3
> jdk-21.0.5+8/bin/concrt140.dll
104a64,67
> jdk-21.0.5+8/bin/msvcp140_1.dll
> jdk-21.0.5+8/bin/msvcp140_2.dll
> jdk-21.0.5+8/bin/msvcp140_atomic_wait.dll
> jdk-21.0.5+8/bin/msvcp140_codecvt_ids.dll
120c83
< jdk-21.0.5+8/bin/ucrtbase.dll
---
> jdk-21.0.5+8/bin/vccorlib140.dll
$ 

When running outside docker (i.e. replicating the normal build) I'm getting a problem during the unstash of the results of the signing phase:

[Pipeline] stash
13:10:58  Stashed 6564 file(s)
[Pipeline] }
[Pipeline] // node
[...]
[Pipeline] { (assemble)
[Pipeline] unstash
[Pipeline] }
[...]
13:11:10  Execution error: java.io.IOException: Failed to extract signed_jmods.tar.gz

@sxa sxa force-pushed the refactor_signing branch 2 times, most recently from 81504fc to 7ede372 Compare October 1, 2024 14:53
andrew-m-leonard and others added 5 commits October 31, 2024 10:16
Signed-off-by: Andrew Leonard <anleonar@redhat.com>
Co-authored-by: Andrew Leonard <31470007+andrew-m-leonard@users.noreply.github.com>
…tium#1127) (adoptium#1128)

* jdk-23.0.1 release jdkBranch GA tag check using wrong repo

---------

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
Signed-off-by: Stewart X Addison <sxa@redhat.com>
@sxa sxa force-pushed the refactor_signing branch from f2a08e1 to 951fd04 Compare October 31, 2024 10:16
@github-actions github-actions bot added the code-tools and generation labels Oct 31, 2024

sxa commented Oct 31, 2024

force push to resolve perceived conflict

sxa commented Oct 31, 2024

run tests

@sxa sxa requested a review from andrew-m-leonard October 31, 2024 10:19
@eclipse-temurin-bot

 PR TESTER RESULT 

✅ All pipelines passed! ✅

@@ -2024,6 +2158,9 @@ class Build {
context.node(label) {
addNodeToBuildDescription()
// Cannot clean workspace from inside docker container
if ( buildConfig.TARGET_OS == 'windows' && buildConfig.DOCKER_IMAGE ) {
context.bat('rm -rf c:/workspace/openjdk-build/cyclonedx-lib c:/workspace/openjdk-build/security')
Contributor

this seems a little odd, why specific c:\workspace and why just cyclonedx-lib and security ?
presumably permissions? Why can't we just do rm -rf ' + context.WORKSPACE + '/workspace

Member Author

This was the minimum that was required to clear up the workspace to allow the git clean operation to succeed without failures, as I recall. I would definitely like to improve the way this is done, but those two directories specifically caused a problem (potentially because they are written to during the build, and that happens inside the docker container under different permissions?). I would strongly support a separate issue being raised to look at whether we can avoid writing into that directory and/or move/change the scope of this particular rm operation. We do have various workspace clean-up options on the build to clear things up, but unless that is done at the correct time (as per the comment above the bit you've highlighted) it doesn't work in the container case.

That was a longer answer than I planned, but let me know if you're ok with deferring that to an action after this PR, or if you want it addressed before you feel able to approve it. I don't like it, but I'm also conscious of the fact that the PR as-is has undergone a good amount of testing for now and there are multiple cases that would have to be tested to fully test additional changes here.

Member Author

I have done some tests without this rm and so far not reproduced the failure but I suspect it's an edge case on certain failures which triggered the value to reset the workspace so I'm still more comfortable leaving this in if possible. But definitely worth keeping an eye on so I could add some debug in to keep an eye on it and see later on if it might have failed without it (I'm running with that in another branch for the testing)

Contributor

ok makes sense as an edge case, but can we use:

'rm -rf ' + context.WORKSPACE + '/cyclonedx-lib'
'rm -rf ' + context.WORKSPACE + '/security'

Member Author (@sxa, Nov 6, 2024)

Happy to change it, but any particular reason? Just feels more efficient to spawn one shell and rm process instead of two.
Edit: Ah your main point is to use the workspace variable. I'll test that today

Member Author

At that point WORKSPACE is resolving to c:\workspace\workspace\build-scripts\jobs\jdk21u\... instead of the "shortname" so it would clean up the wrong directory (Also it's using \ instead of / which is likely to cause problems)

Member Author

OK I can get it to use that variable if I move the workspace definition earlier in the build() function and remove the multiple definitions, then wrapping this part in context.ws(workspace)

Member Author

Pushed the change and it seems to work in the docker environment on initial tests - LMK if moving the scope of the variable is likely to cause any side-effects, e.g. on other platforms.

Contributor

let me check, didn't realize this was out of scope of the context.ws()

Contributor

yeah looks good now, buildScripts() is run within the context.ws()...
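
The resolution reached in this thread can be sketched roughly as follows. This is an illustration under assumptions rather than the pushed change: `FakeWsContext` is a hypothetical stand-in that mimics how the Jenkins `ws` step makes `WORKSPACE` resolve to the directory it is given, and both paths are placeholder values.

```groovy
// Hypothetical stand-in for the Jenkins pipeline context, for illustration only.
class FakeWsContext {
    // Placeholder for the long default workspace path seen outside ws().
    String WORKSPACE = 'c:\\long\\default\\path'
    List<String> batCalls = []

    // Mimics the ws step: run the body with WORKSPACE pointing at dir.
    void ws(String dir, Closure body) {
        String previous = WORKSPACE
        WORKSPACE = dir
        try {
            body()
        } finally {
            WORKSPACE = previous
        }
    }

    void bat(String command) { batCalls << command }
}

def context = new FakeWsContext()
// Assumed short workspace path, defined once near the top of build().
def workspace = 'c:/workspace/openjdk-build'

// Inside ws(), WORKSPACE is the short path, so a single rm can use the variable.
context.ws(workspace) {
    context.bat('rm -rf ' + context.WORKSPACE + '/cyclonedx-lib ' +
                context.WORKSPACE + '/security')
}
println context.batCalls[0]
```

This keeps the single shell and rm process while replacing the hard-coded c:/workspace/openjdk-build prefix with the workspace variable.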

pipelines/build/common/openjdk_build_pipeline.groovy (outdated, resolved)
…directories

Signed-off-by: Stewart X Addison <sxa@redhat.com>

sxa commented Nov 6, 2024

run tests

Contributor

@steelhead31 steelhead31 left a comment

I've reviewed this to the best of my knowledge and ability, given the scope of change, and the extensive testing, I'm happy to approve.

sxa commented Nov 6, 2024

> I've reviewed this to the best of my knowledge and ability, given the scope of change, and the extensive testing, I'm happy to approve.

Thanks - at this point I'll take that ;-) We'll find out if there are any side effects from the EA runs that'll happen overnight. (I'll merge once the pipeline tester completes but given it's through on a couple of versions I'm not concerned)

@eclipse-temurin-bot

 PR TESTER RESULT 

❎ Some pipelines failed or the job was aborted! ❎
See the pipeline-build-check below for more information...

sxa commented Nov 6, 2024

> ❎ Some pipelines failed or the job was aborted! ❎
> See the pipeline-build-check below for more information...

Failure was in the Alpine build where it had a connectivity problem on dockerhost-azure-ubuntu2204-x64-1:

12:43:05  Compiling 4 properties into resource bundles for jdk.httpserver
12:43:05  Compiling 11 properties into resource bundles for jdk.jartool
12:43:05  Compiling up to 71 files for COMPILE_CREATE_SYMBOLS
12:43:05  Connection attempt failed: Connection refused
12:43:05  Compiling 11 properties into resource bundles for jdk.management.agent
12:43:05  Connection attempt failed: Connection refused
12:43:05  Compiling 4 properties into resource bundles for jdk.jdi
12:43:05  Connection attempt failed: Connection refused
[...]
12:43:17  Compiling up to 3493 files for java.base
12:43:18  Connection attempt failed: Connection refused
12:43:18  Giving up
12:43:18  IOException caught during compilation: Could not connect to server after 10 attempts with timeout 4000

The cause is unclear, but a subsequent run on the same machine suffered a similar fate. Running a subsequent one (build 74) on another machine did not show a problem, and running two builds with the existing pipelines (not via the PR tester) also passed: xlinux jdk21 and Alpine jdk21. It is assumed that this is a machine-specific issue and that this PR is not the cause.

Labels
arm, code-tools, docker, generation, jenkins-pipeline, mac, testing, windows
Projects
Status: Done
7 participants