Latest Trilead API plugin version 1.71.v9e7860a_67a_df fails when using JDK8 for agent process #3090

Closed
3 tasks done
dduportal opened this issue Aug 8, 2022 · 15 comments

Comments

@dduportal
Contributor

dduportal commented Aug 8, 2022

Service(s)

ci.jenkins.io, trusted.ci.jenkins.io

Summary

(This issue was written almost 1 week after the actual problem occurred)

Version 1.71.v9e7860a_67a_df of the Trilead API plugin was released on the 2nd of August 2022: https://github.com/jenkinsci/trilead-api-plugin/releases/tag/1.71.v9e7860a_67a_df.

Once it was installed on the LTS Jenkins controllers ci.jenkins.io and trusted.ci.jenkins.io, we immediately saw the following error message when connecting to SSH agents or cloning git repositories over SSH:

Caused by: java.lang.UnsupportedClassVersionError: com/trilead/ssh2/ServerHostKeyVerifier has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0

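We realized that this error happens when the agent processes run with JDK8 while our controller uses JDK11: class file version 55.0 corresponds to Java 11, whereas a Java 8 runtime only recognizes class files up to version 52.0.
Even though this kind of setup (different JDKs on controller and agent) should work, it is not recommended, so the infra should be updated to always run agent processes with JDK11 (matching the JDK11 controllers).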
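
To make the version numbers in the error message easier to read, here is a minimal sketch (not part of the original report) that decodes the header of a `.class` file and maps its major version to a Java release. The function names are hypothetical; the only facts relied on are the class file layout (magic number `0xCAFEBABE`, then minor and major version) and the rule that, from Java 5 onward, the release number is the major version minus 44.

```python
import struct

JAVA_CLASS_MAGIC = 0xCAFEBABE


def class_file_version(header):
    """Return (major, minor) read from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class file header")
    # Layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != JAVA_CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major, minor


def java_release_for_major(major):
    """Map a class file major version to a Java release (valid for Java 5+)."""
    return major - 44


def can_load(class_major, runtime_release):
    """True if a runtime of the given release accepts this class file version."""
    return class_major <= runtime_release + 44
```

With these helpers, `java_release_for_major(55)` gives 11 and `can_load(55, 8)` is false, which is the situation reported by the `UnsupportedClassVersionError` above.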

The infra team now has to ensure that all agent templates (cloud or permanent) of all controllers have an explicit JDK specified for the agent process, and clean up any leftovers or unmanaged changes, to get the following benefits:

  • Safer and more stable controller <-> agent remoting link (using the same JDK on both sides)
  • Avoid breaking again once we move controllers to JDK17
  • Improved platform management by avoiding manual changes

TODO:

  • Manage trusted-agent-1 JDK installation as code with Puppet
  • Ensure that all agents defined with JCasC in Puppet for ci.jenkins.io/trusted.ci.jenkins.io/cert.ci.jenkins.io use an explicit JDK11 (this may require adding JDK11 alongside JDK8 or JDK17 in the Docker images, while VM templates already have the 3 JDKs installed)
  • Ensure that all agents defined with JCasC in the Kubernetes Helm management for infra.ci.jenkins.io/release.ci.jenkins.io/weekly.ci.jenkins.io use an explicit JDK11 (this may require adding JDK11 alongside JDK8 or JDK17 in the Docker images, while VM templates already have the 3 JDKs installed)

Reproduction steps

No response

@jglick

jglick commented Aug 8, 2022

See jenkins-infra/update-center2#629 etc.

@dduportal
Contributor Author

Currently checking ci.jenkins.io's Java configuration (this also applies to trusted.ci.jenkins.io to some extent):

  • Azure VMs have a "javapath" field that specifies the path of the java binary used to spawn the agent process. Tested manually and it works very well (the agent's System properties report JDK11 while the default mvn -v command reports JDK8)
  • For containers (both Azure Container Instances agents and Kubernetes agents), the environment variable JENKINS_JAVA_BIN must be set to the path of the java binary used to spawn the agent process, on both Windows and Linux agents. This is a feature of the entrypoint script of the "inbound-agent" Docker images. Tested manually and it works very well (for ACI, the agent's System properties report JDK11 while the default mvn -v command reports JDK8; Kubernetes agents show no System properties, so I had to run a "ps aux" command on the pod agent through the agent script console)
  • For EC2 it's trickier because there is no such feature as far as I can tell:
    • For Linux templates, using the "agent prefix command", we can customize the PATH environment variable ("PATH=/opt/jdk-11/bin:$PATH") to ensure that JDK11 takes precedence over any default java installation. Tested manually and it works on both amd64 and arm64 Linux
    • For Windows templates, this is a work in progress: we use the "unix" type of AMI to benefit from the OpenSSH connection, but the prefix command has to be approached differently.
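
To illustrate why the PATH prefix trick works for the EC2 Linux templates, here is a small sketch (my own illustration, not the plugin's implementation) that resolves `java` the way a shell lookup would, using Python's `shutil.which`. The helper name `resolve_java` is hypothetical; `/opt/jdk-11/bin` is the directory mentioned above.

```python
import os
import shutil


def resolve_java(path_value, prefix_dir=None):
    """Return the `java` binary found first on the given PATH string.

    If prefix_dir is given, it is prepended to the PATH, mimicking
    PATH=/opt/jdk-11/bin:$PATH in the agent prefix command.
    """
    if prefix_dir:
        path_value = prefix_dir + os.pathsep + path_value
    # shutil.which walks the PATH entries in order and returns the first
    # executable match, or None when nothing is found.
    return shutil.which("java", path=path_value)
```

For example, `resolve_java(os.environ["PATH"], prefix_dir="/opt/jdk-11/bin")` would return `/opt/jdk-11/bin/java` on a machine where that file exists and is executable, regardless of which JDK is the system default.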

@dduportal
Contributor Author

Opened 2 plugin issues for elements that were missing:

@dduportal
Contributor Author

dduportal commented Aug 10, 2022

@dduportal
Contributor Author

Side note: the manual experiments weren't fully cleaned up, which led to the aci-maven-8 failures reported in #3094.

The reason is that I thought the ACI Windows containers were using pinned images (with shasum), but they are not: https://github.com/jenkins-infra/jenkins-infra/blob/production/hieradata/common.yaml#L198 (unlike the Linux images).

So after jenkins-infra/docker-inbound-agents#35 was merged and deployed to DockerHub, the manual setting JENKINS_JAVA_BIN=C:/openjdk-11/bin/java was suddenly wrong (the path no longer existed).
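
As an illustration of how this kind of breakage could be caught early, here is a hypothetical pre-flight check (not something the inbound-agent entrypoint is known to do) that reports a JENKINS_JAVA_BIN value pointing at a file that does not exist. The function name and the `.exe` fallback are assumptions made for this sketch.

```python
import os


def missing_java_bin(environ):
    """Return the configured JENKINS_JAVA_BIN if it points nowhere, else None.

    An unset or empty variable is treated as "nothing to check".
    """
    configured = environ.get("JENKINS_JAVA_BIN", "").strip()
    if not configured:
        return None
    # Assumption: on Windows the setting may omit the .exe extension.
    candidates = (configured, configured + ".exe")
    if any(os.path.isfile(candidate) for candidate in candidates):
        return None
    return configured
```

Calling `missing_java_bin(os.environ)` inside the container before starting the agent would return the stale path in the situation described above.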

Corrective measures applied:

@dduportal
Contributor Author

Diff on trusted.ci.jenkins.io:

--- /var/lib/jenkins/casc.d/clouds.yaml 2022-08-08 08:10:30.656864026 +0000
+++ /tmp/puppet-file20220811-30324-1fj4ua0      2022-08-11 09:03:38.213529268 +0000
@@ -30,7 +30,8 @@
         installDocker: false
         installGit: false
         installMaven: false
-        javaPath: "java"
+        javaPath: "/opt/jdk-11/bin/java"
+        jvmOptions: "-XX:+PrintCommandLineFlags"
         labels: "ubuntu amd64 azure vm java linux docker maven-11 jdk11"
         location: "East US"
         noOfParallelJobs: 1
@@ -57,7 +58,7 @@
       existingResourceGroupName: "jenkinsinfra-trustedvmagents"
       vmTemplates:
       - agentLaunchMethod: "SSH"
-        agentWorkspace: "C:\\Jenkins"
+        agentWorkspace: "C:/Jenkins"
         credentialsId: "azure-jenkins-user"
         diskType: "managed"
         doNotUseMachineIfInitFails: true
@@ -77,7 +78,8 @@
         installDocker: false
         installGit: false
         installMaven: false
-        javaPath: "java"
+        javaPath: "C:/tools/jdk-11/bin/java"
+        jvmOptions: "-XX:+PrintCommandLineFlags"
         labels: "windows amd64 azure vm docker-windows"
         location: "East US"
         noOfParallelJobs: 1

🚀
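
For anyone wanting to audit other controllers for the same problem, here is a rough sketch that scans CasC text for `javaPath` values relying on a PATH lookup (like the `"java"` values removed in the diff above). It is a plain line-based scan written for illustration, not a YAML parser, and the function name is hypothetical.

```python
import re

# Matches lines such as:  javaPath: "/opt/jdk-11/bin/java"
JAVA_PATH_LINE = re.compile(
    r'^[ \t]*javaPath:[ \t]*"?([^"\n]*?)"?[ \t]*$', re.MULTILINE
)


def implicit_java_paths(casc_text):
    """Return javaPath values that contain no directory component."""
    values = JAVA_PATH_LINE.findall(casc_text)
    # A value without any path separator depends on whatever `java`
    # happens to be first on the agent's PATH.
    return [v for v in values if "/" not in v and "\\" not in v]
```

Run against the old clouds.yaml, this would flag both `javaPath: "java"` entries; run against the new one, it would return an empty list.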

@dduportal
Contributor Author

  • Same diff for cert.ci.jenkins.io => 🚀
  • Confirmed that both controllers can spawn agents successfully

Next step: ci.jenkins.io. Currently fixing minor hiccups highlighted by the diff.

@dduportal
Contributor Author

  • Applied on ci.jenkins.io with success

  • Checked the following container agents (i.e. the agent process runs with JDK11 but mvn reports the expected JDK8 or JDK17 by default for builds):

    • Linux Container in Digital Ocean and maven 8: ✅
    • Linux Container in Digital Ocean and maven 17: ✅
    • Windows Container in Azure and maven-8: ✅
    • Linux Container in Amazon EKS and maven 8: ✅
    • Linux Container in Amazon EKS and maven 17: ✅
  • Checked the following VM agents (i.e. the agent process runs with JDK11 and mvn reports JDK11 by default):

    • Azure Windows VM: ✅
    • Azure Linux VM: ✅
    • Azure Linux Highmem VM: ✅
    • AWS EC2 Linux VM: ✅
    • AWS EC2 Linux Highmem VM: ✅

@dduportal
Contributor Author

Thanks to jenkins-infra/jenkins-infra#2321, the agent trusted-agent-1 is now managed (at least for its JDKs):

root@ip-172-31-5-190:~# /opt/jdk-8/bin/java -version
openjdk version "1.8.0_345"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_345-b01)
OpenJDK 64-Bit Server VM (Temurin)(build 25.345-b01, mixed mode)
root@ip-172-31-5-190:~# /opt/jdk-11/bin/java -version
openjdk version "11.0.16" 2022-07-19
OpenJDK Runtime Environment Temurin-11.0.16+8 (build 11.0.16+8)
OpenJDK 64-Bit Server VM Temurin-11.0.16+8 (build 11.0.16+8, mixed mode)
root@ip-172-31-5-190:~# /opt/jdk-17/bin/java -version
openjdk version "17.0.4" 2022-07-19
OpenJDK Runtime Environment Temurin-17.0.4+8 (build 17.0.4+8)
OpenJDK 64-Bit Server VM Temurin-17.0.4+8 (build 17.0.4+8, mixed mode, sharing)
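
If this check needs to be automated, here is a minimal sketch that extracts the major Java version from `java -version` output like the above. It assumes the two version string schemes visible here ("1.8.0_345" for Java 8, "11.0.16" and "17.0.4" for later releases); the function name is hypothetical.

```python
import re

VERSION_PATTERN = re.compile(r'version "(\d+)(?:\.(\d+))?')


def java_major_version(version_output):
    """Return the major Java version found in `java -version` output."""
    match = VERSION_PATTERN.search(version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_345" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    # Modern scheme: "11.0.16" means Java 11.
    return int(first)
```

Note that `java -version` writes to stderr, so a caller using `subprocess.run` would need to capture stderr rather than stdout.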

@dduportal
Contributor Author

  • Applied on ci.jenkins.io with success

    • Checked the following container agents (i.e. the agent process runs with JDK11 but mvn reports the expected JDK8 or JDK17 by default for builds):

      • Linux Container in Digital Ocean and maven 8: ✅
      • Linux Container in Digital Ocean and maven 17: ✅
      • Windows Container in Azure and maven-8: ✅
      • Linux Container in Amazon EKS and maven 8: ✅
      • Linux Container in Amazon EKS and maven 17: ✅
    • Checked the following VM agents (i.e. the agent process runs with JDK11 and mvn reports JDK11 by default):

      • Azure Windows VM: ✅
      • Azure Linux VM: ✅
      • Azure Linux Highmem VM: ✅
      • AWS EC2 Linux VM: ✅
      • AWS EC2 Linux Highmem VM: ✅

This change caused #3096. Fixed now by 2 PRs on puppet.

@dduportal
Contributor Author

Ref. jenkins-infra/kubernetes-management#2701 for Kubernetes

@dduportal
Contributor Author

Proposal from @lemeurherve for the AWS EC2 Windows machines: adding a custom PowerShell script to the VM image template that would override the default startup command (and accept the JDK bin path as an argument) until the feature is implemented in the ec2 plugin

@timja
Member

timja commented Aug 16, 2022

Proposal from @lemeurherve for the AWS EC2 Windows machines: adding a custom PowerShell script to the VM image template that would override the default startup command (and accept the JDK bin path as an argument) until the feature is implemented in the ec2 plugin

would it be less work to just add the feature you want? :shipit:

example PR for Azure: jenkinsci/azure-vm-agents-plugin#186

@dduportal
Contributor Author

would it be less work to just add the feature you want? :shipit:

I agree with this statement. But the idea is worth mentioning in case a short-term workaround is needed.

Anyway, closing the issue as the trilead issue was fixed.

lemeurherve pushed a commit to lemeurherve/kubernetes-management that referenced this issue Aug 18, 2022
lemeurherve added a commit to jenkins-infra/kubernetes-management that referenced this issue Aug 18, 2022
* fix: use ec2-plugin 'javaPath' option instead of command prefix

Should fix Windows (which can't have command prefix)

Ref:
- jenkinsci/ec2-plugin#766
- jenkins-infra/helpdesk#3090
- jenkins-infra/docker-jenkins-weeklyci#565

* fix: use slash instead of antislash for Windows paths

* add missing .exe extension for the Windows javaPath
@lemeurherve
Member

Proposal from @lemeurherve for the AWS EC2 Windows machines: adding a custom PowerShell script to the VM image template that would override the default startup command (and accept the JDK bin path as an argument) until the feature is implemented in the ec2 plugin

would it be less work to just add the feature you want? :shipit:

For the record, the feature has been added to ec2-plugin: https://github.com/jenkinsci/ec2-plugin/releases/tag/ec2-2.0.0

lemeurherve added a commit to lemeurherve/kubernetes-management that referenced this issue Aug 31, 2022
…ins-infra#2721)
lemeurherve added a commit to lemeurherve/kubernetes-management that referenced this issue Sep 1, 2022
…ins-infra#2721)