New Machine requirement: Linux/x64 equinix dockerhost replacement #3352
System provisioned at Skytap with 24 cores, 64GB RAM, and a 256GB filesystem on /var/lib/docker
Not a problem: they're not restricted by default. I've connected a container running Fedora 39 to Jenkins for experimental purposes and started an AQA run at https://ci.adoptium.net/job/AQA_Test_Pipeline/206 🤞🏻
The host machine has been tested with Docker builds at https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk/job/jdk-linux-x64-temurin/471/console on dockerhost-skytap-ubuntu2204-x64-1, so I'll aim to get it activated properly for the weekend runs or on Monday, provided there is no risk to any outstanding items in the release cycle.
EDIT: the extended Grinder re-run stopped after 10 hours; retrying at Grinder#8675. The others were OK.
I've installed
The three executors run build jobs that can each take up quite a bit of space in the Jenkins workspace, since the build volumes are mapped from the host. Installer generation can also use quite a bit of space in the host workspace (see #3362). At present there is up to 6GB in various directories on the host file system, and I think a full build of the latest release might take close to 10GB.
I'm going to redo this file system with about 100GB for
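As a rough sizing check using the figures above (three executors, close to 10GB for a full build), the worst case with every executor holding a full build at once is 3 × 10 = 30GB, before any installer output is counted. This Python sketch encodes that arithmetic and compares it with the free space reported by `shutil.disk_usage`; the function names and the `/` path are hypothetical placeholders, since the actual filesystem path is not given here.

```python
import shutil

# Hypothetical placeholder: the real workspace filesystem path is not given in this thread.
WORKSPACE_PATH = "/"

def worst_case_build_usage_gb(executors: int, per_build_gb: float) -> float:
    """Upper bound if every executor holds a full build at the same time (installers excluded)."""
    return executors * per_build_gb

def has_headroom(path: str, executors: int, per_build_gb: float) -> bool:
    """True if the filesystem holding `path` has at least the worst-case amount free."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= worst_case_build_usage_gb(executors, per_build_gb)

if __name__ == "__main__":
    # Figures quoted in the thread: three executors, close to 10GB per full build.
    print(f"worst case: {worst_case_build_usage_gb(3, 10):.0f}GB")
    print("enough free space:", has_headroom(WORKSPACE_PATH, 3, 10))
```

A 100GB filesystem would leave roughly 70GB beyond that worst case for installer generation and other leftovers.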
Noting that the Fedora 39 container is working as well as most of the other systems as per adoptium/aqa-tests#5012 (comment)
Noting that https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk17u/job/jdk17u-alpine-linux-x64-temurin/393/ and the equivalents on other versions appear to insist on running on one of the Equinix dockerhosts at the moment, as they're looking for
Inventory PR for this system: #3358
Nagios & Wazuh installed successfully.
Note: I've added
The initial machine is in place and working. While we may wish to add more containers onto this machine, that can be done at a later date, so I shall close this. Noting that #3378 covers setting up a second machine for the same purpose.
This machine was offline because our monthly x64 credits at Skytap had run out. It has been reconfigured from its original specification to 16GB RAM and six vCPUs and brought online again, but it still has a number of static Docker containers defined. The machine has been up for 2 days, 7h01 (my working assumption is that the credits roll over on the month boundary, but that may not be true) and it's currently showing this:
@Haroon-Khel I'm struggling to bring the machines back online. Has the port information in the Jenkins agent definitions become desynchronised from what is on the host?
I've changed that particular agent definition to use port 32771 and it has come up OK, but it would be good to understand what has happened to the others. I'd quite like to get at least one other container live on there (any more may cause a problem given the restricted number of CPU cores).
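A minimal sketch for spotting this kind of drift, assuming the agents connect over SSH to container port 22 (an assumption; the port inside the containers is not stated in this thread). `docker port <container> 22` prints the host address and port that Docker currently maps to that container port, which can then be compared with the port in the Jenkins agent definition. The helper names are hypothetical.

```python
import subprocess

def parse_docker_port_output(output: str) -> set[int]:
    """Collect host ports from `docker port` lines like '0.0.0.0:32771', '[::]:32771' or ':::32771'."""
    ports = set()
    for line in output.splitlines():
        line = line.strip()
        if line:
            # The host port is whatever follows the last colon.
            ports.add(int(line.rsplit(":", 1)[1]))
    return ports

def current_host_ports(container: str, container_port: int = 22) -> set[int]:
    """Ask Docker which host port(s) currently map to `container_port` (22 assumes SSH agents)."""
    result = subprocess.run(
        ["docker", "port", container, str(container_port)],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_port_output(result.stdout)

def agent_port_in_sync(container: str, configured_port: int, container_port: int = 22) -> bool:
    """Hypothetical check: does the port in the Jenkins agent definition still match Docker?"""
    return configured_port in current_host_ports(container, container_port)
```

For example, `agent_port_in_sync("my-test-container", 32771)` (hypothetical container name) would return `False` once Docker has moved that container's mapping to a different host port.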
Yeah, I'm seeing this in #3486 (comment) too. Not sure what caused Docker to reassign ports. Looking into it.
It's caused by the fact that we no longer specify a port (allowing Docker to assign one at random), see infrastructure/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/deploy_container/tasks/deploy.yml Line 27 in b728c86
Then, when the dockerhost machine is restarted, Docker assigns a new random port instead of giving the containers their previous one. TL;DR: a port needs to be specified at container startup instead of relying on Docker to pick a random one.
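A sketch of the fix described above, written in Python for illustration only (the real change belongs in the Ansible `deploy.yml` task, whose contents are not reproduced here). With `-p 22`, Docker publishes container port 22 on a host port of its own choosing; with `-p 2222:22` the host port is pinned. The allocator keeps existing assignments and hands out ports only to new containers, so adding a container never shifts the others. Container names, the image, and base port 2222 are hypothetical, and port 22 assumes SSH-connected agents.

```python
def allocate_ports(names: list[str], existing: dict[str, int], base: int) -> dict[str, int]:
    """Give each new container the lowest free host port >= base; keep existing assignments."""
    assigned = dict(existing)
    used = set(assigned.values())
    next_port = base
    for name in names:
        if name in assigned:
            continue  # never move a port that an agent definition already points at
        while next_port in used:
            next_port += 1
        assigned[name] = next_port
        used.add(next_port)
    return assigned

def docker_run_args(name: str, image: str, host_port: int, container_port: int = 22) -> list[str]:
    """Build a `docker run` command with an explicit host:container port mapping."""
    # "-p 22" alone would let Docker choose the host port; "-p HOST:22" pins it.
    return ["docker", "run", "-d", "--name", name,
            "-p", f"{host_port}:{container_port}", image]

if __name__ == "__main__":
    # Hypothetical names, image and base port, for illustration only.
    ports = allocate_ports(["test-a", "test-b"], {}, base=2222)
    for name, port in ports.items():
        print(" ".join(docker_run_args(name, "example/image:latest", port)))
```

Persisting the returned mapping (for example alongside the inventory) is what would let the same ports be reused whenever the containers are recreated.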
That's another thing that won't be a problem if we switch over to connecting the containers via JNLP ;-)
The containers are back online, apart from https://ci.adoptium.net/computer/test-docker-ubuntu2004-x64-4/log, which refuses to come back up for other reasons. The problem should not recur with the existing containers. I need to change infrastructure/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/roles/deploy_container/tasks/deploy.yml Line 27 in b728c86
Sounds good, thanks. Jenkins logs should be clearer now after today's cleanups. We need to wait for Ludovic to come back to fix the RISC-V ones, but that should be another load of warnings disappearing from Jenkins 👍
Do you know what the reason is? It's "curious" to note that the port number is 32768, exactly 2^15.
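One possible explanation, offered as an assumption rather than something confirmed in this thread: on Linux, Docker may pick random host ports from the kernel's local port range (`/proc/sys/net/ipv4/ip_local_port_range`), which on many systems defaults to 32768 to 60999. If so, the first container published this way would likely land on 32768 (= 2^15) and later ones just above it, which would fit the 32771 mentioned earlier. This sketch parses that file's format and checks whether a given port falls inside the range.

```python
from pathlib import Path

# Linux exposes the local (ephemeral) port range here as two numbers, e.g. "32768\t60999".
RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")

def parse_port_range(text: str) -> tuple[int, int]:
    """Parse the 'low high' pair from ip_local_port_range."""
    low, high = (int(field) for field in text.split())
    return low, high

def in_range(port: int, text: str) -> bool:
    """True if `port` lies within the inclusive range described by `text`."""
    low, high = parse_port_range(text)
    return low <= port <= high

if __name__ == "__main__":
    print("2**15 =", 2 ** 15)
    if RANGE_FILE.exists():
        contents = RANGE_FILE.read_text()
        print("local port range on this host:", parse_port_range(contents))
        print("32768 inside it:", in_range(32768, contents))
```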
I'm going to close this now. Any future work can happen under other issues if required.
https://ci.adoptium.net/computer/test-docker-ubuntu2004-x64-4/log is back online. I recreated its container and now the Jenkins agent has no trouble connecting.
I need to request a new machine:
Please explain what this machine is needed for: Replacement for Equinix systems which we have to decommission as per #3292