Upgrade Docker from 19 > 20 on Tugboat server #7667

Closed
2 tasks done
ElijahLynn opened this issue Jan 19, 2022 · 14 comments
Labels
CMS Team CMS Product team that manages both editor exp and devops DevOps CMS team practice area Platform CMS Team

Comments

@ElijahLynn
Contributor

ElijahLynn commented Jan 19, 2022

Overview

Docker needs to be updated on Tugboat.

Reason for Change: PHP 7.4-bullseye image doesn't work (see #6216) and needs Docker 20.10+ to work properly.

We don't have to do this directly with the Tugboat application upgrade in

Acceptance Criteria:

  • DNS & Cert upgrade impact(s) documented on this issue
  • Upgrade Docker from 19 > 20 on Tugboat server

Implementation notes:

Can test Tugboat by creating an AMI and I think we have a test.tugboat.vfs.va.gov DNS entry already, not sure about cert but maybe?

@ElijahLynn ElijahLynn added DevOps CMS team practice area Needs refining Issue status labels Jan 19, 2022
@ElijahLynn
Contributor Author

ElijahLynn commented Jan 19, 2022

Tugboat images (https://docs.tugboat.qa/reference/tugboat-images/#php) are based on the official PHP images (https://github.com/docker-library/php/tree/master/7.4) and we currently use the tugboatqa/php:7.3.28-apache image, which is based on Alpine. (The Debian image tags are suffixed with -buster and -bullseye, which are not declaring > https://github.com/TugboatQA/dockerfiles/blob/main/php/TAGS.md)

php:
  # Use PHP 7.x with Apache; this syntax pulls in the latest version of PHP 7
  # Pinning to 7.3.28 until we update from Docker 19 to Docker 20
  # See https://github.com/department-of-veterans-affairs/va.gov-cms/issues/6216
  image: tugboatqa/php:7.3.28-apache

So the comment in the code is still valid.

@ElijahLynn
Contributor Author

#6216 (comment)

The reason, I think/it seems, for this is that the latest Apache image depends on Alpine Linux, which recently introduced a commit from Moby that has a dependency on Docker 20.10.0 AND libseccomp 2.4.4.

Turns out we aren't actually using Alpine but the -bullseye/debian image, but it appears that the Moby <> Docker 20.10 + libseccomp 2.4.4 issue is still present in the Debian image.
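
A quick way to check a host against the two minimums mentioned above is to compare version strings. This is only a sketch: the helper names (version_ge, check_host) are made up for illustration, it assumes bash and GNU sort with -V, and the version numbers have to be supplied by whoever runs it.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Tugboat or Docker.

# version_ge A B -> exit status 0 when dotted version A >= B.
version_ge() {
  # GNU `sort -V` orders version strings; A >= B when B sorts first (or ties).
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Minimums quoted in this thread.
MIN_DOCKER="20.10.0"
MIN_LIBSECCOMP="2.4.4"

# check_host DOCKER_VERSION LIBSECCOMP_VERSION
# Prints one line per component; returns non-zero if either is too old.
check_host() {
  local rc=0
  if version_ge "$1" "$MIN_DOCKER"; then
    echo "docker $1: ok"
  else
    echo "docker $1: needs >= $MIN_DOCKER"
    rc=1
  fi
  if version_ge "$2" "$MIN_LIBSECCOMP"; then
    echo "libseccomp $2: ok"
  else
    echo "libseccomp $2: needs >= $MIN_LIBSECCOMP"
    rc=1
  fi
  return "$rc"
}
```

For example, check_host 19.03.5 2.4.4 would flag Docker as too old while reporting libseccomp as fine.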

@q0rban

q0rban commented Jan 19, 2022

Here's the version we're currently running:

$ dpkg --status docker-ce | grep Version
Version: 5:20.10.12~3-0~debian-buster

@cweagans
Contributor

What happens if we use the 7.4-apache-buster tag? Buster is older. Maybe it would work with an older docker? We should still definitely do this upgrade, but I'm wondering if we can sidestep it for the time being.

@ndouglas
Contributor

ndouglas commented Jan 20, 2022

@cweagans Great idea! It appears to be working -- it's progressed pretty far in the build past bullseye, anyway 😃

@jkalexander7 jkalexander7 removed the Needs refining Issue status label Jan 20, 2022
@EWashb EWashb added the CMS Team CMS Product team that manages both editor exp and devops label Jan 23, 2023
@EWashb
Contributor

EWashb commented Jan 30, 2023

Something to consider going forward: infrastructure automation issue written and linked below

@ElijahLynn
Contributor Author

ElijahLynn commented Feb 8, 2023

I'm creating an AMI to test this on, I chose delete on termination for the two extra volumes. I also enabled "no reboot".

image.png

@ElijahLynn
Contributor Author

Living documentation/progress thread https://dsva.slack.com/archives/CT4GZBM8F/p1675967534648249

@ElijahLynn
Contributor Author

ElijahLynn commented Feb 11, 2023

Sweet, we have test.tugboat.vfs.va.gov mostly up and running now with https://github.com/department-of-veterans-affairs/devops/pull/12545.

DNS is working, cert is working, balancer and target group are working! This screenshot may not look awesome, but it is, and it is a good thing!

image

Next step is to configure Tugboat to know that it lives at test.tugboat.vfs.va.gov.

@ElijahLynn
Contributor Author

ElijahLynn commented Feb 11, 2023

test.tugboat.vfs.va.gov is ALIVE!!! Gonna test the system and Docker upgrade on Monday!

image

@ElijahLynn
Contributor Author

ElijahLynn commented Feb 14, 2023

K, I've tested out the upgrade in depth today and have come to the conclusion that we are going to need to suspend/stop all running previews before the upgrade. We are also going to wait until after demo tomorrow.

So the current upgrade steps, which I am going to test one more time tomorrow morning, are:

  1. Run the Tugboat Mongo backup to S3 job here http://jenkins.vfs.va.gov/job/utility/job/tugboat-backup/

  2. Manually add a 443 balancer listener rule (match hostnames on existing rule) with top priority to serve a custom response of "Tugboat update in progress from X to X". I think it should be in place for at least one hour, possibly 2 hours, during business hours, tomorrow after demo. So that would be from 2-4pm PT/5-7pm ET. This will serve the following type of message to all users of Tugboat:
    image
    image

  3. Then, as root, run these commands in different terminals. They will produce a large amount of output and we basically just want to watch for the activity to die down, which will take minutes:

    1. tail -f /var/log/messages
    2. tail -f /var/log/docker
    3. docker ps # this will hang for 10-15 minutes on a tugboat-test clone until the above logs quiet down (lots of XFS (dm-52): Mounting V5 Filesystem, Starting recovery (logdev: internal), Ending recovery (logdev: internal)). Wait until this command exits successfully. Wait until the "dm-**" number gets to 120 or so, which I think is the number of previews. Again, that is only for the tugboat-test stuff, because we clone the 4TB docker volume without shutting the instance down.
  4. Loop through all previews and tugboat suspend <ID>. Watch for the logs above to die down, and check the UI to know when all previews are suspended.

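    # Collect the ID of every preview from the JSON listing.
    PREVIEWS=$(tugboat list previews -j | grep '"preview":' | awk '{print $2}' | tr -d ',"')
    # Suspend each preview in the background so they all run in parallel.
    for PREVIEW in $PREVIEWS; do (tugboat suspend "$PREVIEW" &) ; done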
    
  5. Verify only Tugboat containers are still running: docker ps
    image

  6. tbctl stop

  7. Then, yum update -y | tee tugboat-yum-update-2023.02.13.log. Wait for it to complete and then ~5 minutes for the above logs to die down.

  8. Lastly, reboot.

  9. Verify instances start by launching some preview links.
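
The suspend loop in step 4 and the "wait until docker ps exits successfully" part of step 3 could be sketched as small reusable functions, roughly like this. It is not what was actually run: the function names and the TUGBOAT_CMD override are invented for illustration, and the ID extraction assumes the same pretty-printed "preview": "<id>", lines that the grep/awk pipeline in step 4 relies on. Unlike the original one-liner, this version waits for all suspend calls to return.

```shell
#!/usr/bin/env bash
# Sketch only: function names and TUGBOAT_CMD are hypothetical.

# extract_preview_ids: read pretty-printed JSON on stdin and print one
# preview ID per line (same grep/awk/tr approach as step 4).
extract_preview_ids() {
  grep '"preview":' | awk '{print $2}' | tr -d ',"'
}

# suspend_all: read preview IDs on stdin, suspend them in parallel, then
# block until every suspend call has returned.
# Set TUGBOAT_CMD="echo tugboat" for a dry run.
suspend_all() {
  local id
  while IFS= read -r id; do
    [ -n "$id" ] || continue
    # Intentionally unquoted so a multi-word override like "echo tugboat" works.
    ${TUGBOAT_CMD:-tugboat} suspend "$id" &
  done
  wait
}

# retry_until ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or ATTEMPTS runs out.
retry_until() {
  local attempts=$1 interval=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" >/dev/null 2>&1 && return 0
    sleep "$interval"
  done
  return 1
}
```

Usage would be something like tugboat list previews -j | extract_preview_ids | suspend_all, and retry_until 60 30 docker ps to poll for up to roughly 30 minutes. Those attempt and interval numbers are guesses, not measured values.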

@EWashb EWashb mentioned this issue Feb 14, 2023
48 tasks
@ElijahLynn
Contributor Author

ElijahLynn commented Feb 15, 2023

W00t! We are now at:

docker --version
Docker version 20.10.13, build a224086
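
If this check ever gets scripted, the version number can be pulled out of the output above, or out of the dpkg Version line quoted earlier in this thread, with sed. A minimal sketch, assuming the output formats stay as shown here; both function names are made up.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# docker_version_from: read `docker --version` output on stdin and print
# just the dotted version, e.g. 20.10.13.
docker_version_from() {
  sed -n 's/^Docker version \([0-9][0-9.]*\).*/\1/p'
}

# dpkg_upstream_version_from: read a `Version:` line from `dpkg --status`
# on stdin and print the upstream version, dropping the optional epoch
# ("5:") and everything after the dotted number ("~3-0~debian-buster").
dpkg_upstream_version_from() {
  sed -n 's/^Version: \([0-9]*:\)\{0,1\}\([0-9][0-9.]*\).*/\2/p'
}
```

For example, docker --version | docker_version_from on the server, or dpkg --status docker-ce | dpkg_upstream_version_from on a Debian host.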

I am working on some docs to push to https://github.com/department-of-veterans-affairs/va.gov-cms/tree/main/READMES/devops.

@ElijahLynn
Contributor Author

ElijahLynn commented Feb 15, 2023

I don't think I am going to get the docs pushed tonight and I think we should make a new issue for cleaning that up.

FWIW: I do have most everything documented here but it is messy right now, but good news is that it is all there and I won't forget it.

I will also say that we now have TLS certs and DNS set up for test.tugboat.vfs.va.gov and a way to reliably test major system updates for Tugboat, even though it is still a manual process for now.

I made a stub issue here that needs refinement. #12609


6 participants