From ef85494b278cb030700205d3e9d0235a0acd2d52 Mon Sep 17 00:00:00 2001
From: mtaggart13
Date: Wed, 13 Nov 2024 15:25:08 +0000
Subject: [PATCH 1/2] Updates to recovery.md to address PE-39730

https://perforce.atlassian.net/browse/PE-39730 outlines some required
changes to the procedure for replacing a missing or failed replica
Puppet server. This draft aims to address all issues raised in the
ticket.
---
 documentation/recovery.md | 54 +++++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 14 deletions(-)

diff --git a/documentation/recovery.md b/documentation/recovery.md
index 3bb0d1aa..428556e9 100644
--- a/documentation/recovery.md
+++ b/documentation/recovery.md
@@ -14,14 +14,22 @@ The new system needs to be provisioned with the same certificate name as the sys
 This procedure uses the following placeholder references.
 
 * _\_ - The FQDN and certname of the primary Puppet server
-* _\_ - The FQDN and certname of the replacement replica Puppet server
-* _\_ - Either A or B; whichever of the two letter designations is appropriate for the server being replaced. It will be the opposite of the primary server.
+* _\_ - The FQDN and certname of the old replica Puppet server that has failed or is missing
+* _\_ - The FQDN and certname of the new replica Puppet server
+* _\_ - The FQDN and certname of the original primary server that the old replica had replaced
+* _\_ - Either A or B; whichever of the two letter designations is appropriate for the replacement server. It will be the opposite of the server that it is replacing.
 
-1. Ensure the old replica server is forgotten.
+1. If applicable, purge the failed primary server. (You may need to do this, for example, if the original primary failed and the promoted replica that replaced it has also failed.)
 
-        puppet infrastructure forget
+        puppet node purge
 
-2. Install the Puppet agent on the replacement replica
+2. Ensure the old replica server is forgotten.
+
+        puppet infrastructure forget
+
+3. Install the Puppet agent on the replacement replica.
+
+**Note**: When designating the availability group of the replacement, use the opposite group (A or B) of the server being replaced. This means that, if the old replica server replaced the original primary server, the new replica is assigned the same availability group as the original primary.
 
         curl -k https://:8140/packages/current/install.bash \
           | bash -s -- \
@@ -29,21 +37,39 @@ This procedure uses the following placeholder references.
               extension_requests:1.3.6.1.4.1.34380.1.1.9812=puppet/server \
               extension_requests:1.3.6.1.4.1.34380.1.1.9813=
 
+        source /etc/profile.d/puppet-agent.sh
+
         puppet agent -t
 
-3. On the PE-PostgreSQL server in the _\_ group
+4. Sign the certificate on the new primary server.
+
+5. On the PE-PostgreSQL server in the _\_ group
     1. Stop puppet.service
-    2. Add the following two lines to /opt/puppetlabs/server/data/postgresql/11/data/pg\_ident.conf
+
+            puppet resource service puppet ensure=stopped
+
+    3. Add the following two lines to /opt/puppetlabs/server/data/postgresql/14/data/pg\_ident.conf
 
             pe-puppetdb-pe-puppetdb-map pe-puppetdb
             pe-puppetdb-pe-puppetdb-migrator-map pe-puppetdb-migrator
 
-    3. Restart pe-postgresql.service
-3. Provision the new system as a replica
+    5. Restart pe-postgresql.service
+
+            puppet resource service pe-postgresql ensure=stopped
+            puppet resource service pe-postgresql ensure=running
+
+    5. Run Puppet
+
+            puppet agent -t
+
+6. Provision the new system as a replica
 
         puppet infrastructure provision replica --topology mono-with-compile --skip-agent-config --enable
 
-4. On the PE-PostgreSQL server in the _\_ group, start puppet.service
+7. On the PE-PostgreSQL server in the _\_ group, start puppet.service
+
+        puppet resource service puppet ensure=running
+
 
 ## Replace failed PE-PostgreSQL server (A or B side)
 
@@ -102,11 +128,11 @@ On _\_:
 
         systemctl stop puppet
 
-2. Add this line to /opt/puppetlabs/server/data/postgresql/11/data/pg\_ident.conf
+2. Add this line to /opt/puppetlabs/server/data/postgresql/14/data/pg\_ident.conf
 
         replication-pe-ha-replication-map pe-ha-replication
 
-3. Add these lines to /opt/puppetlabs/server/data/postgresql/11/data/pg\_hba.conf
+3. Add these lines to /opt/puppetlabs/server/data/postgresql/14/data/pg\_hba.conf
 
         # REPLICATION RESTORE PERMISSIONS
         hostssl replication pe-ha-replication 0.0.0.0/0 cert map=replication-pe-ha-replication-map clientcert=1
@@ -123,13 +149,13 @@ Run the following commands.
 ```
 systemctl stop puppet.service pe-postgresql.service
 
-mv /opt/puppetlabs/server/data/postgresql/11/data/certs /opt/puppetlabs/server/data/pg_certs
+mv /opt/puppetlabs/server/data/postgresql/14/data/certs /opt/puppetlabs/server/data/pg_certs
 
 rm -rf /opt/puppetlabs/server/data/postgresql/*
 
 runuser -u pe-postgres -- \
   /opt/puppetlabs/server/bin/pg_basebackup \
-    -D /opt/puppetlabs/server/data/postgresql/11/data \
+    -D /opt/puppetlabs/server/data/postgresql/14/data \
     -d "host=
         user=pe-ha-replication
         sslmode=verify-full
From 9ed573e58e55d3c3186e0ac35af39f3d9325b6e3 Mon Sep 17 00:00:00 2001
From: mtaggart13
Date: Thu, 14 Nov 2024 10:25:46 +0000
Subject: [PATCH 2/2] Post-review updates to recovery.md

Updates to draft following engineer review
---
 documentation/recovery.md | 43 ++++++++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 19 deletions(-)

diff --git a/documentation/recovery.md b/documentation/recovery.md
index 428556e9..282a09c1 100644
--- a/documentation/recovery.md
+++ b/documentation/recovery.md
@@ -7,7 +7,12 @@ The new system needs to be provisioned with the same certificate name as the sys
 ## Recover from failed primary Puppet server
 
 1. Promote the replica ([official docs](https://puppet.com/docs/pe/2019.8/dr_configure.html#dr-promote-replica))
-2. Replace missing replica server (same as [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server) below)
+2. Purge the failed primary server
+
+        puppet node purge
+
+
+3. Replace missing replica server (same as [Replace missing or failed replica Puppet server](#replace-missing-or-failed-replica-puppet-server) below)
 
 ## Replace missing or failed replica Puppet server
 
@@ -16,20 +21,13 @@ This procedure uses the following placeholder references.
 * _\_ - The FQDN and certname of the primary Puppet server
 * _\_ - The FQDN and certname of the old replica Puppet server that has failed or is missing
 * _\_ - The FQDN and certname of the new replica Puppet server
-* _\_ - The FQDN and certname of the original primary server that the old replica had replaced
-* _\_ - Either A or B; whichever of the two letter designations is appropriate for the replacement server. It will be the opposite of the server that it is replacing.
-
-1. If applicable, purge the failed primary server. (You may need to do this, for example, if the original primary failed and the promoted replica that replaced it has also failed.)
-
-        puppet node purge
+* _\_ - Either A or B; whichever of the two letter designations is appropriate for the replacement server. It will be the opposite of the primary server.
 
-2. Ensure the old replica server is forgotten.
+1. Ensure the old replica server is forgotten.
 
         puppet infrastructure forget
 
-3. Install the Puppet agent on the replacement replica.
-
-**Note**: When designating the availability group of the replacement, use the opposite group (A or B) of the server being replaced. This means that, if the old replica server replaced the original primary server, the new replica is assigned the same availability group as the original primary.
+2. Install the Puppet agent on the replacement replica.
 
         curl -k https://:8140/packages/current/install.bash \
           | bash -s -- \
@@ -41,18 +39,23 @@ This procedure uses the following placeholder references.
 
         puppet agent -t
 
-4. Sign the certificate on the new primary server.
+3. Sign the certificate on the primary server.
+
+        puppetserver ca sign --certname
 
-5. On the PE-PostgreSQL server in the _\_ group
+4. On the PE-PostgreSQL server in the _\_ group
     1. Stop puppet.service
 
             puppet resource service puppet ensure=stopped
 
-    3. Add the following two lines to /opt/puppetlabs/server/data/postgresql/14/data/pg\_ident.conf
+    3. Add the following two lines to /opt/puppetlabs/server/data/postgresql/__/data/pg_ident.conf
+
+    where __ is the appropriate major version of PostgreSQL as detailed in [Component versions in recent PE releases](https://www.puppet.com/docs/pe/2023.8/component_versions_in_recent_pe_releases.html#pe-agent-server-components). For PE release 2023.8.0 the PostgreSQL version is 14.
 
             pe-puppetdb-pe-puppetdb-map pe-puppetdb
             pe-puppetdb-pe-puppetdb-migrator-map pe-puppetdb-migrator
 
+
     5. Restart pe-postgresql.service
 
             puppet resource service pe-postgresql ensure=stopped
@@ -62,11 +65,11 @@ This procedure uses the following placeholder references.
 
             puppet agent -t
 
-6. Provision the new system as a replica
+5. Provision the new system as a replica
 
         puppet infrastructure provision replica --topology mono-with-compile --skip-agent-config --enable
 
-7. On the PE-PostgreSQL server in the _\_ group, start puppet.service
+6. On the PE-PostgreSQL server in the _\_ group, start puppet.service
 
         puppet resource service puppet ensure=running
 
@@ -128,11 +131,13 @@ On _\_:
 
         systemctl stop puppet
 
-2. Add this line to /opt/puppetlabs/server/data/postgresql/14/data/pg\_ident.conf
+2. Add this line to /opt/puppetlabs/server/data/postgresql/__/data/pg_ident.conf
+
+where __ is the appropriate major version of PostgreSQL as detailed in [Component versions in recent PE releases](https://www.puppet.com/docs/pe/2023.8/component_versions_in_recent_pe_releases.html#pe-agent-server-components). For PE release 2023.8.0 the PostgreSQL version is 14.
 
         replication-pe-ha-replication-map pe-ha-replication
 
-3. Add these lines to /opt/puppetlabs/server/data/postgresql/14/data/pg\_hba.conf
+3. Add these lines to /opt/puppetlabs/server/data/postgresql/__/data/pg\_hba.conf
 
         # REPLICATION RESTORE PERMISSIONS
         hostssl replication pe-ha-replication 0.0.0.0/0 cert map=replication-pe-ha-replication-map clientcert=1
@@ -144,7 +149,7 @@ On _\_:
 
 On _\_:
 
-Run the following commands.
+Run the following commands (using the appropriate PostgreSQL version number)
 
 ```
 systemctl stop puppet.service pe-postgresql.service