Migrate postgresql database instance to new Aurora database instances #4297

Closed · 6 tasks done
JonellaCulmer opened this issue Apr 15, 2020 · 2 comments

JonellaCulmer commented Apr 15, 2020

Summary

After initial tests of the data loading were performed successfully by the database team (#4287), this work serves dual purposes: it helps test the API and also makes sure the data loading process works without issues. New monitoring scripts will be set up against the new Aurora database to set the stage for the final switch.
We plan to do the actual migration/switch tasks on Friday 5/1/2020. The work is scheduled to be completed by Sunday 5/3/2020.

Completion criteria

  • Start initial setup of the dev/stage/prod Aurora databases to prepare for the switch
  • Stop all data loading/refresh processes to the postgresql database instances
  • Export data from the current postgresql dev/stage/prod database instances
  • Import data into the new Aurora dev/stage/prod database instances
  • Restart all data loading/refresh processes
  • Connect the API server/CircleCI to the new Aurora dev/stage/prod database instances
@JonellaCulmer JonellaCulmer added this to the Sprint 12.2 milestone Apr 15, 2020
@fecjjeng fecjjeng self-assigned this Apr 21, 2020
@fecjjeng fecjjeng changed the title Perform data refresh to keep data in sync to the postgresql database Migrate postgresql database instance to new Aurora database instances Apr 28, 2020

fecjjeng commented Apr 28, 2020

Following is the outline of our checklist of migration/switch tasks:

STEP 1:
####################
create Aurora clusters for DEV/STG/PRD
using Terraform
PR 97 fecgov/fec-infrastructure#97 (prd) merged
PR 98 fecgov/fec-infrastructure#98 (stg) merged
PR 99 fecgov/fec-infrastructure#99 (dev) merged
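
A minimal sketch of the apply/verify workflow behind each of those PRs (the actual cluster definitions live in fecgov/fec-infrastructure; identifiers below are placeholders):

terraform init
terraform plan -out=aurora.plan
terraform apply aurora.plan
# confirm the new cluster endpoints exist
aws rds describe-db-clusters --query 'DBClusters[].[DBClusterIdentifier,Endpoint]' --output table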

STEP 2:
####################
create a database in each master instance
drop database 'postgres' in each master instance (postgres is the default database name and an easy target for hackers)
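
A sketch in psql (hostname/user/database names are placeholders):

psql -h <aurora-hostname> -U <master_user> -d postgres -c "CREATE DATABASE <database_name>;"
# connect to the new database first; a database cannot be dropped while connected to it
psql -h <aurora-hostname> -U <master_user> -d <database_name> -c "DROP DATABASE postgres;"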

STEP 3:
####################
users/roles/permissions
####################

roles are per postgresql/aurora instance, so they only need to be created once
create users/roles and permissions as in the current postgresql database instances.
reset the "utility user account" password to match the current postgresql database instances.
reset individual account passwords upon request.
(NOTE: some query tools such as pgclient seem to have problems if the password has special characters. DBeaver is ok. But just don't use special characters in passwords)
(NOTE: postgres and Aurora store usernames in lower case if not enclosed in double quotes. Mixed-case usernames are more error prone and will be avoided)
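
A minimal sketch of the pattern (role/user/schema names are placeholders, not the actual accounts):

psql -h <aurora-hostname> -U <master_user> -d <database_name> <<'EOF'
CREATE ROLE <read_only_role>;                        -- roles are per instance, create once
GRANT USAGE ON SCHEMA <schema_name> TO <read_only_role>;
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO <read_only_role>;
CREATE USER <user_name> WITH PASSWORD '<password>';  -- no special characters, no mixed case
GRANT <read_only_role> TO <user_name>;
EOF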

STEP 4:
####################
create an ec2 server in AWS and use it to perform the pg_dump and pg_restore work.
On the local server, set up an ssh key to connect to the ec2 server in AWS
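
Typical connection from the local server, assuming the key pair was created when the ec2 instance was launched (paths are placeholders):

chmod 400 ~/.ssh/<keypair>.pem
ssh -i ~/.ssh/<keypair>.pem ec2-user@<ec2-hostname>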

STEP 5:
####################
BEFORE pg_dump, STOP data input into the current Postgresql database
####################
stop GG jobs to the current database (see the ggsci sketch below)
coordinate with Salient to hold data input for Informatica:
  real file
  fecmur
  aouser
hold the Informatica nightly process
comment out cronjobs that run Oracle packages/java programs for big-table change capture/data transfer
comment out the monthly real file cleanup cronjob
MV refresh should already be done on Friday morning. No need to stop it.
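
A sketch of stopping the GG jobs (the replicat names are placeholders; get the real ones from info all):

ggsci <<'EOF'
info all
stop replicat <replicat_name>
EOF
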
STEP 6:
####################
pg_dump
####################
nohup pg_dump -h <postgres-dev/stage/prod-hostname> -v -d <database_name> -U <user_name> -j 5 -F d -f /u07/pg-bkup/<dev/stage/prod>/<dev/stage/prod>

STEP 7:
####################
Pre-cleanup job, if needed, before doing the restore
####################

BEFORE restore
####################
drop GG jobs IF any are already connected to the new Aurora database
####################
-- in ggsci
-- NOTE: sourcedb, domain, useridalias to the new aurora databases need to be created
-- (Rohan/Jean had created on the 3 servers already)

dblogin sourcedb <sourcedb_name>, USERIDALIAS <user_id_alias>, DOMAIN <domain_name>
stop replicat <replicat_name>
drop replicat <replicat_name>

####################
Cleanup DB if previously used
####################
-- clean up objects in DB
or
-- in psql
drop database
create database

####################
drop replica before import (can be done the day before)
use default parameter group
####################
NOTE: remove the replica to speed up the restore
NOTE: use the default parameter group (the modified one logs all transactions, which would generate too much log and also slow down the restore)
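
A sketch with the AWS CLI (identifiers are placeholders; exact flags may vary for Aurora readers):

# drop the reader instance; the writer keeps serving
aws rds delete-db-instance --db-instance-identifier <aurora-replica-id>
# point the cluster at the default parameter group until the restore is done
aws rds modify-db-cluster --db-cluster-identifier <aurora-cluster-id> \
    --db-cluster-parameter-group-name <default-parameter-group>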

STEP 8:
####################
Start import with 5 Jobs
####################

nohup pg_restore /u07/pg-bkup/<dev/stage/prod>/<dev/stage/prod> -h <aurora-dev/stage/prod-hostname> -d <database_name> -U <user_name> --role=<role_name> -j 5 -v -c

####################
after import done
####################
review import logs and make sure the error messages are acceptable (e.g. trying to drop non-existing objects, MV not refreshed)
####################
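
For example, scanning the restore log (log path is a placeholder):

grep -icE 'error' nohup.out                                      # total error count
grep -iE 'error' nohup.out | grep -ivE 'does not exist' | less   # anything beyond the expected drop errors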

STEP 9:
####################
complete refresh of MVs
using the script for regular refresh, not concurrently. It is faster AND concurrent refresh does not work for an MV that is not populated yet.
####################
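
The underlying statement per MV (names are placeholders):

psql -h <aurora-hostname> -U <user_name> -d <database_name> \
    -c "REFRESH MATERIALIZED VIEW <schema_name>.<mv_name>;"
# REFRESH MATERIALIZED VIEW CONCURRENTLY would fail here: it requires the MV to already be populated (and to have a unique index)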

STEP 10:
####################
run ANALYZE on the tables
####################
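
For example, in one pass over the whole database (placeholders as above):

vacuumdb -h <aurora-hostname> -U <user_name> -d <database_name> --analyze-only -j 5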

STEP 11:
####################
create replica, alter parameter group to allow more logging
####################
create replica,
alter parameter group to allow more logging
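
A sketch with the AWS CLI (identifiers/classes are placeholders):

# add a reader back to the cluster
aws rds create-db-instance --db-instance-identifier <aurora-replica-id> \
    --db-cluster-identifier <aurora-cluster-id> \
    --engine aurora-postgresql --db-instance-class <instance_class>
# switch back to the modified parameter group with the extra logging
aws rds modify-db-cluster --db-cluster-identifier <aurora-cluster-id> \
    --db-cluster-parameter-group-name <modified-parameter-group>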

STEP 12:
####################
connect CircleCI
####################
The Bastion Host itself does not need to be updated;
the parameters need to be replaced for each environment
(not included here)
####################

STEP 13:
####################
Hook-up API to the new Aurora database (from cloud foundry)
####################

After changing and uploading these variables, go to CircleCI and rebuild the last successful build.
(Rebuilding will result in no downtime)
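
A sketch of the variable update (the variable name and connection string are placeholders; the real names depend on how the API reads its database connection):

cf target -s <space>
cf set-env api <DB_CONNECTION_VAR> 'postgresql://<user>:<password>@<aurora-hostname>:5432/<database_name>'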

NOTE: When we push the api from CircleCI it builds all 3 parts of the api (api/celery-beat/celery-worker)
In case the latest successful build is not available, we can use restage to make the new env take effect. Just keep in mind that when we restage, we need to restage all 3 parts of the api (api/celery-beat/celery-worker):
cf restage api
cf restage celery-beat
cf restage celery-worker

STEP 14:
####################
create or re-hook GG job
####################
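
A sketch of re-creating the replicat against the new Aurora target (all names/paths are placeholders):

ggsci <<'EOF'
dblogin sourcedb <sourcedb_name>, USERIDALIAS <user_id_alias>, DOMAIN <domain_name>
add replicat <replicat_name>, exttrail <trail_path>, checkpointtable <checkpoint_table>
start replicat <replicat_name>
EOF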

STEP 15:
####################
point the java script to the new Aurora database
####################
point the java script to the new Aurora database
start the cronjob
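
For example, re-enabling the entry that was commented out in STEP 5 (schedule/paths are placeholders):

crontab -e
# uncomment and update the host, e.g.:
# 0 2 * * * /path/to/<java_job>.sh <aurora-hostname> >> /var/log/<java_job>.log 2>&1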

STEP 16:
####################
contact Salient to start all data input work, pointing to the new Aurora database
####################

STEP 17:
####################
start Informatica Nightly process
####################

STEP 18:
####################
drop replica instance for postgresql DEV/STG/PRD
####################

STEP 19:
####################
after two sprints running without error,
drop the master instance for postgresql DEV/STG/PRD
####################
#4328

fecjjeng commented May 5, 2020

Database migration completed by the database team.
Follow-up ticket for tearing down the postgresql databases opened (#4328).
Closing this ticket.
