Elasticsearch 5.6 backup instructions
These are the instructions for creating a snapshot/backup and restoring data for the legal API.
Check the official Elasticsearch on cloud.gov docs for any changes and for more context.
The repository configuration step below should only need to be run once, if it has never been run before.
- Configure repository:
cf run-task api --command "python manage.py configure_backup_repository" -m 4G --name es-backup-1
- Create a backup:
cf run-task api --command "python manage.py create_elasticsearch_backup" -m 4G --name es-backup-1
- Create a backup with a special name:
cf run-task api --command "python manage.py create_elasticsearch_backup -s my_test_backup" -m 4G --name es-backup-2
- Restore the latest backup:
cf run-task api --command "python manage.py restore_elasticsearch_backup" -m 4G --name es-backup-1
- Restore a specific backup:
cf run-task api --command "python manage.py restore_elasticsearch_backup -s 20180725_archived_mur_reload_in_progress" -m 4G --name es-backup-2
Wait a couple of minutes for the task to finish.
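Rather than just waiting, you can follow a task's progress with standard cf CLI commands (the app name api comes from the run-task commands above):
cf tasks api            # lists each task's name and state (RUNNING, SUCCEEDED, FAILED)
cf logs api --recent    # recent app logs, including task output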
Set up a service key, or set the variables locally. You will need the following variables:
export es_hostname=""
export es_port=""
export es_username=""
export es_password=""
export s3_region=""
export s3_bucket=""
export s3_access_key=""
export s3_secret_key=""
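If you go the service key route, here is a sketch using standard cf commands; the instance name fec-api-search below is a placeholder, so find the real Elasticsearch service instance name with cf services first:
cf services                                          # find the Elasticsearch service instance name
cf create-service-key fec-api-search es-backup-key   # the key name is arbitrary
cf service-key fec-api-search es-backup-key          # prints credentials to fill in the exports above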
You can get the Elasticsearch (es_) and S3 bucket (s3_) environment variables with:
cf env api
You will want to use the API backup bucket for the environment you are working in.
Make sure you are not running Elasticsearch locally, or you will get an "address already in use" error.
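To confirm nothing is already listening on port 9200 before opening the tunnel, you can run (lsof is standard on macOS/Linux):
lsof -i :9200    # no output means the port is free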
After confirming you are in the right environment, open a connection to Elasticsearch via SSH by running:
cf ssh api -L "9200:${es_hostname}:${es_port}"
Keep this SSH session open in a tab.
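To sanity-check the tunnel, in another tab you can hit the cluster root, which is the standard Elasticsearch info endpoint:
curl -u "${es_username}:${es_password}" "localhost:9200"
You should get back a small JSON document with the cluster name and a version of 5.6.x.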
If this is the first time doing this, or things were not set up with service keys, you will want to create a snapshot repository for Elasticsearch. This allows Elasticsearch to connect to the bucket.
In another tab, run the following command, replacing "my" with legal or eregs depending on which Elasticsearch instance you are backing up:
curl -X PUT -u "${es_username}:${es_password}" "localhost:9200/_snapshot/my_s3_repository" -d @<(cat <<EOF
{
  "type": "s3",
  "settings": {
    "bucket": "${s3_bucket}",
    "region": "${s3_region}",
    "access_key": "${s3_access_key}",
    "secret_key": "${s3_secret_key}"
  }
}
EOF
)
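To confirm the repository registered correctly, you can fetch its definition back with the standard snapshot API (again substituting legal or eregs for "my"):
curl -X GET -u "${es_username}:${es_password}" "localhost:9200/_snapshot/my_s3_repository"
The response echoes the repository type and settings you configured.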
You can see all snapshots in the legal repository by running:
curl -X GET -u "${es_username}:${es_password}" "localhost:9200/_snapshot/legal_s3_repository/_all" | python -m json.tool | less
After you have set up your environment and have the SSH session going, create the snapshot that you can use as a backup. In a new tab, run:
curl -X PUT -u "${es_username}:${es_password}" "localhost:9200/_snapshot/my_s3_repository/my_s3_snapshot"
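By default this call returns immediately and the snapshot runs in the background. If you would rather have the call block until the snapshot finishes, the snapshot API supports a wait_for_completion flag:
curl -X PUT -u "${es_username}:${es_password}" "localhost:9200/_snapshot/my_s3_repository/my_s3_snapshot?wait_for_completion=true"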
Before restoring, the existing index can be deleted using:
python manage.py delete_docs_index
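If you need to do this without the manage.py task, dropping an index is a standard Elasticsearch call (destructive, so double-check the index name first):
curl -X DELETE -u "${es_username}:${es_password}" "localhost:9200/docs"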
You can restore from a backup after you have set up your environment, have a backup, and have the SSH session going. In a new tab, run:
curl -X POST -u "${es_username}:${es_password}" "localhost:9200/_snapshot/my_s3_repository/my_s3_snapshot/_restore" -d '{"indices": "docs"}'
You can check the status of the snapshot for errors with:
curl -X GET -u "${es_username}:${es_password}" "localhost:9200/_snapshot/_status"
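The _status endpoint reports snapshot creation; a restore shows up as ordinary shard recovery instead, which you can watch with the standard cat recovery API:
curl -X GET -u "${es_username}:${es_password}" "localhost:9200/_cat/recovery?v"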
Next, check the result in the SSH session: look up the Elasticsearch URI with env in that session, then run:
curl <uri>/docs/_count
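When the restore has finished, a successful response looks roughly like this (the numbers here are illustrative):
{"count": 12345, "_shards": {"total": 5, "successful": 5, "failed": 0}}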
You will get an "all shards failed" error while the restore is in progress, but the count should succeed once the index has loaded.
You will also need to reload current MURs and AOs, in the likely case that any have been published or updated since the backup:
python manage.py refresh_current_legal_docs_zero_downtime
Once that looks good, check the data on the website.
Here is more documentation on Elasticsearch snapshots: https://www.elastic.co/guide/en/elasticsearch/reference/5.6/modules-snapshots.html