Recovery
This page describes how to recover a GD2 cluster from a complete cluster shutdown.
On a complete cluster shutdown, GD2 cannot start up because no etcd servers are available to connect to. To allow startup, the last known etcd servers must be identified and brought up first.
The procedure below is fairly complex right now. We will try to simplify it in future GD2 releases.
GD2 saves the etcd information in the store config file, which is present at DATADIR/glusterd2/store.toml. The usual paths are /var/lib/glusterd2/store.toml or /usr/local/var/lib/glusterd2/store.toml. An example store.toml file is shown below.
CAFile = ""
CURLs = ["http://0.0.0.0:2379"]
CertFile = ""
ClntCertFile = ""
ClntKeyFile = ""
ConfFile = "/var/lib/glusterd2/store.toml"
Dir = "/var/lib/glusterd2/store"
Endpoints = ["http://172.17.0.3:2379","http://172.17.0.4:2379"]
KeyFile = ""
NoEmbed = false
PURLs = ["http://0.0.0.0:2380"]
UseTLS = false
On any of the GD2 peers in the cluster, identify the last known etcd servers from the store.toml file. The last known servers are saved as Endpoints in store.toml. In the above example, the last known etcd servers are http://172.17.0.3:2379 and http://172.17.0.4:2379. Keep note of this list.
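If you just want the endpoint list, grepping the file is enough; the path below assumes the default /var/lib/glusterd2 location, so adjust it if your DATADIR differs.
grep Endpoints /var/lib/glusterd2/store.toml
# Endpoints = ["http://172.17.0.3:2379","http://172.17.0.4:2379"]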
On one of the last known etcd servers, do the following to import the old data.
NOTE: All of these steps must be done on only one of the peers that was an etcd server.
- First, take a backup of the old store data under DATADIR/glusterd2/store and create an empty store directory.
mv /var/lib/glusterd2/store{,.bak}
mkdir /var/lib/glusterd2/store
- Recreate the etcd data dir DATADIR/glusterd2/store/etcd.data in single-node mode. This requires the etcdctl tool to be available. Get the NODEID from DATADIR/glusterd2/uuid.toml. (A filled-in example of these steps appears below.)
ETCDCTL_API=3 etcdctl snapshot restore ../store.bak/etcd.data/member/snap/db --name <NODEID> --initial-cluster <NODEID>=http://<PUBLIC_IP>:2380 --initial-advertise-peer-urls http://<PUBLIC_IP>:2380 --data-dir /var/lib/glusterd2/store/etcd.data --skip-hash-check
- Now, start GD2 in single-node mode. To do this, just move the store.toml file out of the way before starting GD2.
mv /var/lib/glusterd2/store.toml{,.bak}
glusterd2
# or systemctl start glusterd2
GD2 should now begin running on this node, with the data imported from the snapshot.
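For illustration, the whole import on the restored node might look like the session below. The UUID and the IP address 192.0.2.10 are hypothetical placeholders, and the uuid.toml key name can vary by release; substitute the values from your own uuid.toml and the node's public address, and adjust paths if your DATADIR is not /var/lib/glusterd2.
# Back up the old store and create an empty one
mv /var/lib/glusterd2/store{,.bak}
mkdir /var/lib/glusterd2/store

# Look up this node's ID (key name and value shown here are hypothetical)
cat /var/lib/glusterd2/uuid.toml
# peer-id = "3e8fab4c-91c4-4a4e-9a1c-0c2d3f5a6b7e"

# Restore the snapshot into a fresh single-member etcd data dir
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/glusterd2/store.bak/etcd.data/member/snap/db \
    --name 3e8fab4c-91c4-4a4e-9a1c-0c2d3f5a6b7e \
    --initial-cluster 3e8fab4c-91c4-4a4e-9a1c-0c2d3f5a6b7e=http://192.0.2.10:2380 \
    --initial-advertise-peer-urls http://192.0.2.10:2380 \
    --data-dir /var/lib/glusterd2/store/etcd.data \
    --skip-hash-check

# Move store.toml aside and start GD2 in single-node mode
mv /var/lib/glusterd2/store.toml{,.bak}
systemctl start glusterd2   # or run the glusterd2 binary directly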
On every remaining GD2 peer, do the following.
- Edit store.toml and set Endpoints = ["http://<PUBLIC_IP of restored node>:2379"]. (A scripted example appears after this list.)
- Create a backup of the DATADIR/glusterd2/store directory, if present.
mv /var/lib/glusterd2/store{,.bak}
- Start GD2
glusterd2
# or systemctl start glusterd2
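The per-peer steps above can be scripted roughly as follows, assuming the default /var/lib/glusterd2 paths and using sed to rewrite the Endpoints line; 192.0.2.10 stands in for the restored node's public IP.
# Point store.toml at the restored node's etcd
sed -i 's|^Endpoints = .*|Endpoints = ["http://192.0.2.10:2379"]|' /var/lib/glusterd2/store.toml

# Back up the old store data, if present
[ -d /var/lib/glusterd2/store ] && mv /var/lib/glusterd2/store{,.bak}

# Start GD2
systemctl start glusterd2   # or run the glusterd2 binary directly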
NOTE: Bring up the rest of the nodes one by one, with a delay between each. This allows elasticetcd to safely detect each new server coming up and select etcd servers correctly.
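Once the peers are back up, you can check that etcd has re-formed by listing its members with etcdctl; the endpoint below assumes the restored node's address, and the glustercli check applies only if the GD2 CLI is installed.
ETCDCTL_API=3 etcdctl --endpoints=http://192.0.2.10:2379 member list
glustercli peer status   # optional: verify that all GD2 peers are online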