Unexpected volume behaviors #11
Please check the logs from the provisioner. Assuming you found errors, we should try and figure out why the bricks are offline. @kshlm may be able to help here.
@JohnStrunk yes, I am able to see the below error repeatedly in the csi-provisioner logs:

E0925 12:45:35.500551 1 controller.go:1174] Error scheduling operaion "delete-pvc-0f903cb5bfe111e8[1ce7fda0-bfe1-11e8-aa4f-525400018951]": Failed to create operation with name "delete-pvc-0f903cb5bfe111e8[1ce7fda0-bfe1-11e8-aa4f-525400018951]". An operation with that name failed at 2018-09-25 12:43:35.586449494 +0000 UTC m=+10564.643238377. No retries permitted until 2018-09-25 12:45:37.586449494 +0000 UTC m=+10686.643238377 (2m2s). Last error: "rpc error: code = Internal desc = failed to stop volume node 00b3b28e-f945-4f54-8b28-5d3af879716c is probably down".

After some time it started giving a different error repeatedly:

0925 12:46:05.501060 1 controller.go:685] Exceeded failedDeleteThreshold threshold: 15, for volume "pvc-0f903cb5bfe111e8", provisioner will not attempt retries for this volume

From the Gluster provisioner logs I am able to see the below error:

E0925 12:28:36.557480 1 utils.go:100] GRPC error: rpc error: code = Internal desc = failed to stop volume node 00b3b28e-f945-4f54-8b28-5d3af879716c is probably down

From the above errors, what I have observed is that it is trying to stop the volume on this container, 00b3b28e-f945-4f54-8b28-5d3af879716c, which has already been deleted.
@rmadaka can you paste the …? I feel the glusterd2 container got restarted somewhere after PVC creation.
I think that, due to the container restart, the newly created gd2 container gets a new IP and adds itself to the peer list with the new IP address, while the keep-alive of the previously running container expires and that node is marked as down.
Due to the gd2 container restart, the host IP got changed, as you can compare.
I had assumed we were still using …. So now we need to figure out a solution to the changing IP addresses. I'm wondering, if we fix the UUID problem in #10, whether gd2 will update the IP addresses. @Madhu-1 any idea?
Cluster operations may work if the IP changes and the UUID remains the same. But Glusterd2 should regenerate the client volfiles (and also the cluster volfiles) and notify clients about the IP change.
@amarts @atinmu will the client reconfigure? Does the client understand …?
Yes, it would allow the clients to get the new option, and reconfigure! Good to test it before saying it is ready!
But what happens for already mounted volumes? [Old volumes will be mounted with the old IPs, which have already changed and are not reachable.] A possible temporary solution to fix this problem is …
Continuing from my comment in #10, we can easily get the peer-id persisted across restarts of the gd2 pods. So we'd need to ensure that we correctly regenerate volfiles and notify clients of the changes.

I'm thinking of an alternate approach to possibly solve this. The problems being faced now exist because GD2 is using the IP addresses of the pods, and because they can change. If instead we could set up fixed hostnames, and have GD2 use the hostnames instead, we could avoid this problem. But this would require our deployment strategy to change.

Our current GD2 deployment uses DaemonSets (on all nodes currently, but we could easily select nodes on which to run when required). DaemonSets ensure that a configured pod is running on each selected node. But we do not have persistent hostnames, as the pods are launched with a different name on each restart. We also can't set up Services to point to individual DaemonSet pods, as the selector would match any DaemonSet pod.

What if, instead of a DaemonSet, we deploy an individual ReplicaSet/Deployment/StatefulSet for GD2 for each GCS node? The ReplicaSet/Deployment/StatefulSet can be pinned to a specific node, to run just one copy of the GD2 pod. A Service can be created for each ReplicaSet/Deployment/StatefulSet, so we get a consistent hostname which would not change on pod restarts. GD2 can be configured to use this hostname, and along with persistence of the peer-id, we'd technically not need to do anything in GD2 to handle restarts.

We could pretty easily do this with the current Ansible based deployment. We'd need to create a kube manifest that would create the aforementioned Service and ReplicaSet/Deployment/StatefulSet, which can be converted into a template that Ansible could deploy for specific nodes.

Also, a note to everyone: the current Ansible based deployment isn't our end goal. The end goal for GCS is to use the Anthill operator to deploy GD2 and the CSI drivers. This Ansible deployment helps us try out different deployment strategies, and will help us make the right choices for Anthill.
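For illustration, a minimal sketch of the per-node StatefulSet + Service approach described above, assuming a node named kube1 and a namespace gcs. All names, labels, ports, the image, and the environment variable here are assumptions for illustration, not taken from the actual GCS manifests:

```yaml
# Headless Service giving the gd2 pod on node "kube1" a stable DNS name,
# e.g. gluster-kube1-0.gluster-kube1.gcs.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: gluster-kube1
  namespace: gcs
spec:
  clusterIP: None
  selector:
    app: glusterd2
    gcs-node: kube1
  ports:
  - name: gd2-mgmt
    port: 24007
---
# Single-replica StatefulSet pinned to node "kube1" via nodeSelector
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gluster-kube1
  namespace: gcs
spec:
  serviceName: gluster-kube1
  replicas: 1
  selector:
    matchLabels:
      app: glusterd2
      gcs-node: kube1
  template:
    metadata:
      labels:
        app: glusterd2
        gcs-node: kube1
    spec:
      nodeSelector:
        kubernetes.io/hostname: kube1
      containers:
      - name: glusterd2
        image: gluster/glusterd2-nightly   # illustrative image name
        env:
        # Hypothetical setting shown only to mark the intent: make gd2
        # listen/advertise on the stable Service hostname instead of the
        # pod IP. The actual gd2 option/env-var name may differ.
        - name: GD2_PEERADDRESS
          value: "gluster-kube1-0.gluster-kube1.gcs.svc.cluster.local"
```

With one such Service + StatefulSet pair per GCS node, pod restarts keep the same DNS name, so peers and clients never see a changed address.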
Got it. Yeah, it is a problem. But the client can reconnect to another glusterd if backup volfile servers are configured. So reconfigure may work unless all glusterd2 IPs have changed.
I think this can be handled as long as the CSI driver uses the …. It sounds like the CSI change + ensuring the volfile gets regenerated would make us robust to pod IP changes. Correct?
@kshlm
Using single-replica StatefulSets for each glusterd2 pod instead of a single DaemonSet, allows setting up of and use of pre-known hostnames as the listen address for glusterd2. The StatefulSets are pinned to individual nodes.
Also, the glusterd2 pods are now deployed with 'emptyDir' volumes for /var/lib/glusterd2 which allows persistence of peerid.
With the above 2 changes, glusterd2 pods survive pod restarts.
Fixes gluster#10, gluster#11
Signed-off-by: Kaushal M <kshlmster@gmail.com>
Using single-replica StatefulSets for each glusterd2 pod instead of a single DaemonSet, allows setting up of and use of pre-known hostnames as the listen address for glusterd2. The StatefulSets are pinned to individual nodes.
Also, the glusterd2 pods are now deployed with 'hostPath' volumes for /var/lib/glusterd2 which allows persistence of peerid.
With the above 2 changes, glusterd2 pods survive pod restarts.
Fixes gluster#10, gluster#11
Signed-off-by: Kaushal M <kshlmster@gmail.com>
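A hedged sketch of how the 'hostPath' persistence of /var/lib/glusterd2 mentioned in the commit message could look inside the StatefulSet pod template; the volume name, image, and hostPath type are assumptions, not taken from the actual manifest:

```yaml
# Pod template fragment: persist /var/lib/glusterd2 (and with it the peerid)
# across pod restarts by mounting the directory from the host.
spec:
  containers:
  - name: glusterd2
    image: gluster/glusterd2-nightly   # illustrative image name
    volumeMounts:
    - name: glusterd2-state
      mountPath: /var/lib/glusterd2
  volumes:
  - name: glusterd2-state
    hostPath:
      path: /var/lib/glusterd2
      type: DirectoryOrCreate
```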
@rmadaka this issue is fixed in the latest build. Can you verify this one?
Tested the scenario with the latest build.
After around 10 minutes we are not able to use this setup because of this issue: #15
I think this issue has been solved, with the remaining etcd problem tracked in #15. Please re-open if I am mistaken.
@JohnStrunk @Madhu-1 Once a PVC is created, if I reboot any of the gd2 pods, the rebooted gd2 pod's brick status goes to the offline state. Is it because of issue #15? The rebooted gd2 pod's bricks are not coming back online even when the etcd pods are in the running state.
@rmadaka the glusterd2 issue for this one is closed now. Can you verify this one?
-> Created two PVCs (PVC1, PVC2); a sample PVC manifest is sketched after this list.
-> Mounted the two PVCs to one app pod and ran I/O on the mount points.
-> Again mounted the above two PVCs to 3 replication controller app pods and ran I/O on both mount points.
-> Deleted one replication controller app pod; the rc app pod came up automatically with the same mount points and no data loss was found.
-> Then tried to delete PVC1, which was in the mounted state; PVC1's status went to the Terminating state.
-> Now deleted all app pods; then PVC1 was deleted successfully.
-> After that, one of my worker nodes went into a bad state (don't know the reason), and all the pods placed on that worker node went into the below state.
-> Then I rebooted the worker node; now the node is up in proper condition, and all the pods placed on this worker node came back to the Running state.
-> Logged in to one of the gd2 pods and verified the existing volume status; all bricks are in the offline state.
-> Now verified the PVC status.
-> Then deleted the PVC successfully.
-> Verified the PV status; the PV was not deleted.
-> Again logged in to the gd2 container to verify whether the volume exists or not.
-> The volume is listed and the volume state is STARTED, like below.
-> Now verified the volume status; the volume status shows the bricks offline.
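For reference, a minimal PVC manifest of the kind created in the first step above; the StorageClass name glusterfs-csi and the requested size are assumptions, not taken from the actual test setup:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: glusterfs-csi   # assumed StorageClass backed by the GlusterFS CSI driver
```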
Here I am observing two things: