rbd: add a workaround to fix rbd snapshot scheduling #2647

Closed
wants to merge 3 commits

Conversation


@Madhu-1 Madhu-1 commented Nov 17, 2021

Currently, we have a bug in the rbd mirror scheduling module: after doing failover and failback, the scheduling is not getting updated and the mirroring snapshots are not getting created periodically as per the scheduling interval. This PR works around the issue by doing the below operations:

  • Create a dummy image (csi-vol-dummy-<ceph fsID>) per cluster so that it can be easily identified (a naming sketch follows this list).

  • During the Promote operation on any image, enable mirroring on the dummy image. When we enable mirroring on the dummy image, the pool gets updated and the scheduling is reconfigured.

  • During the Demote operation on any image, disable mirroring on the dummy image. The disable needs to be done so that mirroring can be enabled again when we get the Promote request to make the image primary.

  • When DR is no longer needed, this image needs to be cleaned up manually for now, as we don't want to add a check
    in the existing DeleteVolume code path for deleting dummy images, since that would impact the performance of the DeleteVolume workflow.
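
To make the per-cluster naming concrete, here is a minimal sketch of how such a dummy image name could be derived; the helper name and the exact format are assumptions based on the "csi-vol-dummy-<ceph fsID>" wording above, not code taken from this PR's diff.

    // getDummyImageName builds a name that is unique per Ceph cluster, so the
    // dummy image can be identified easily and never collides across clusters.
    // Hypothetical helper; the real change may construct the name differently.
    func getDummyImageName(cephFSID string) string {
        return "csi-vol-dummy-" + cephFSID
    }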

Moved adding the scheduling to the Promote operation, as scheduling needs to be added when the image is promoted; this is the correct place to add the scheduling so that it actually takes effect.

More details at https://bugzilla.redhat.com/show_bug.cgi?id=2019161
Signed-off-by: Madhu Rajanna madhupr007@gmail.com

@Madhu-1 Madhu-1 added the DNM DO NOT MERGE label Nov 17, 2021
@mergify mergify bot added component/rbd Issues related to RBD bug Something isn't working labels Nov 17, 2021

Madhu-1 commented Nov 17, 2021

Manual testing works for me. For now I have added DNM as it's being tested by @ShyamsundarR and @BenamarMk.

if err != nil {
    return nil, status.Errorf(codes.Internal, "failed to get mirroring mode %s", err.Error())
}
err = enableMirroringOnDummyImage(rbdVol, mode)


rbdVol is not the dummy volume. We will be enabling mirroring on a non-dummy volume if I am reading this correctly.


Also, enabling mirroring for the dummy volume should be done after we promote the real volume.

Collaborator Author


enableMirroringOnDummyImage takes the rbdVol as input, creates a new dummy struct, replaces the name, and calls the mirror operation on that dummy image (sketched below).
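
A minimal sketch of what that description implies; the field and method names below (RbdImageName, ClusterID, enableImageMirroring, getDummyImageName) are assumptions for illustration and may differ from the actual diff.

    // Sketch of enableMirroringOnDummyImage as described above: copy the
    // incoming volume, swap in the dummy image name, and enable mirroring on
    // that copy. Field and helper names are illustrative.
    func enableMirroringOnDummyImage(rbdVol *rbdVolume, mode librbd.ImageMirrorMode) error {
        // reuse the pool and cluster connection details of the real volume
        dummyVol := *rbdVol
        // replace only the image name with the per-cluster dummy image name
        dummyVol.RbdImageName = getDummyImageName(dummyVol.ClusterID) // hypothetical helper; clusterID ~ Ceph fsID
        // enabling mirroring on the dummy image updates the pool, which in
        // turn gets the snapshot scheduling reconfigured
        return dummyVol.enableImageMirroring(mode)
    }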

Collaborator Author


@BenamarMk is it the same for disabling mirroring on the dummy image as well? Should it be done after demoting the real image?

Collaborator Author


Fixed for enabling. Let me know your input for disabling as well.


Disable is fine. You don't need to do anything for it. So basically the flow is like this (sketched in code after the list):

  1. Promote
  2. enable dummy
    And then
  3. Disable dummy
  4. Demote
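
Roughly, in code form, the agreed ordering looks like this; the helper and method names (promoteImage, demoteImage, enableMirroringOnDummyImage, disableMirroringOnDummyImage) are illustrative, and surrounding plumbing is trimmed.

    // Promote path: promote the real image first, then enable mirroring on
    // the dummy image so the pool-level snapshot scheduling is refreshed.
    if err := rbdVol.promoteImage(req.GetForce()); err != nil {
        return nil, status.Errorf(codes.Internal, "failed to promote image: %s", err.Error())
    }
    if err := enableMirroringOnDummyImage(rbdVol, mode); err != nil {
        return nil, status.Errorf(codes.Internal, "failed to enable mirroring on dummy image: %s", err.Error())
    }

    // Demote path: disable mirroring on the dummy image first, then demote
    // the real image.
    if err := disableMirroringOnDummyImage(rbdVol); err != nil { // hypothetical counterpart helper
        return nil, status.Errorf(codes.Internal, "failed to disable mirroring on dummy image: %s", err.Error())
    }
    if err := rbdVol.demoteImage(); err != nil {
        return nil, status.Errorf(codes.Internal, "failed to demote image: %s", err.Error())
    }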

Collaborator Author


Okay, then the current changes are fine. Thanks for confirming, @BenamarMk.

Moved adding the scheduling to the promote
operation, as scheduling needs to be added
when the image is promoted; this is
the correct place to add the scheduling
so that it actually takes effect.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
added helper function to get the cluster ID.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
@@ -437,6 +529,30 @@ func (rs *ReplicationServer) PromoteVolume(ctx context.Context,
}
}

var mode librbd.ImageMirrorMode

@BenamarMk BenamarMk Nov 17, 2021


Shouldn't the whole block from 532 all the way to 540 be inside the block above? IOW, don't you need to put it at line 529?

Collaborator Author


The reason for keeping it outside the block is this: if the csidriver is restarted after the promote and before enabling mirroring on the dummy image, then on the next request the if check will be skipped as the image is already primary, but the dummy-image enable will still run (see the sketch below).
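
In other words, a sketch with illustrative names (mirroringInfo, promoteImage, enableMirroringOnDummyImage), not the exact diff: the idempotency across driver restarts comes from keeping the dummy-image enable outside the already-primary check.

    if !mirroringInfo.Primary {
        // only promote when the image is not yet primary
        if err := rbdVol.promoteImage(req.GetForce()); err != nil {
            return nil, status.Errorf(codes.Internal, "failed to promote image: %s", err.Error())
        }
    }
    // always (re)enable mirroring on the dummy image, even on a retried
    // request where the promote above was skipped, so the scheduling
    // refresh is not lost if the driver restarted in between
    if err := enableMirroringOnDummyImage(rbdVol, mode); err != nil {
        return nil, status.Errorf(codes.Internal, "failed to enable mirroring on dummy image: %s", err.Error())
    }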


@BenamarMk BenamarMk left a comment


It looks good to me @Madhu-1. Thanks for coding it so quickly. I will now test it.


Madhu-1 commented Nov 17, 2021

/retest ci/centos/upgrade-tests-cephfs


Madhu-1 commented Nov 17, 2021

/retest ci/centos/upgrade-tests-cephfs

Nov 17 15:36:49.574: FAIL: failed to create user cephcsi-cephfs-node with error etcdserver: request timed out

@Madhu-1 Madhu-1 added the Priority-0 highest priority issue label Nov 18, 2021
@Madhu-1 Madhu-1 requested review from a team November 18, 2021 05:30

humblec commented Nov 18, 2021

One general question here @Madhu-1: while the failback happens and the dummy image exists, do we foresee any issues?


Madhu-1 commented Nov 18, 2021

One general question here @Madhu-1: while the failback happens and the dummy image exists, do we foresee any issues?

Do you mean the dummy image created by the user? It's a workaround; I don't think the user will create any real images with the same name.


humblec commented Nov 18, 2021

One general question here @Madhu-1: while the failback happens and the dummy image exists, do we foresee any issues?

Do you mean the dummy image created by the user? It's a workaround; I don't think the user will create any real images with the same name.

Not that; a dummy image exists in the cluster, and during failover we create another one for the second cluster. Now during failback, the first cluster's dummy image already exists or is not cleaned up. With that in mind, I was asking whether the existence of the dummy image in the first cluster can be a problem while the failback is established and images start to get promoted/demoted.


Madhu-1 commented Nov 18, 2021

One general question here @Madhu-1: while the failback happens and the dummy image exists, do we foresee any issues?

Do you mean the dummy image created by the user? It's a workaround; I don't think the user will create any real images with the same name.

Not that; a dummy image exists in the cluster, and during failover we create another one for the second cluster.

The dummy image is unique per cluster; it's created with the Ceph fsid in its name.

Now during failback, the first cluster's dummy image already exists or is not cleaned up. With that in mind, I was asking whether the existence of the dummy image in the first cluster can be a problem while the failback is established and images start to get promoted/demoted.

I don't see any problem here. During failback, mirroring on the dummy image will be disabled; we are not going to clean up or do anything else. We operate on the dummy image created for that cluster, not on the dummy image of the other cluster.


humblec commented Nov 18, 2021

One general question here @Madhu-1: while the failback happens and the dummy image exists, do we foresee any issues?

Do you mean the dummy image created by the user? It's a workaround; I don't think the user will create any real images with the same name.

Not that; a dummy image exists in the cluster, and during failover we create another one for the second cluster.

The dummy image is unique per cluster; it's created with the Ceph fsid in its name.

Yeah, that's correct; what I meant above was the same. The "during failover" part was unwanted.. :)

Now during failback, the first cluster's dummy image already exists or is not cleaned up. With that in mind, I was asking whether the existence of the dummy image in the first cluster can be a problem while the failback is established and images start to get promoted/demoted.

I don't see any problem here. During failback, mirroring on the dummy image will be disabled; we are not going to clean up or do anything else. We operate on the dummy image created for that cluster, not on the dummy image of the other cluster.

Sure, if we don't expect any problem in this scenario, we are good.


humblec commented Nov 18, 2021

LGTM.. considering the importance of this PR, we have to take this in at the earliest and can do further enhancements later.. @Madhu-1 will we wait for the testing results from @BenamarMk @ShyamsundarR before merging?

CC @ceph/ceph-csi-contributors ptal


Madhu-1 commented Nov 18, 2021

LGTM.. considering the importance of this PR, we have to take this in at the earliest and can do further enhancements later.. @Madhu-1 will we wait for the testing results from @BenamarMk @ShyamsundarR before merging?

Yes, waiting for the testing results, but I want to close out the review since the changes look good to @BenamarMk and have been tested locally.

CC @ceph/ceph-csi-contributors ptal

currently we have a bug in the rbd mirror scheduling module.
After doing failover and failback the scheduling is not
getting updated and the mirroring snapshots are not
getting created periodically as per the scheduling
interval. This PR works around the issue by doing the
below operations

* Create a dummy (unique) image per cluster; this image
should be easily identifiable.

* During the Promote operation on any image, enable
mirroring on the dummy image. When we enable mirroring
on the dummy image the pool gets updated and the
scheduling is reconfigured.

* During the Demote operation on any image, disable
mirroring on the dummy image. The disable needs to be done
so that mirroring can be enabled again when we get the
promote request to make the image primary.

* When DR is no longer needed, this image needs to be
cleaned up manually for now, as we don't want to add a
check in the existing DeleteVolume code path for deleting
the dummy image, since it would impact the performance of
the DeleteVolume workflow.

Signed-off-by: Madhu Rajanna <madhupr007@gmail.com>
if err != nil {
    return nil, status.Errorf(codes.Internal, "failed to get mirroring mode %s", err.Error())
}
err = enableMirroringOnDummyImage(rbdVol, mode)
Contributor


For the second image that we promote, this enable will become a no-op? IOW, we need to disable/enable mirroring on the dummy image for each promote of a real image, so that the real image gets scheduled.

In its current form the second enable would become a no-op, is what I think. @BenamarMk?

Collaborator Author


Yes, correct, it's a no-op. Mirroring is enabled on the dummy image only once for all promote operations. The same goes for disable as well.


I think I tested that and it worked. I will test it again to see whether the refresh happens when the dummy image is enabled for mirroring while it is already ENABLED. But if I am wrong, then we can always do this:

Promote PV:

1. disable mirroring Dummy
2. promote PV
3. enable mirroring Dummy

Demote PV:

1. disable mirroring Dummy
2. demote PV


humblec commented Nov 18, 2021

LGTM.. considering the importance of this PR, we have to take this in at the earliest and can do further enhancements later.. @Madhu-1 will we wait for the testing results from @BenamarMk @ShyamsundarR before merging?

Yes, waiting for the testing results, but I want to close out the review since the changes look good to @BenamarMk and have been tested locally.

CC @ceph/ceph-csi-contributors ptal

@Madhu-1 it looks like we are still testing this. However, I have placed my approval to avoid any unnecessary delay in merging once we are done with the testing.

Contributor

@ShyamsundarR ShyamsundarR left a comment


There are a few corner cases... hopefully this prevents the auto-merge


humblec commented Nov 18, 2021

There are a few corner cases... hopefully this prevents the auto-merge

@ShyamsundarR there is already DNM, so that should be enough to block the merge.


Madhu-1 commented Nov 19, 2021

closing in favor of #2656

@Madhu-1 Madhu-1 closed this Nov 19, 2021
Labels
bug (Something isn't working), component/rbd (Issues related to RBD), DNM (DO NOT MERGE), Priority-0 (highest priority issue)