Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd: support QoS based on capacity for rbd volume #5016

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

YiteGu
Copy link
Member

@YiteGu YiteGu commented Dec 12, 2024

  1. QoS provides settings for rbd volume read/write iops and read/write bandwidth.
  2. All QoS parameters are placed in the SC, send QoS parameters from SC to Cephcsi through PVC create request.
  3. We need provide QoS parameters in the SC as below:
    • BaseReadIops
    • BaseWriteIops
    • BaseReadBytesPerSecond
    • BaseWriteBytesPerSecond
    • ReadIopsPerGB
    • WriteIopsPerGB
    • ReadBpsPerGB
    • WriteBpsPerGB
    • BaseVolSizeBytes
      There are 4 base qos parameters among them, when users apply for a volume capacity equal to or less than BaseVolSizebytes, use base qos limit. For the portion of capacity exceeding BaseVolSizebytes, QoS will be increased in steps set per GB. If the step size parameter per GB is not provided, only base QoS limit will be used and not associated with capacity size.
  4. If PVC has resize request, adjust the QoS limit according to the QoS parameters after resizing.

Describe what this PR does

Provide some context for the reviewer

Is there anything that requires special attention

Do you have any questions?

Is the change backward compatible?

Are there concerns around backward compatibility?

Provide any external context for the change, if any.

For example:

  • Kubernetes links that explain why the change is required
  • CSI spec related changes/catch-up that necessitates this patch
  • golang related practices that necessitates this change

Related issues

Mention any github issues relevant to this PR. Adding below line
will help to auto close the issue once the PR is merged.

Fixes: #issue_number

Future concerns

List items that are not part of the PR and do not impact it's
functionality, but are work items that can be taken up subsequently.

Checklist:

  • Commit Message Formatting: Commit titles and messages follow
    guidelines in the developer
    guide
    .
  • Reviewed the developer guide on Submitting a Pull
    Request
  • Pending release
    notes

    updated with breaking and/or notable changes for the next major release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

Show available bot commands

These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:

  • /retest ci/centos/<job-name>: retest the <job-name> after unrelated
    failure (please report the failure too!)

@YiteGu YiteGu force-pushed the rbd-qos-support branch 2 times, most recently from 8a47010 to c07a41a Compare December 12, 2024 10:20
@YiteGu YiteGu changed the title cephcsi: support QoS based on capacity for rbd volume rbd: support QoS based on capacity for rbd volume Dec 12, 2024
@mergify mergify bot added the component/rbd Issues related to RBD label Dec 12, 2024
@YiteGu YiteGu force-pushed the rbd-qos-support branch 3 times, most recently from 63ea87d to f60f7c3 Compare December 12, 2024 12:06
Copy link
Member

@nixpanic nixpanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This is really a nice feature to have. A couple of comments that need to be addressed to improve the maintainability and usability.

  • move all qos related functions, structs, constants to a new qos.go file
  • all new functions that receive a *rbdVolume as argument should become a function of the rbdVolume struct (see example for the SetQOS function)
  • new parameters for the StorageClass need to be added to the documentation
  • documentation should explain that the QoS settings are currently only used in combination with the NBD mounter
  • ideally add unit tests for functions that do not interact with a Ceph cluster
  • ideally add some form or e2e testing, maybe set QoS parameters in the StorageClass, and verify that those are set on an RBD-image?

Thanks again, this is a great start!

internal/rbd/controllerserver.go Show resolved Hide resolved
internal/rbd/controllerserver.go Outdated Show resolved Hide resolved
internal/rbd/controllerserver.go Outdated Show resolved Hide resolved
internal/rbd/controllerserver.go Outdated Show resolved Hide resolved
internal/rbd/rbd_util.go Outdated Show resolved Hide resolved
internal/rbd/rbd_util.go Outdated Show resolved Hide resolved
internal/rbd/rbd_util.go Outdated Show resolved Hide resolved
internal/rbd/controllerserver.go Outdated Show resolved Hide resolved
internal/rbd/rbd_util.go Outdated Show resolved Hide resolved
@nixpanic nixpanic added the enhancement New feature or request label Dec 13, 2024
@YiteGu YiteGu force-pushed the rbd-qos-support branch 10 times, most recently from 2b895ad to 31ce867 Compare December 18, 2024 08:29
@YiteGu
Copy link
Member Author

YiteGu commented Dec 18, 2024

/test ci/centos/mini-e2e/k8s-1.31

internal/rbd/qos_test.go Outdated Show resolved Hide resolved
@@ -424,6 +430,19 @@ func (cs *ControllerServer) CreateVolume(
return nil, err
}

// Apply Qos parameters to rbd image.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CreateVolume function is already at the limit of the complexity (number of if/else and loop statements). Maybe it makes sense to move these two parts to the createBackingImage() function?

https://github.com/ceph/ceph-csi/actions/runs/12390057314/job/34584308943?pr=5016#step:3:841

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I full agree.

if err != nil {
log.DebugLog(ctx, "failed apply QOS for rbd image: %v", err)

return nil, status.Error(codes.Internal, err.Error())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that a CSI function like CreateVolume needs to return a status.Error() or status.Errorf() generated type of error. Functions like createBackingImage() should return a 'normal' error. This is not in-sync with the error returned from SaveQOS() in the call below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get

@nixpanic
Copy link
Member

Looks quite complete, many thanks for adding the unit and e2e tests!

@nixpanic
Copy link
Member

/test ci/centos/mini-e2e/k8s-1.31

@nixpanic
Copy link
Member

/test ci/centos/mini-e2e/k8s-1.31

not sure why that did not work, but the CI is running now (takes ~2 hours to complete): https://jenkins-ceph-csi.apps.ocp.cloud.ci.centos.org/job/mini-e2e_k8s-1.31/283/display/redirect

@YiteGu
Copy link
Member Author

YiteGu commented Dec 18, 2024

Looks quite complete, many thanks for adding the unit and e2e tests!

Write it briefly to verify that e2e testing is available. More scenes need to be added. :-)

@YiteGu YiteGu force-pushed the rbd-qos-support branch 2 times, most recently from 4d90640 to 0c09729 Compare December 19, 2024 11:05
@YiteGu YiteGu force-pushed the rbd-qos-support branch 2 times, most recently from 47bd9ab to 795ea78 Compare December 19, 2024 11:30
@nixpanic
Copy link
Member

/test ci/centos/mini-e2e/k8s-1.31

@YiteGu YiteGu force-pushed the rbd-qos-support branch 2 times, most recently from efeffcd to bd4d17e Compare December 20, 2024 04:23
1. QoS provides settings for rbd volume read/write iops
   and read/write bandwidth.
2. All QoS parameters are placed in the SC,
   send QoS parameters from SC to Cephcsi through PVC create request.
3. We need provide QoS parameters in the SC as below:
   - BaseReadIops
   - BaseWriteIops
   - BaseReadBytesPerSecond
   - BaseWriteBytesPerSecond
   - ReadIopsPerGB
   - WriteIopsPerGB
   - ReadBpsPerGB
   - WriteBpsPerGB
   - BaseVolSizeBytes
   There are 4 base qos parameters among them, when users apply for
   a volume capacity equal to or less than BaseVolSizebytes, use base
   qos limit. For the portion of capacity exceeding BaseVolSizebytes,
   QoS will be increased in steps set per GB. If the step size parameter
   per GB is not provided, only base QoS limit will be used and not associated
   with capacity size.
4. If PVC has resize request, adjust the QoS limit
   according to the QoS parameters after resizing.

Signed-off-by: Yite Gu <guyite@bytedance.com>
@YiteGu
Copy link
Member Author

YiteGu commented Dec 20, 2024

/test ci/centos/mini-e2e/k8s-1.31

1 similar comment
@YiteGu
Copy link
Member Author

YiteGu commented Dec 20, 2024

/test ci/centos/mini-e2e/k8s-1.31

@YiteGu YiteGu requested a review from nixpanic December 20, 2024 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/rbd Issues related to RBD enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants