Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support preemption when the number of attachment volumes of a node reaches the upper limit #3211

Merged
merged 2 commits into from
Nov 21, 2023

Conversation

Lily922
Copy link
Contributor

@Lily922 Lily922 commented Nov 20, 2023

support feature described in #3191

Design:

  1. Add a workqueue for node resource :nodeQueue workqueue.RateLimitingInterface, add nodeName to the queue when handling a CSINode resource or Node resource change event.
    The reasons for this are as follows:

    • When Node and CSINode resources with the same name need to be processed at the same time, we need to ensure data consistency and ensure that the data is up to date, To do this, we need a mutex lock to ensure mutual exclusion of processing:
    CSINode resource event handle:
    1.  try to get mutex lock
    3.  get latest Node resource and latest CSINode resouce from informer cache
    4.  get allocatable resources (from node status and scinode spec)
    5.  add allocatable  resources to scheduler cache
    
    Node resource event handle:
    1.  try to get mutex lock
    2.  get latest Node resource and latest CSINode resouce from informer cache
    3.  get allocatable resources (from node status and scinode spec)
    4.  add allocatable  resources to scheduler cache
    

    By using workqueue, we can ignore the change events is from Node or CSINode

    AddNode, DeleteNode, UpdateNode Event:  --> add nodeName to workqueue
    AddCSINode, DeleteCSINode, UpdateCSINode Event:  --> add nodeName to workqueue
    
    process workequeue:
    1. get nodeName from queue
    2. get latest Node resource and latest CSINode resouce from informer cache
    3. get allocatable resources (from node status and scinode spec)
    4. add allocatable  resources to scheduler cache
    
    • When processing events, an error may occur and you need to try again, such as failure to get a storageclass resource(because of the resource has not been created yet or for other reasons ), this failure can be solved by retrying. By using workqueue, we can use AddRateLimited to add the resources that failed to be processed to workqueue and wait for retry processing
  2. Don't check whether the CSI volume type has a quantity limit while NewTaskInfo or SetNode, Add all csi volumes on CSINode to cache, because the storage may change from unattachable volumes to attachable volumes after the cache set up, in this case it is very difficult to refresh all task caches, like case:

    1. A pod using disk csi volume (driverName: disk.csi.io) is created, check the disk driver on CSINode has no `allocatable` parameter
    apiVersion: storage.k8s.io/v1
    kind: CSINode
       name: 192.168.1.1
    spec:
       drivers:
       - name: disk.csi.io
         nodeID: xxxxxx.xxxxxx.xxxxxxx.xxxxxx
    NewTaskInfo generates a taskInfo without no Resreq about `disk.csi.io` because no `allocatable` parameter on CSINode.
    2. The pod will be scheduled to 192.168.1.1 successfully
    3. The CSINode is updated, and disk driver has  `allocatable` parameter
    apiVersion: storage.k8s.io/v1
    kind: CSINode
       name: 192.168.1.1
    spec:
       drivers:
       - name: disk.csi.io
         nodeID: xxxxxx.xxxxxx.xxxxxxx.xxxxxx
         allocatable:
            count: 10
    4. Add disk allocate resource to Node cache 
        node 192.168.1.1 : allocatable 10 disk  
        node 192.168.1.1 : used 0 disk
        But actually one disk is used by pod in step 2, the used number should be 1 disk
    

@volcano-sh-bot volcano-sh-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 20, 2023
@volcano-sh-bot volcano-sh-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 20, 2023
@Lily922 Lily922 force-pushed the volumelimit branch 2 times, most recently from 844cacd to 016071d Compare November 20, 2023 03:32
@wangyang0616
Copy link
Member

Please associate with the corresponding issue and complete the description.

@william-wang
Copy link
Member

@Lily922 Please rebase and solve the confliction :)

pvcName = ephemeral.VolumeClaimName(pod, &vol)
isEphemeral = true
default:
// TODO: support Inline volumes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mark the submitter information in todo. Such as: "@Lily922 TODO: support Inline volumes."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


csiSource := pv.Spec.PersistentVolumeSource.CSI
if csiSource == nil {
// TODO: support non-CSI volumes and migrating pvc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

…aches the upper limit

Signed-off-by: lili <lili_9309@163.com>
@Lily922 Lily922 force-pushed the volumelimit branch 2 times, most recently from fa16e63 to 182acee Compare November 20, 2023 12:43
Signed-off-by: lili <lili_9309@163.com>
@wangyang0616
Copy link
Member

/lgtm

Copy link
Member

@william-wang william-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 21, 2023
@volcano-sh-bot volcano-sh-bot merged commit 2a7d4b2 into volcano-sh:master Nov 21, 2023
13 checks passed
predicateStatus = append(predicateStatus, nodeVolumeStatus)
return predicateStatus, fmt.Errorf("plugin %s predicates failed %s", nodeVolumeLimitsCSIFilter.Name(), status.Message())
}
predicateStatus = append(predicateStatus, nodeVolumeStatus)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't retrun if failed here? see issue #3050

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants