add queue state management design proposal
Showing 2 changed files with 239 additions and 0 deletions.
# Queue State Management

[@sivanzcw](https://github.com/sivanzcw); Oct 17, 2019

## Table of Contents

- [Queue State Management](#queue-state-management)
  * [Table of Contents](#table-of-contents)
  * [Motivation](#motivation)
  * [Function Detail](#function-detail)
    + [Data Structure](#data-structure)
    + [Queue State](#queue-state)
    + [Queue Lifecycle Management](#queue-lifecycle-management)
    + [Queue Status Refreshment](#queue-status-refreshment)
    + [Queue Placement Restriction](#queue-placement-restriction)
    + [Queue State on The Scheduling Process](#queue-state-on-the-scheduling-process)
    + [Queue State on `vcctl`](#queue-state-on--vcctl-)

## Motivation

The queue is the unit of resource management in the cluster and the cornerstone of resource scheduling; it is closely
tied to both the allocation of resources and the scheduling of tasks. Cluster resources are divided among queues
according to their `weight` ratio. A queue's configuration guarantees the amount of cluster resources available to the
tasks in the queue and caps the maximum they can use. Each user or user group corresponds to one or more queues, as
assigned by the administrator. When queues split the cluster resources, each queue obtains a resource guarantee and a
usage quota, so the users or user groups under that queue get the opportunity to use cluster resources. At the same
time, the queue's resource limit restricts how much of the cluster a user or user group can consume, which prevents a
single user from overwhelming the cluster by submitting a large number of tasks and thereby preserves the
`multi-tenancy` property of scheduling. When a task is submitted, it is placed in a specific queue, and pod scheduling
is affected by queue priority and queue resource status. The resource allocation and resource limit of a queue can be
adjusted dynamically: when a queue is busy and the cluster has idle resources, the queue may exceed its original
resource limit and try to occupy the remaining cluster resources.

As described above, the queue is a crucial object in resource scheduling. There should be a complete guarantee
mechanism that keeps queues stable without sacrificing their flexibility. First, a queue should not be deleted
arbitrarily: if it is deleted, the unscheduled tasks in it can no longer be scheduled normally, and the resources
occupied by its running tasks can no longer be accounted for correctly. However, for the sake of flexible resource
control, deleting a queue should not be forbidden outright. In addition, given the decisive role of queues in resource
management, administrators control which users or user groups can use cluster resources by controlling queues, which
requires queues to provide the corresponding capabilities.

Therefore, we need to provide `State Management` capabilities for queues: add a state configuration to the queue and
adjust the queue's capabilities according to its state, thereby managing the queue lifecycle and the scheduling of
tasks under the queue.

## Function Detail

### Data Structure

Add `state` to `properties` in the `spec` of the CRD `queues.scheduling.sigs.dev`. The `state` in `spec` controls the
status of the queue.

```yaml
spec:
  properties:
    ...

    state:
      type: string

    ...
```

Add `state` to `properties` in the `status` of the CRD `queues.scheduling.sigs.dev`. The `state` in `status` displays
the current status of the queue.

```yaml
status:
  properties:
    ...

    state:
      type: string

    ...
```
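
On the Go side, the two schema fields above might map to types along the following lines. This is only a sketch: the
type names `QueueState`, `QueueSpec`, `QueueStatus` and `Queue`, and the `int32` type of `weight`, are assumptions made
for illustration and are not taken from this proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QueueState is a hypothetical string type for the new `state` fields.
type QueueState string

// QueueSpec holds the state requested by the user (spec.state).
type QueueSpec struct {
	Weight int32      `json:"weight,omitempty"`
	State  QueueState `json:"state,omitempty"`
}

// QueueStatus holds the state the queue is currently in (status.state).
type QueueStatus struct {
	State QueueState `json:"state,omitempty"`
}

// Queue is a trimmed-down stand-in for the real Queue object.
type Queue struct {
	Spec   QueueSpec   `json:"spec"`
	Status QueueStatus `json:"status"`
}

// encode serializes a queue to JSON, roughly as it would appear in the API object.
func encode(q Queue) string {
	b, err := json.Marshal(q)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// The user asked for Closed, but the queue still reports Closing.
	q := Queue{
		Spec:   QueueSpec{Weight: 1, State: "Closed"},
		Status: QueueStatus{State: "Closing"},
	}
	fmt.Println(encode(q))
}
```

Keeping the requested state in `spec` and the observed state in `status` lets the two differ while a queue is still
closing.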

### Queue State

A queue can be in one of the following states:

* `Open`: the queue is available and accepts new tasks.
* `Closed`: the queue is unavailable and does not accept new tasks. Closing a queue does not make the system actively
  delete the tasks under it; the queue waits for them to exit gracefully.
* `Closing`: an intermediate state between `Open` and `Closed`. When we try to change an `Open` queue that still has
  tasks running or waiting to be scheduled to `Closed`, the state of the queue first changes to `Closing` and then
  changes to `Closed` once all the tasks under the queue have exited.

The capabilities of a queue in each state are shown in the following table:

| state     | default | can be set | receive delivery | can be deleted | can be scheduled | deserved resources |
| :-------: | :-----: | :--------: | :--------------: | :------------: | :--------------: | :----------------: |
| `Open`    | Y       | Y          | Y                | N              | Y                | Normal             |
| `Closed`  | N       | Y          | N                | Y              | Y                | Normal             |
| `Closing` | N       | N          | N                | N              | Y                | Normal             |

* If no state is specified when a queue is created, the queue uses the default state `Open`.
* When creating a queue, the user can only specify the `Open` or `Closed` state.
* Only a queue in the `Open` state accepts new tasks. A task submitted to a queue in the `Closed` or `Closing` state
  is rejected.
* Only a queue in the `Closed` state can be deleted.
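
The capability table can be read as three small predicates over the state. The sketch below encodes the table rows
directly; the constant and function names are hypothetical and chosen only for this example.

```go
package main

import "fmt"

// QueueState is a hypothetical type for the three states described above.
type QueueState string

const (
	QueueStateOpen    QueueState = "Open"
	QueueStateClosed  QueueState = "Closed"
	QueueStateClosing QueueState = "Closing"
)

// AcceptsTasks reports whether a queue in state s receives new task delivery.
func AcceptsTasks(s QueueState) bool {
	return s == QueueStateOpen
}

// Deletable reports whether a queue in state s can be deleted.
func Deletable(s QueueState) bool {
	return s == QueueStateClosed
}

// SettableByUser reports whether a user may request state s explicitly.
// Closing is only ever reached as an intermediate state.
func SettableByUser(s QueueState) bool {
	return s == QueueStateOpen || s == QueueStateClosed
}

func main() {
	for _, s := range []QueueState{QueueStateOpen, QueueStateClosed, QueueStateClosing} {
		fmt.Printf("%-8s accepts=%t deletable=%t settable=%t\n",
			s, AcceptsTasks(s), Deletable(s), SettableByUser(s))
	}
}
```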

### Queue Lifecycle Management

In the lifecycle management of queues, we need to guarantee the following points:

* When creating a queue, if the user does not specify a state, the default `Open` state is set for it. If the user
  specifies a state, it must be a valid value; the valid values are `Open` and `Closed`.
* When updating a queue, if its state is changed, the new state value must be valid.
* When deleting a queue, only a queue whose status is `Closed` can be deleted successfully. The status here is the
  `state` under the `status` of the queue, not the `state` under its `spec`.
* The `default` queue cannot be deleted.

Add a `validatingwebhookconfiguration` to validate queues on creation, update, and deletion.

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: {{ .Release.Name }}-validate-queue
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade,post-delete
webhooks:
  - clientConfig:
      caBundle: ""
      service:
        name: {{ .Release.Name }}-admission-service
        namespace: {{ .Release.Namespace }}
        path: /queues
    failurePolicy: Fail
    name: validatequeue.volcano.sh
    namespaceSelector: {}
    rules:
      - apiGroups:
          - "scheduling.sigs.dev"
        apiVersions:
          - "v1alpha2"
        operations:
          - CREATE
          - UPDATE
          - DELETE
        resources:
          - queues
```

Add the implementation function `AdmitQueues`:

```go
func AdmitQueues(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
	...
	// Decode the queue object carried by the admission request.
	queue, err := DecodeQueue(ar.Request.Object, ar.Request.Resource)
	reviewResponse := v1beta1.AdmissionResponse{}
	// Validate the queue and record the verdict in the response.
	validateQueue(queue, &reviewResponse)
	...
}
```

The above function performs the following checks:

* When a queue is created or updated, verify that the queue state is valid
* When a queue is deleted, check whether the queue can be deleted
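
The two checks can be sketched without any Kubernetes dependencies as follows. The `Operation` and `Queue` types and
the `ValidateQueue` function are simplified, hypothetical stand-ins for the admission types, not the actual
`validateQueue` implementation. Accepting an empty state on create is an assumption: it relies on the state being
defaulted elsewhere.

```go
package main

import (
	"errors"
	"fmt"
)

// Operation is a simplified stand-in for the admission operation type.
type Operation string

const (
	Create Operation = "CREATE"
	Update Operation = "UPDATE"
	Delete Operation = "DELETE"
)

// Queue carries only the fields the checks need.
type Queue struct {
	Name        string
	SpecState   string // state requested in spec
	StatusState string // state reported in status
}

// ValidateQueue returns nil when the request should be admitted.
func ValidateQueue(op Operation, q Queue) error {
	switch op {
	case Create, Update:
		switch q.SpecState {
		case "", "Open", "Closed": // assumption: empty is defaulted to Open elsewhere
			return nil
		}
		return fmt.Errorf("queue %q: invalid state %q, valid values are Open and Closed",
			q.Name, q.SpecState)
	case Delete:
		if q.Name == "default" {
			return errors.New("the default queue cannot be deleted")
		}
		// Deletion looks at status.state, not spec.state.
		if q.StatusState != "Closed" {
			return fmt.Errorf("queue %q: only a queue with status Closed can be deleted, current status is %q",
				q.Name, q.StatusState)
		}
		return nil
	}
	return fmt.Errorf("unsupported operation %q", op)
}

func main() {
	fmt.Println(ValidateQueue(Create, Queue{Name: "q1", SpecState: "Closing"}))
	fmt.Println(ValidateQueue(Delete, Queue{Name: "q1", StatusState: "Closing"}))
	fmt.Println(ValidateQueue(Delete, Queue{Name: "q1", StatusState: "Closed"}))
}
```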

We need another `webhook` to set the default state value for a queue when it is created. Add a
`mutatingwebhookconfiguration` and a `MutateQueues` function:

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: {{ .Release.Name }}-mutate-queue
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade,post-delete
webhooks:
  - clientConfig:
      caBundle: ""
      service:
        name: {{ .Release.Name }}-admission-service
        namespace: {{ .Release.Namespace }}
        path: /mutating-queues
    failurePolicy: Fail
    name: mutatequeue.volcano.sh
    namespaceSelector: {}
    rules:
      - apiGroups:
          - "scheduling.sigs.dev"
        apiVersions:
          - "v1alpha2"
        operations:
          - CREATE
        resources:
          - queues
```

```go
func MutateQueues(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
	...
	// Decode the queue object carried by the admission request.
	queue, err := DecodeQueue(ar.Request.Object, ar.Request.Resource)
	reviewResponse := v1beta1.AdmissionResponse{}
	// Build the patch that sets the default state when none is given.
	createPatch(queue)
	...
}
```
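
The proposal does not spell out what `createPatch` returns. Since mutating admission webhooks return their changes as
a JSON Patch, one plausible shape is sketched below. The helper name `defaultStatePatch`, the `patchOperation` type and
the `/spec/state` path are assumptions for illustration; the path also assumes that the `spec` object already exists.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOperation is one entry of a JSON Patch document (RFC 6902).
type patchOperation struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value,omitempty"`
}

// defaultStatePatch returns a JSON Patch that sets spec.state to "Open"
// when the queue was created without a state, and nil when no patch is needed.
func defaultStatePatch(specState string) ([]byte, error) {
	if specState != "" {
		return nil, nil
	}
	return json.Marshal([]patchOperation{
		{Op: "add", Path: "/spec/state", Value: "Open"},
	})
}

func main() {
	patch, err := defaultStatePatch("")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```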

### Queue Status Refreshment

When refreshing the status of a queue, the `state` value under the queue's `spec` and the podgroups under the queue
are considered:

* If the `state` value is empty, the status of the queue is set to `Open`.
* If the `state` value is `Open`, the status of the queue is also `Open`.
* If the `state` value is `Closed`, we need to further consider whether there is a podgroup under the queue. If there
  is a podgroup under the queue, the status of the queue is set to `Closing`; if there is no podgroup under the queue,
  the status of the queue is set to `Closed`.
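
These rules amount to a small pure function of the requested state and the number of podgroups. The following is a
sketch; the function name `RefreshState` is hypothetical, and returning an error for any other `spec` value is an
assumption, since the proposal does not define that case.

```go
package main

import "fmt"

// RefreshState derives status.state from spec.state and the number of
// podgroups that currently belong to the queue.
func RefreshState(specState string, podGroupCount int) (string, error) {
	switch specState {
	case "", "Open":
		return "Open", nil
	case "Closed":
		if podGroupCount > 0 {
			// Still has podgroups: the queue has not finished closing.
			return "Closing", nil
		}
		return "Closed", nil
	}
	// Assumption: other values never get past admission, so treat them as errors.
	return "", fmt.Errorf("unexpected spec state %q", specState)
}

func main() {
	cases := []struct {
		spec      string
		podGroups int
	}{{"", 0}, {"Open", 3}, {"Closed", 2}, {"Closed", 0}}
	for _, c := range cases {
		state, err := RefreshState(c.spec, c.podGroups)
		fmt.Printf("spec=%q podgroups=%d -> status=%q err=%v\n", c.spec, c.podGroups, state, err)
	}
}
```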

### Queue Placement Restriction

When a job is created, we need to verify the status of the queue specified by the job:

* If the job does not specify a queue name, the job is allowed to be created.
* If the job specifies a queue name and the status of that queue is `Open`, the job is allowed to be created.
* If the status of the queue is not `Open`, the job creation request is rejected.
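
A minimal sketch of this check is shown below. The `AdmitJob` function and the map used to look up queue status are
hypothetical. Rejecting a job that names an unknown queue is an assumption, as the proposal does not cover that case.

```go
package main

import "fmt"

// AdmitJob decides whether a job may be created in the named queue.
// queueStatus maps queue names to their status.state.
func AdmitJob(queueName string, queueStatus map[string]string) error {
	// A job without a queue name is always allowed.
	if queueName == "" {
		return nil
	}
	state, found := queueStatus[queueName]
	if !found {
		// Assumption: an unknown queue is treated as a rejection.
		return fmt.Errorf("queue %q not found", queueName)
	}
	if state != "Open" {
		return fmt.Errorf("queue %q is %s and does not accept new jobs", queueName, state)
	}
	return nil
}

func main() {
	status := map[string]string{"dev": "Open", "batch": "Closing"}
	for _, q := range []string{"", "dev", "batch"} {
		fmt.Printf("queue=%q -> %v\n", q, AdmitJob(q, status))
	}
}
```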

### Queue State on The Scheduling Process

The three queue states above have no effect on the existing scheduling process: there are no pods under a queue in the
`Closed` state, while pods under queues in the `Open` or `Closing` state should be scheduled normally.

### Queue State on `vcctl`

We need to add support for `queue state management` to `vcctl`, mainly including the following changes:

* Support passing the state of a queue when creating it
* Display the status of the queue when getting queue details or the queue list
* Provide an update function for queues that supports updating the `weight` or `state` of a queue
* Provide a delete function for queues
* Add a queue operation interface with `queue open`, `queue close` and `queue update` support