From d1d3ff6a6dd90352df32f43d0523ec9ac903440d Mon Sep 17 00:00:00 2001
From: Zhang Jinghui
Date: Thu, 31 Oct 2019 18:51:27 +0800
Subject: [PATCH] add queue state management design proposal

---
 docs/design/queue/queue-state-management.md | 239 ++++++++++++++++++++
 docs/design/{ => queue}/queue.md            |   0
 2 files changed, 239 insertions(+)
 create mode 100644 docs/design/queue/queue-state-management.md
 rename docs/design/{ => queue}/queue.md (100%)

diff --git a/docs/design/queue/queue-state-management.md b/docs/design/queue/queue-state-management.md
new file mode 100644
index 0000000000..3a15e31bdb
--- /dev/null
+++ b/docs/design/queue/queue-state-management.md
@@ -0,0 +1,239 @@
+# Queue State Management
+
+[@sivanzcw](https://github.com/sivanzcw); Oct 17, 2019
+
+## Table of Contents
+
+- [Queue State Management](#queue-state-management)
+  * [Table of Contents](#table-of-contents)
+  * [Motivation](#motivation)
+  * [Function Detail](#function-detail)
+    + [Data Structure](#data-structure)
+    + [Queue State](#queue-state)
+    + [Queue Lifecycle Management](#queue-lifecycle-management)
+    + [Queue Status Refreshment](#queue-status-refreshment)
+    + [Queue Placement Restriction](#queue-placement-restriction)
+    + [Queue State on The Scheduling Process](#queue-state-on-the-scheduling-process)
+    + [Queue State on `vcctl`](#queue-state-on-vcctl)
+
+## Motivation
+
+The queue is the resource management object of the cluster and the cornerstone of resource scheduling; it is
+closely tied to both the allocation of resources and the scheduling of tasks. Cluster resources are
+allocated among queues in proportion to each queue's `weight`. A queue's configuration guarantees the amount of cluster
+resources that tasks in the queue can use and caps the maximum resources they can use. A single user or
+user group corresponds to one or more queues, as assigned by the administrator.
+When queues divide the cluster's resources, each queue obtains a resource guarantee and a usage quota, so that the users
+or user groups under the queue have the opportunity to use cluster resources. At the same time, the queue's resource limit
+caps how much of the cluster those users or user groups can consume, which prevents the cluster from being
+overwhelmed by a single user submitting a large number of tasks and thereby preserves the `multi-tenancy` property of
+scheduling. When a task is submitted, it is placed in a specific queue, and pod scheduling is affected by queue
+priority and queue resource status. The resource allocation and the resource limit of a
+queue can be adjusted dynamically: when a queue is busy and the cluster has idle resources, the queue may exceed its
+original resource limit and try to occupy the remaining cluster resources.
+
+Based on the above, the queue is a crucial object in the process of resource scheduling.
+A complete guarantee mechanism is needed to keep queues stable without sacrificing their
+flexibility. First, a queue should not be deleted arbitrarily: if it is deleted, the unscheduled tasks in
+the queue can no longer be scheduled normally, and the resources occupied by its running tasks can no longer be
+counted correctly. However, for the sake of flexible resource control, deleting a queue should not be forbidden outright. In addition,
+given the decisive role of the queue in resource management, the administrator controls which users or user groups
+can use cluster resources by controlling queues, which requires the queue to provide the corresponding capabilities.
+
+Therefore, we need to provide `State Management` capabilities for queue.
+To do so, add a state configuration to the queue and
+adjust the queue's capabilities according to its state, thereby managing the queue lifecycle and
+the scheduling of tasks under the queue.
+
+## Function Detail
+
+### Data Structure
+
+Add `state` to `properties` in `spec` of CRD `queues.scheduling.sigs.dev`. The `state` in `spec` controls the status
+of the queue.
+
+```yaml
+spec:
+  properties:
+    ...
+
+    state:
+      type: string
+
+    ...
+```
+
+Add `state` to `properties` in `status` of CRD `queues.scheduling.sigs.dev`. The `state` in `status` displays the
+current status of the queue.
+
+```yaml
+status:
+  properties:
+    ...
+
+    state:
+      type: string
+
+    ...
+```
+
+### Queue State
+
+Valid queue states are:
+
+* `Open`: the queue is available and accepts new tasks.
+* `Closed`: the queue is unavailable. The queue waits for its tasks to exit gracefully,
+which does not mean that the system actively deletes tasks under the queue. However, the queue does not accept new
+tasks.
+* `Closing`: an intermediate state between `Open` and `Closed`. When the state of a queue is `Open`, there
+are tasks running or waiting to be scheduled under the queue, and we try to change the state of the queue to
+`Closed`, the state first changes to `Closing` and then changes to `Closed` once all the tasks under
+the queue have exited.
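The three states listed above could be modelled as a small string type with a validity check. The following Go program is only a sketch: the names `QueueState`, `QueueStateOpen`, `QueueStateClosed`, `QueueStateClosing`, and `ParseQueueState` are illustrative assumptions and are not identifiers defined by this proposal, which only specifies that the CRD field is a string.

```go
package main

import "fmt"

// QueueState is a hypothetical type name for the string-valued state field.
type QueueState string

// The three valid states named in the proposal.
const (
	QueueStateOpen    QueueState = "Open"
	QueueStateClosed  QueueState = "Closed"
	QueueStateClosing QueueState = "Closing"
)

// ParseQueueState converts a raw string into a QueueState and returns an
// error for any value outside the three states listed above.
func ParseQueueState(s string) (QueueState, error) {
	switch st := QueueState(s); st {
	case QueueStateOpen, QueueStateClosed, QueueStateClosing:
		return st, nil
	default:
		return "", fmt.Errorf("invalid queue state %q", s)
	}
}

func main() {
	// "Paused" is not a state in the proposal and is rejected.
	for _, raw := range []string{"Open", "Closing", "Paused"} {
		st, err := ParseQueueState(raw)
		fmt.Println(st, err)
	}
}
```

Keeping the states in one typed constant block means the admission checks and the status refresh logic can share a single definition of what counts as a valid state.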
+
+The capabilities of a queue in each state are shown in the following table:
+
+| state     | default | can be set | receive delivery | can be deleted | can be scheduled | deserved resources |
+| :-------: | :-----: | :--------: | :--------------: | :------------: | :--------------: | :----------------: |
+| `Open`    |    Y    |     Y      |        Y         |       N        |        Y         |       Normal       |
+| `Closed`  |    N    |     Y      |        N         |       Y        |        Y         |       Normal       |
+| `Closing` |    N    |     N      |        N         |       N        |        Y         |       Normal       |
+
+* If the state is not specified when a queue is created, the queue uses the default state `Open`
+* When creating a new queue, the user can only specify the `Open` or `Closed` state
+* Only a queue in the `Open` state accepts new tasks. A task is rejected when it is submitted to a queue
+in the `Closed` or `Closing` state
+* Only a queue in the `Closed` state can be deleted
+
+### Queue Lifecycle Management
+
+In the lifecycle management of queue, we need to guarantee the following points:
+
+* When creating a new queue, if the user does not specify a state, we set the default `Open` state
+for it. If the user specifies a state, it must be a valid value; valid values are `Open`
+and `Closed`.
+* When updating a queue, if the state of the queue changes, the specified state value must be valid.
+* When deleting a queue, only a queue with `Closed` status can be deleted successfully. The `status` here is the `state`
+under the `status` of the queue, not the `state` under the `spec` of the queue.
+* The `default` queue cannot be deleted.
+
+Add a `validatingwebhookconfiguration` to validate queues on creation, update, and deletion.
+
+```yaml
+apiVersion: admissionregistration.k8s.io/v1beta1
+kind: ValidatingWebhookConfiguration
+metadata:
+  name: {{ .Release.Name }}-validate-queue
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade,post-delete
+webhooks:
+  - clientConfig:
+      caBundle: ""
+      service:
+        name: {{ .Release.Name }}-admission-service
+        namespace: {{ .Release.Namespace }}
+        path: /queues
+    failurePolicy: Fail
+    name: validatequeue.volcano.sh
+    namespaceSelector: {}
+    rules:
+      - apiGroups:
+          - "scheduling.sigs.dev"
+        apiVersions:
+          - "v1alpha2"
+        operations:
+          - CREATE
+          - UPDATE
+          - DELETE
+        resources:
+          - queues
+```
+
+Add the implementation function `AdmitQueues`:
+
+```go
+func AdmitQueues(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
+    ...
+    // Decode the queue carried by the admission request.
+    queue, err := DecodeQueue(ar.Request.Object, ar.Request.Resource)
+    reviewResponse := v1beta1.AdmissionResponse{}
+    // Validate the queue and record the verdict in the response.
+    validateQueue(queue, &reviewResponse)
+    ...
+}
+```
+
+The function above performs the following checks:
+
+* When creating or updating a queue, verify that the queue state is valid
+* When deleting a queue, check whether the queue can be deleted
+
+We need another `webhook` to set the default state value when a queue is created. Add a `mutatingwebhookconfiguration`
+and the `MutateQueues` function:
+
+```yaml
+apiVersion: admissionregistration.k8s.io/v1beta1
+kind: MutatingWebhookConfiguration
+metadata:
+  name: {{ .Release.Name }}-mutate-queue
+  annotations:
+    "helm.sh/hook": pre-install,pre-upgrade,post-delete
+webhooks:
+  - clientConfig:
+      caBundle: ""
+      service:
+        name: {{ .Release.Name }}-admission-service
+        namespace: {{ .Release.Namespace }}
+        path: /mutating-queues
+    failurePolicy: Fail
+    name: mutatequeue.volcano.sh
+    namespaceSelector: {}
+    rules:
+      - apiGroups:
+          - "scheduling.sigs.dev"
+        apiVersions:
+          - "v1alpha2"
+        operations:
+          - CREATE
+        resources:
+          - queues
+```
+
+```go
+func MutateQueues(ar v1beta1.AdmissionReview) *v1beta1.AdmissionResponse {
+    ...
+    // Decode the queue carried by the admission request.
+    queue, err := DecodeQueue(ar.Request.Object, ar.Request.Resource)
+    reviewResponse := v1beta1.AdmissionResponse{}
+    // Build the patch that sets the default state on the queue.
+    createPatch(queue)
+    ...
+}
+```
+
+### Queue Status Refreshment
+
+When refreshing the status of a queue, the `state` value under `spec` and the podgroups under the queue are
+considered:
+
+* If the `state` value is empty, the status of the queue is set to `Open`
+* If the `state` value is `Open`, the status of the queue is also `Open`
+* If the `state` value is `Closed`, we further check whether any podgroup exists under the queue. If
+there is a podgroup under the queue, the status of the queue is set to `Closing`; if there is no podgroup
+under the queue, the status of the queue is set to `Closed`.
+
+### Queue Placement Restriction
+
+When creating a job, we need to verify the status of the queue specified by the job:
+
+* If the job does not specify a queue name, the job is allowed to be created
+* If the job specifies a queue name and the status of the queue is `Open`, the job is allowed to be created
+* If the status of the queue is not `Open`, the job creation request is rejected.
+
+### Queue State on The Scheduling Process
+
+The three queue states above have no effect on the existing scheduling process: there is no pod under a queue in the
+`Closed` state, while pods under queues in the `Open` or `Closing` state are scheduled normally.
+
+### Queue State on `vcctl`
+
+We need to add support for `queue state management` in `vcctl`, mainly including the following changes:
+
+* Support passing the state of the queue when creating a queue
+* Display the status of the queue when getting queue details or the queue list
+* Provide an update function for queues that supports updating the `weight` or `state` of a queue
+* Provide a delete function for queues
+* Add a queue operation interface with `queue open`, `queue close`, and `queue update` support
diff --git a/docs/design/queue.md b/docs/design/queue/queue.md
similarity index 100%
rename from docs/design/queue.md
rename to docs/design/queue/queue.md