Added Queue design doc. #95
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
# Queue | ||
|
||
[@k82cn](http://github.com/k82cn); April 17, 2019 | ||
|
||
## Motivation | ||
|
||
`Queue` was introduced in [kube-batch](http://github.com/kubernetes-sigs/kube-batch) long ago as an internal feature: all jobs are submitted to the same queue, named `default`. As more and more users want to share resources with each other through queues, this proposal covers the primary queue features needed to achieve that. | ||
|
||
## Function Specification | ||
|
||
The queue is cluster-level, so users from different namespaces can share resources within a `Queue`. The following section defines the API of the queue. | ||
|
||
### API | ||
|
||
```go | ||
type Queue struct { | ||
metav1.TypeMeta `json:",inline"` | ||
|
||
metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"` | ||
|
||
// Specification of the desired behavior of a queue | ||
// +optional | ||
Spec QueueSpec `json:"spec,omitempty" protobuf:"bytes,2,opt,name=spec"` | ||
|
||
// Current status of Queue | ||
// +optional | ||
Status QueueStatus `json:"status,omitempty" protobuf:"bytes,3,opt,name=status"` | ||
} | ||
|
||
type QueueSpec struct { | ||
// The weight of the queue, used to share resources between queues. | ||
Weight int32 `json:"weight,omitempty" protobuf:"bytes,1,opt,name=weight"` | ||
} | ||
|
||
type QueueStatus struct { | ||
// The number of jobs in Unknown status | ||
Unknown int32 `json:"unknown,omitempty" protobuf:"bytes,1,opt,name=unknown"` | ||
// The number of jobs in Running status | ||
Running int32 `json:"running,omitempty" protobuf:"bytes,2,opt,name=running"` | ||
// The number of jobs in Pending status | ||
Pending int32 `json:"pending,omitempty" protobuf:"bytes,3,opt,name=pending"` | ||
// The number of jobs in Completed status | ||
Completed int32 `json:"completed,omitempty" protobuf:"bytes,4,opt,name=completed"` | ||
// The number of jobs in Failed status | ||
Failed int32 `json:"failed,omitempty" protobuf:"bytes,5,opt,name=failed"` | ||
// The number of jobs in Aborted status | ||
Aborted int32 `json:"aborted,omitempty" protobuf:"bytes,6,opt,name=aborted"` | ||
|
||
} | ||
``` | ||
|
||
### QueueController | ||
|
||
The `QueueController` will manage the lifecycle of the queue: | ||
|
||
1. Watch `PodGroup`/`Job` objects to keep the queue's status up to date | ||
2. If a `Queue` is deleted, also delete all related `PodGroup`/`Job` objects in the queue | ||
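
The status bookkeeping in step 1 can be sketched as follows. This is only an illustration under stated assumptions: the `PodGroup` struct, the `PodGroupPhase` type, and the `summarize` helper are hypothetical names, not part of the proposal, and only three of the counters are shown.

```go
package main

import "fmt"

// PodGroupPhase and its values are assumptions made for this sketch.
type PodGroupPhase string

const (
	PhasePending PodGroupPhase = "Pending"
	PhaseRunning PodGroupPhase = "Running"
)

// PodGroup is a hypothetical, trimmed-down view of a pod group.
type PodGroup struct {
	Name  string
	Queue string
	Phase PodGroupPhase
}

// QueueStatus mirrors a subset of the counters defined in the API section.
type QueueStatus struct {
	Unknown int32
	Running int32
	Pending int32
}

// summarize counts the pod groups that belong to one queue by phase.
// Any phase that is not recognized is counted as Unknown.
func summarize(queue string, groups []PodGroup) QueueStatus {
	var st QueueStatus
	for _, pg := range groups {
		if pg.Queue != queue {
			continue
		}
		switch pg.Phase {
		case PhasePending:
			st.Pending++
		case PhaseRunning:
			st.Running++
		default:
			st.Unknown++
		}
	}
	return st
}

func main() {
	groups := []PodGroup{
		{Name: "a", Queue: "myqueue", Phase: PhaseRunning},
		{Name: "b", Queue: "myqueue", Phase: PhasePending},
		{Name: "c", Queue: "other", Phase: PhaseRunning},
	}
	fmt.Printf("%+v\n", summarize("myqueue", groups))
}
```

A real controller would recompute these counters from watch events rather than from a slice, but the per-phase counting would look similar.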
|
||
### Admission Controller | ||
|
||
The admission controller will check the queue of a `PodGroup`/`Job` at creation time: | ||
|
||
1. If the queue does not exist, the creation will be rejected | ||
Review comment: Can we inject the default queue? |
Review comment: Maybe reject it in the first version; we can refer to how `PriorityClass` handles this kind of default value. |
||
2. If the queue is releasing, the creation will also be rejected | ||
Review comment: How about setting the default weight in the admission hook? |
||
|
||
### Feature Interaction | ||
|
||
#### Customized Job/PodGroup | ||
|
||
If a `PodGroup` is created by a customized controller, the `QueueController` will count it under the `Unknown` status, because a `PodGroup` focuses on the scheduling specification, which does not include the customized job's status. | ||
|
||
#### cli | ||
|
||
The command line is also enhanced for operations engineers. Three sub-commands are introduced as follows: | ||
|
||
__create__: | ||
|
||
The `create` command creates a queue with a weight; for example, the following command creates a queue named `myqueue` with weight 10. | ||
|
||
```shell | ||
$ vkctl queue create --name myqueue --weight 10 | ||
``` | ||
|
||
__view__: | ||
|
||
The `view` command shows the details of a queue, e.g. its creation time; the following command shows the details of queue `myqueue`: | ||
|
||
```shell | ||
$ vkctl queue view myqueue | ||
``` | ||
|
||
__list__: | ||
|
||
The `list` command shows all queues available to the current user: | ||
|
||
```shell | ||
$ vkctl queue list | ||
Name Weight Total Pending Running ... | ||
myqueue 10 10 5 5 | ||
``` | ||
|
||
#### Scheduler | ||
|
||
* Proportion plugin: | ||
|
||
The proportion plugin is used to share resources between `Queue`s by weight. The deserved resource of a queue is `(weight/total-weight) * total-resource`. When allocating resources, the plugin will not allocate more to a queue than its deserved resources. | ||
Review comment: IIUC, it seems |
Review comment: Another case, cluster resources may also change. |
Review comment: Kube-batch will continue executing those actions every X period. |
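
The deserved-resource formula above can be worked through with a short sketch. The `deserved` helper and the use of a single scalar resource (for example CPU cores) are simplifying assumptions for illustration; a scheduler would apply the same ratio per resource dimension.

```go
package main

import "fmt"

// deserved returns, for each queue, (weight / total-weight) * total-resource.
// total is a single scalar resource in this sketch.
func deserved(weights map[string]int32, total float64) map[string]float64 {
	var sum int64
	for _, w := range weights {
		sum += int64(w)
	}
	out := make(map[string]float64, len(weights))
	if sum == 0 {
		// No weights set: nothing can be attributed to any queue.
		return out
	}
	for name, w := range weights {
		out[name] = float64(w) / float64(sum) * total
	}
	return out
}

func main() {
	// Two queues with weights 1 and 3 sharing 8 units of a resource.
	shares := deserved(map[string]int32{"q1": 1, "q2": 3}, 8)
	fmt.Println(shares["q1"], shares["q2"])
}
```

With weights 1 and 3 and 8 units in total, `q1` deserves 2 units and `q2` deserves 6.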
||
|
||
* Reclaim action: | ||
|
||
The `reclaim` action goes through all queues and reclaims resources from other queues based on the return value of `ReclaimableFn`; the time complexity is `O(n^2)`. In `ReclaimableFn`, both `proportion` and `gang` take effect: 1. `proportion` makes sure the queue will not be under-used after reclaim; 2. `gang` makes sure the job will not be reclaimed if its `minAvailable` > 1. | ||
|
||
* Backfill action: | ||
|
||
When the `allocate` action assigns resources to each queue, there is a case ([kube-batch#492](<https://github.com/kubernetes-sigs/kube-batch/issues/492>)) in which resources may sit unnecessarily idle because of the `proportion` plugin: two queues each have one pending job, and the deserved resources of each queue cannot meet the requirement of its job. In this case, the `backfill` action will ignore the deserved guarantee of each queue and fill idle resources as much as possible. This introduces another potential case in which an incoming smaller job is blocked; this case will be handled by the reserved resources of each queue in another project. |
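
The idle-resource case described above can be reproduced with a toy model. Everything here is an assumption made for illustration: the `job` type, the `allocateWithinDeserved` and `backfill` helpers, the single scalar resource, and the concrete numbers are not taken from kube-batch.

```go
package main

import "fmt"

// job is a hypothetical pending job with a scalar resource request.
type job struct {
	queue   string
	request float64
}

// allocateWithinDeserved places a job only if its queue stays within its
// deserved share and enough idle resource is left. It returns the jobs
// that remain pending and the idle resource that remains.
func allocateWithinDeserved(pending []job, deserved map[string]float64, idle float64) ([]job, float64) {
	used := map[string]float64{}
	var left []job
	for _, j := range pending {
		if used[j.queue]+j.request <= deserved[j.queue] && j.request <= idle {
			used[j.queue] += j.request
			idle -= j.request
			continue
		}
		left = append(left, j)
	}
	return left, idle
}

// backfill ignores deserved shares and places any job that fits into the
// idle resource.
func backfill(pending []job, idle float64) ([]job, float64) {
	var left []job
	for _, j := range pending {
		if j.request <= idle {
			idle -= j.request
			continue
		}
		left = append(left, j)
	}
	return left, idle
}

func main() {
	// 10 units in total, two equally weighted queues (5 deserved each),
	// and one pending job requesting 8 units in each queue.
	pending := []job{{"q1", 8}, {"q2", 8}}
	deserved := map[string]float64{"q1": 5, "q2": 5}

	left, idle := allocateWithinDeserved(pending, deserved, 10)
	fmt.Println("after allocate:", len(left), "pending,", idle, "idle")

	left, idle = backfill(left, idle)
	fmt.Println("after backfill:", len(left), "pending,", idle, "idle")
}
```

In this model neither job fits within its queue's deserved share, so the whole cluster stays idle after `allocate`; `backfill` then places one of the two jobs and leaves 2 units idle.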
Review comment: What's the meaning of the value? Should have a clear desc.