When users submit jobs to Volcano, they may need to attach particular constraints to a job, for example a maximum Pending time that prevents the job from starving. These constraints can be regarded as a Service Level Agreement (SLA) between Volcano and the user. The `sla` plugin is provided to receive and enforce SLA settings for both individual jobs and the whole cluster.
- In the `sla` plugin, the `sla-waiting-time` argument is provided to realize job resource reservation. `sla-waiting-time` is the maximum time a job should stay in `Pending` or `inqueue` status without being allocated. When `sla-waiting-time` expires, the `sla` plugin immediately sets the job to `inqueue` in the `enqueue` action. The `sla` plugin then locks idle resources pre-allocated to the pods of this job in the `allocate` action, even if the job is not yet `Ready`. In this way, the `sla` plugin realizes large-job election and resource reservation, thus replacing the `elect` and `reserve` actions in v1.1.0.
- The `sla-waiting-time` argument can be set for a single job or for all jobs in the cluster.
  - For a single job, set it in the job annotations in the following format:

    ```yaml
    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      annotations:
        sla-waiting-time: 1h2m3s
    ```

  - For all jobs, set the `sla-waiting-time` field in the `sla` plugin arguments via `volcano-scheduler-configmap` in the following format:

    ```yaml
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: sla
        arguments:
          sla-waiting-time: 1h2m3s
    ```
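To make the `1h2m3s` value format concrete, here is a minimal Python sketch of how such a string could be turned into a duration and how a per-job value could be combined with a cluster-wide one. The helper names `parse_waiting_time` and `effective_waiting_time` are hypothetical, only the `h`, `m`, and `s` units seen in the examples are handled, and the precedence (job annotation first, then the cluster-wide setting) is an assumption rather than something stated above.

```python
import re
from datetime import timedelta
from typing import Optional

# Seconds per unit for the subset of duration units used in this document.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_waiting_time(text: str) -> timedelta:
    """Parse a value such as "1h2m3s" into a timedelta (hypothetical helper)."""
    tokens = _TOKEN.findall(text)
    # Reject input that is empty or contains anything besides <int><unit> pairs.
    if not tokens or "".join(n + u for n, u in tokens) != text:
        raise ValueError(f"invalid sla-waiting-time: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in tokens))


def effective_waiting_time(
    annotations: dict, cluster_default: Optional[str]
) -> Optional[timedelta]:
    """Assumed precedence: job annotation first, then the cluster-wide value."""
    raw = annotations.get("sla-waiting-time", cluster_default)
    return parse_waiting_time(raw) if raw is not None else None


if __name__ == "__main__":
    print(parse_waiting_time("1h2m3s"))  # 1:02:03
```

A job annotated with `sla-waiting-time: 10m` would resolve to ten minutes under this sketch even if the cluster-wide value were `1h`. A job with no annotation falls back to the cluster-wide value, or to no SLA at all if neither is set.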
- The `sla` plugin returns 3 callback functions: `JobEnqueueableFn`, `JobPipelinedFn`, and `JobOrderFn`:
  - `JobEnqueueableFn` returns `Permit` when the job's waiting time in `Pending` status is longer than `sla-waiting-time`. The job then goes through the `enqueue` action and becomes `inqueue` instantly, regardless of other plugins returning `Reject` or `Abstain` to keep this job from becoming `inqueue`.
  - `JobPipelinedFn` returns `Permit` when the job's waiting time in `inqueue` status is longer than `sla-waiting-time`. The job then enters `Pipelined` status instantly, regardless of other plugins returning `Reject` or `Abstain` to keep this job from being `Pipelined`. In this way, the `allocate` action reserves resources for the pods of the job even if the job is not `Ready` yet.
  - `JobOrderFn` adjusts the order of this job in the waiting queues of the `enqueue` and `allocate` actions. The closer a job's waiting time is to `sla-waiting-time`, the higher the job is scored by the `JobOrderFn` of the `sla` plugin, so the job is more likely to be at the front of the priority queue. This means it can reach more idle resources and has higher priority to become `inqueue` and be allocated.
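The decision made by `JobEnqueueableFn` and `JobPipelinedFn` can be sketched as a single comparison. The Python below is an illustration only: the `Vote` enum, its numeric values, and the `sla_vote` function are hypothetical names, and returning `Abstain` when the limit has not been reached (or when no SLA is set) is an assumption, since the text only states when `Permit` is returned.

```python
from datetime import timedelta
from enum import IntEnum
from typing import Optional


class Vote(IntEnum):
    # Numeric values are illustrative; the text only names the three outcomes.
    PERMIT = 1
    ABSTAIN = 0
    REJECT = -1


def sla_vote(waited: timedelta, sla_waiting_time: Optional[timedelta]) -> Vote:
    """Vote of the sla plugin for one job in one status (Pending or inqueue)."""
    if sla_waiting_time is None:
        # Assumption: a job with no SLA configured gets no opinion from this plugin.
        return Vote.ABSTAIN
    # Permit once the job has waited longer than sla-waiting-time.
    return Vote.PERMIT if waited > sla_waiting_time else Vote.ABSTAIN
```

The same comparison would serve both callbacks; only the measured interval differs (time spent in `Pending` for `JobEnqueueableFn`, time spent in `inqueue` for `JobPipelinedFn`).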
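The ordering idea behind `JobOrderFn` can be illustrated by sorting jobs on the time they have left before reaching `sla-waiting-time`. This is a rough Python sketch, not the plugin's code: the `Job` tuple and `order_key` function are hypothetical, and placing jobs without an SLA behind all jobs that have one is an assumption.

```python
from datetime import timedelta
from typing import NamedTuple, Optional


class Job(NamedTuple):
    name: str
    waited: timedelta
    sla_waiting_time: Optional[timedelta]


def order_key(job: Job):
    if job.sla_waiting_time is None:
        # Assumption: jobs without an SLA sort after all jobs that have one.
        return (1, timedelta(0))
    # Smaller remaining time means closer to the limit, so earlier in the queue.
    return (0, job.sla_waiting_time - job.waited)


jobs = [
    Job("a", timedelta(minutes=10), timedelta(hours=1)),
    Job("b", timedelta(minutes=55), timedelta(hours=1)),
    Job("c", timedelta(minutes=30), None),
]
ordered = [job.name for job in sorted(jobs, key=order_key)]
print(ordered)  # ['b', 'a', 'c']
```

Job `b` has only five minutes left before its limit, so it moves ahead of `a`, which still has fifty. A job that has already exceeded its limit gets a negative remaining time and sorts first.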
- For now only one argument, `sla-waiting-time`, is needed, so I added it to the job annotations to keep invocation simple. When the `sla` plugin is extended with more arguments, a better way to invoke it may be a job plugin such as `svc` and `ssh`.