Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support driver and executor pod use different priority #2146

Merged
merged 15 commits into from
Sep 10, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions api/v1beta2/sparkapplication_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,9 @@ type BatchSchedulerConfiguration struct {
// If specified, volcano scheduler will consider it as the resources requested.
// +optional
Resources corev1.ResourceList `json:"resources,omitempty"`
// PriorityClassSplitting stands for the splitting of the priority class, it's being used driver and executor pods
// +optional
PriorityClassSplitting *bool `json:"priorityClassSplitting,omitempty"`
}

// SparkUIConfiguration is for driver UI specific configuration parameters.
Expand Down Expand Up @@ -536,6 +539,9 @@ type DriverSpec struct {
// Ports settings for the pods, following the Kubernetes specifications.
// +optional
Ports []Port `json:"ports,omitempty"`
// PriorityClassName is the name of the PriorityClass for the driver pod.
// +optional
PriorityClassName *string `json:"priorityClassName,omitempty"`
}

// ExecutorSpec is specification of the executor.
Expand Down Expand Up @@ -563,6 +569,9 @@ type ExecutorSpec struct {
// Ports settings for the pods, following the Kubernetes specifications.
// +optional
Ports []Port `json:"ports,omitempty"`
// PriorityClassName is the name of the PriorityClass for the executor pod.
// +optional
PriorityClassName *string `json:"priorityClassName,omitempty"`
}

// NamePath is a pair of a name and a path to which the named objects should be mounted to.
Expand Down
15 changes: 15 additions & 0 deletions api/v1beta2/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,11 @@ spec:
PriorityClass resource, it's being used in Volcano batch
scheduler.
type: string
priorityClassSplitting:
description: PriorityClassSplitting stands for the splitting
of the priority class, it's being used driver and executor
pods
type: boolean
queue:
description: Queue stands for the resource queue which the
application belongs to, it's being used in Volcano batch
Expand Down Expand Up @@ -3179,6 +3184,10 @@ spec:
- protocol
type: object
type: array
priorityClassName:
description: PriorityClassName is the name of the PriorityClass
for the driver pod.
type: string
schedulerName:
description: SchedulerName specifies the scheduler that will
be used for scheduling
Expand Down Expand Up @@ -7946,6 +7955,10 @@ spec:
- protocol
type: object
type: array
priorityClassName:
description: PriorityClassName is the name of the PriorityClass
for the executor pod.
type: string
schedulerName:
description: SchedulerName specifies the scheduler that will
be used for scheduling
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,10 @@ spec:
description: PriorityClassName stands for the name of k8s PriorityClass
resource, it's being used in Volcano batch scheduler.
type: string
priorityClassSplitting:
description: PriorityClassSplitting stands for the splitting of
the priority class, it's being used driver and executor pods
type: boolean
queue:
description: Queue stands for the resource queue which the application
belongs to, it's being used in Volcano batch scheduler.
Expand Down Expand Up @@ -3127,6 +3131,10 @@ spec:
- protocol
type: object
type: array
priorityClassName:
description: PriorityClassName is the name of the PriorityClass
for the driver pod.
type: string
schedulerName:
description: SchedulerName specifies the scheduler that will be
used for scheduling
Expand Down Expand Up @@ -7864,6 +7872,10 @@ spec:
- protocol
type: object
type: array
priorityClassName:
description: PriorityClassName is the name of the PriorityClass
for the executor pod.
type: string
schedulerName:
description: SchedulerName specifies the scheduler that will be
used for scheduling
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,11 @@ spec:
PriorityClass resource, it's being used in Volcano batch
scheduler.
type: string
priorityClassSplitting:
description: PriorityClassSplitting stands for the splitting
of the priority class, it's being used driver and executor
pods
type: boolean
queue:
description: Queue stands for the resource queue which the
application belongs to, it's being used in Volcano batch
Expand Down Expand Up @@ -3179,6 +3184,10 @@ spec:
- protocol
type: object
type: array
priorityClassName:
description: PriorityClassName is the name of the PriorityClass
for the driver pod.
type: string
schedulerName:
description: SchedulerName specifies the scheduler that will
be used for scheduling
Expand Down Expand Up @@ -7946,6 +7955,10 @@ spec:
- protocol
type: object
type: array
priorityClassName:
description: PriorityClassName is the name of the PriorityClass
for the executor pod.
type: string
schedulerName:
description: SchedulerName specifies the scheduler that will
be used for scheduling
Expand Down
12 changes: 12 additions & 0 deletions config/crd/bases/sparkoperator.k8s.io_sparkapplications.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,10 @@ spec:
description: PriorityClassName stands for the name of k8s PriorityClass
resource, it's being used in Volcano batch scheduler.
type: string
priorityClassSplitting:
description: PriorityClassSplitting stands for the splitting of
the priority class, it's being used driver and executor pods
type: boolean
queue:
description: Queue stands for the resource queue which the application
belongs to, it's being used in Volcano batch scheduler.
Expand Down Expand Up @@ -3127,6 +3131,10 @@ spec:
- protocol
type: object
type: array
priorityClassName:
description: PriorityClassName is the name of the PriorityClass
for the driver pod.
type: string
schedulerName:
description: SchedulerName specifies the scheduler that will be
used for scheduling
Expand Down Expand Up @@ -7864,6 +7872,10 @@ spec:
- protocol
type: object
type: array
priorityClassName:
description: PriorityClassName is the name of the PriorityClass
for the executor pod.
type: string
schedulerName:
description: SchedulerName specifies the scheduler that will be
used for scheduling
Expand Down
38 changes: 32 additions & 6 deletions internal/webhook/sparkpod_defaulter.go
Original file line number Diff line number Diff line change
Expand Up @@ -475,12 +475,7 @@ func addSchedulerName(pod *corev1.Pod, app *v1beta2.SparkApplication) error {
return nil
}

func addPriorityClassName(pod *corev1.Pod, app *v1beta2.SparkApplication) error {
var priorityClassName *string
if app.Spec.BatchSchedulerOptions != nil {
priorityClassName = app.Spec.BatchSchedulerOptions.PriorityClassName
}

func podPriorityClassAssign(pod *corev1.Pod, priorityClassName *string) {
if priorityClassName != nil && *priorityClassName != "" {
pod.Spec.PriorityClassName = *priorityClassName
if pod.Spec.Priority != nil {
Expand All @@ -490,9 +485,40 @@ func addPriorityClassName(pod *corev1.Pod, app *v1beta2.SparkApplication) error
pod.Spec.PreemptionPolicy = nil
}
}
}

func addDriverExecutorSelfPriorityClassName(pod *corev1.Pod, app *v1beta2.SparkApplication) error {
var priorityClassName *string
if util.IsDriverPod(pod) {
priorityClassName = app.Spec.Driver.PriorityClassName
}
if util.IsExecutorPod(pod) {
priorityClassName = app.Spec.Executor.PriorityClassName
}

podPriorityClassAssign(pod, priorityClassName)

return nil
}

func addDriverExecutorUnifyPriorityClassName(pod *corev1.Pod, app *v1beta2.SparkApplication) error {
var priorityClassName *string
if app.Spec.BatchSchedulerOptions != nil {
priorityClassName = app.Spec.BatchSchedulerOptions.PriorityClassName
}

podPriorityClassAssign(pod, priorityClassName)

return nil
}
func addPriorityClassName(pod *corev1.Pod, app *v1beta2.SparkApplication) error {
if *app.Spec.BatchSchedulerOptions.PriorityClassSplitting {
return addDriverExecutorSelfPriorityClassName(pod, app)
} else {
return addDriverExecutorUnifyPriorityClassName(pod, app)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, the priorityClassName in batchSchedulerOptions is utilized by Volcano's PodGroup, and it is unrelated to the priorityClass of Spark pods. Given this, it seems unnecessary to have the prioryClassSplitting option or the addDriverExecutorUnifyPriorityClassName function. WDYT?

Copy link
Contributor Author

@Kevinz857 Kevinz857 Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK,prioryClassSplitting batchSchedulerOptions may be a bit inappropriate, let me clarify my usage scenario

The current business scenario has not yet used the ability of podGroup. The underlying K8s cluster is divided into multiple resource pools, and the SparkApplication task itself is defined with different priorities at the business level. Moreover, the resources of the underlying K8s cluster are limited. At this time, it is expected that high-priority executors can run first when resources are insufficient, so the PriorityClass priority preemption scheduling mechanism of K8s itself is used

But because the default PriorityClass of SparkApplication is in the batchSchedulerOptions, it means that the driver and executor under the same SparkApplication will be set to the same priority; but in the actual business scenario, there may be problems, such as we know that the driver is the controller of the executor, once the drvier node is preempted and expelled, the entire SparkApplication will fail, and the expected effect of the business is that the driver pod cannot be preempted with high priority, and the executor can be preempted with high priority, so that different priorities must be set for the driver and executor, such as setting the highest priority for the driver, and the executor can have a low priority, so that after the low-priority executor is expelled, it will wait for high priority After the level executor is executed and there are free resources, the executors of these previously expelled low-priority tasks will be created by the corresponding drvier pod

If this is the case, how should this priority be defined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kevinz857 Perhaps we can do as follows:

  • If spec.batchSchedulerOptions.priorityClassName is defined, it will automatically be propagated to both driver and executor pods by default, as this is the current behavior so that we will not make a breaking change.
  • Furthermore, if either spec.driver.priorityClassName or spec.executor.priorityClassName is specifically defined, it should take precedence over spec.batchSchedulerOptions.priorityClassName. In this way, we can eliminate the need to introduce spec.batchSchedulerOptions.prioryClassSplitting option as it is somewhat confusing. Also, users can define different priority classes for driver and executor pods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenYi015 Very good suggestion, I can modify the code and resubmit MR according to this proposal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenYi015 According to the latest proposal, if if either spec.driver.priorityClassName or spec.executor.priorityClassName is specifically defined, it should take precedence over spec.batchSchedulerOptions.priorityClassName.

Modified the code according to this proposal, please help review it when you have time, thanks


func addPodSecurityContext(pod *corev1.Pod, app *v1beta2.SparkApplication) error {
var securityContext *corev1.PodSecurityContext
if util.IsDriverPod(pod) {
Expand Down