[SPARK-49598][K8S] Support user-defined labels for OnDemand PVCs #48079
Thank you for making a PR, @prathit06.
The proposed pattern `options.labels` is inconsistent with Apache Spark's way: `spark.kubernetes.driver.label.something=true` and `spark.kubernetes.executor.label.something=true`. Please follow the standard way.
   * @param labels labels in format : k1=v1,k2=v2
   * @return Map[k1->v1, k2->v2]
   */
  private def convertStringLabelsToMap(labels: Option[String]): Map[String, String] = {
I believe we don't need this helper.
Thanks for the review, @dongjoon-hyun. I will work on the provided suggestion and update here once the PR is ready for review again.
Thank you. I'm preparing
Hi @dongjoon-hyun, I have updated the PR with the suggested changes. Please re-review.
Thank you for the update, @prathit06. I'll start a second-round review.
docs/running-on-kubernetes.md
Outdated
@@ -1182,6 +1182,15 @@ See the [configuration page](configuration.html) for information on Spark config
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.driver.volumes.label.[VolumeType].[VolumeName].[LabelName]</code></td>
Well, this looks wrong to me because `label` could be interpreted as `[VolumeType]` in the existing pattern.
Can we follow the `options` location like you did previously? Specifically, I expected the following:
spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].label.[LabelName]
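To illustrate why the suggested layout is unambiguous, here is a minimal sketch of how label entries under `[VolumeType].[VolumeName].label.` might be collected from a flat property map. The names `VolumeLabelSketch` and `labelsFor` are hypothetical and not part of Spark, and the sketch assumes the common prefix (for example `spark.kubernetes.driver.volumes.`) has already been stripped from the keys.

```scala
object VolumeLabelSketch {
  // Hypothetical helper: collect entries shaped like
  // "<volumeType>.<volumeName>.label.<labelName>" into a labelName -> value map.
  def labelsFor(
      properties: Map[String, String],
      volumeType: String,
      volumeName: String): Map[String, String] = {
    val labelPrefix = s"$volumeType.$volumeName.label."
    properties.collect {
      // Only keys under this volume's label prefix are kept; the prefix is dropped.
      case (key, value) if key.startsWith(labelPrefix) =>
        key.stripPrefix(labelPrefix) -> value
    }
  }
}
```

Because the volume type is always the first segment in this layout, a `label` segment can never be mistaken for a `[VolumeType]`.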
docs/running-on-kubernetes.md
Outdated
  <td>(none)</td>
  <td>
   Configure <a href="https://kubernetes.io/docs/concepts/storage/volumes/">Kubernetes Volume</a> labels passed to the Kubernetes with <code>LabelName</code> as key having specified value, must conform with Kubernetes label format. For example,
   <code>spark.kubernetes.driver.volumes.label.persistentVolumeClaim.checkpointpvc.foo=bar</code>.
Please update this too.
- spark.kubernetes.driver.volumes.label.persistentVolumeClaim.checkpointpvc.foo=bar
+ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.label.foo=bar
docs/running-on-kubernetes.md
Outdated
@@ -1218,6 +1227,15 @@ See the [configuration page](configuration.html) for information on Spark config
  </td>
  <td>2.4.0</td>
</tr>
<tr>
  <td><code>spark.kubernetes.executor.volumes.label.[VolumeType].[VolumeName].[LabelName]</code></td>
ditto.
docs/running-on-kubernetes.md
Outdated
  <td>(none)</td>
  <td>
   Configure <a href="https://kubernetes.io/docs/concepts/storage/volumes/">Kubernetes Volume</a> labels passed to the Kubernetes with <code>LabelName</code> as key having specified value, must conform with Kubernetes label format. For example,
   <code>spark.kubernetes.executor.volumes.label.persistentVolumeClaim.checkpointpvc.foo=bar</code>.
ditto.
@@ -59,6 +59,37 @@ class KubernetesVolumeUtilsSuite extends SparkFunSuite {
      KubernetesPVCVolumeConf("claimName"))
  }

  test("Parses persistentVolumeClaim volumes correctly with labels") {
Please add a test prefix.
- test("Parses persistentVolumeClaim volumes correctly with labels") {
+ test("SPARK-49598: Parses persistentVolumeClaim volumes correctly with labels") {
      labels = Map("env" -> "test", "foo" -> "bar")))
  }

  test("Parses persistentVolumeClaim volumes & puts labels as empty Map if not provided") {
Please add a test prefix.
      "",
      true,
      KubernetesPVCVolumeConf(MountVolumesFeatureStep.PVC_ON_DEMAND,
        labels = Map("foo" -> "bar", "env" -> "test"),
Although the language supports this, we recommend keeping the parameter order. `labels` is the last parameter.
@@ -86,12 +87,15 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
            .replaceAll(PVC_ON_DEMAND, s"${conf.resourceNamePrefix}-driver$PVC_POSTFIX-$i")
        }
        if (storageClass.isDefined && size.isDefined) {
          val volumeLabels = (labels ++
            Map(SPARK_APP_ID_LABEL -> conf.appId)).asJava
          logDebug(s"Adding $volumeLabels to $claimName PVC ")
Let's remove this.

private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
  extends KubernetesFeatureConfigStep {
  extends KubernetesFeatureConfigStep with Logging {
Let's remove this after removing `logDebug`.
      "",
      true,
      KubernetesPVCVolumeConf(MountVolumesFeatureStep.PVC_ON_DEMAND,
        labels = Map("foo1" -> "bar1", "env" -> "exec-test"),
ditto.
      "",
      true,
      KubernetesPVCVolumeConf("pvcClaim1",
        labels = Map("foo1" -> "bar1", "env1" -> "exec-test-1"),
ditto.
      "",
      true,
      KubernetesPVCVolumeConf("pvcClaim2",
        labels = Map("foo2" -> "bar2", "env2" -> "exec-test-2"),
ditto.
@@ -24,7 +24,8 @@ private[spark] case class KubernetesHostPathVolumeConf(hostPath: String)
private[spark] case class KubernetesPVCVolumeConf(
    claimName: String,
    storageClass: Option[String] = None,
    size: Option[String] = None)
    size: Option[String] = None,
    labels: Map[String, String] = Map())
Please follow the existing semantics of `Option`.
- labels: Map[String, String] = Map())
+ labels: Option[Map[String, String]] = None)
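As a rough sketch of what the `Option` form implies for callers, the snippet below merges optional user labels with the application id label. `PvcLabelSketch`, `PvcVolumeConf`, and `pvcLabels` are simplified, hypothetical stand-ins rather than the actual Spark classes; the label key `spark-app-selector` is the one named in the PR description.

```scala
object PvcLabelSketch {
  // Label key as stated in the PR description.
  val SPARK_APP_ID_LABEL = "spark-app-selector"

  // Simplified stand-in for the case class in the diff, with labels as the last parameter.
  final case class PvcVolumeConf(
      claimName: String,
      storageClass: Option[String] = None,
      size: Option[String] = None,
      labels: Option[Map[String, String]] = None)

  // Merge user labels (if any) with the app id label.
  // The app id label is added last, so it wins if a user label reuses its key.
  def pvcLabels(conf: PvcVolumeConf, appId: String): Map[String, String] =
    conf.labels.getOrElse(Map.empty[String, String]) ++ Map(SPARK_APP_ID_LABEL -> appId)
}
```

With `None` as the default, a volume configured without labels still ends up with exactly the one label it had before this change.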
I finished the second round review, @prathit06 .
Hi @dongjoon-hyun, I have updated the PR as per the suggestions. Kindly re-review.
Thank you. I'm starting a review now.
@@ -45,13 +45,21 @@ object KubernetesVolumeUtils {
      val pathKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_PATH_KEY"
      val readOnlyKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_READONLY_KEY"
      val subPathKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_SUBPATH_KEY"
      val labelsKey = s"$volumeType.$volumeName.label."
- Change the variable name from `labelsKey` to `labelKey` to match the plural.
- Define and use `KUBERNETES_VOLUMES_LABEL_KEY` at `Config.scala` like
spark/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
Line 773 in 98f0d9f
val KUBERNETES_VOLUMES_OPTIONS_PATH_KEY = "options.path"
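A small sketch of what this request could look like, with the literal `label.` moved into a named constant next to the existing one. The value `"label."` is an assumption taken from the string literal in the diff, and `VolumeKeySketch` and `labelKey` are hypothetical names used only for illustration.

```scala
object VolumeKeySketch {
  // Existing constant quoted in the review comment.
  val KUBERNETES_VOLUMES_OPTIONS_PATH_KEY = "options.path"
  // Assumed value for the requested constant; the trailing dot makes it a key prefix.
  val KUBERNETES_VOLUMES_LABEL_KEY = "label."

  // Build the per-volume label prefix from the shared constant instead of a string literal.
  def labelKey(volumeType: String, volumeName: String): String =
    s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_LABEL_KEY"
}
```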
@@ -45,13 +45,21 @@ object KubernetesVolumeUtils {
      val pathKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_PATH_KEY"
      val readOnlyKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_READONLY_KEY"
      val subPathKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_SUBPATH_KEY"
      val labelsKey = s"$volumeType.$volumeName.label."

      val volumeSpecificLabelsMap = properties
`volumeSpecificLabelsMap` -> `volumeLabelsMap`
@@ -142,6 +147,7 @@ object KubernetesTestConf {
      }
      conf.set(key(vtype, spec.volumeName, KUBERNETES_VOLUMES_MOUNT_READONLY_KEY),
        spec.mountReadOnly.toString)

Please remove this.
@@ -120,7 +120,7 @@ class MountVolumesFeatureStepSuite extends SparkFunSuite {
      "/tmp",
      "",
      true,
      KubernetesPVCVolumeConf("OnDemand")
      KubernetesPVCVolumeConf(MountVolumesFeatureStep.PVC_ON_DEMAND)
Please don't touch irrelevant lines.
      "",
      true,
      KubernetesPVCVolumeConf(claimName = MountVolumesFeatureStep.PVC_ON_DEMAND,
        storageClass = Some("gp"),
nit. Let's use the latest storage class; "gp" -> "gp3"
      "",
      true,
      KubernetesPVCVolumeConf(claimName = MountVolumesFeatureStep.PVC_ON_DEMAND,
        storageClass = Some("gp"),
`gp` -> `gp3`
      "",
      true,
      KubernetesPVCVolumeConf(claimName = "pvcClaim1",
        storageClass = Some("gp"),
ditto.
      "",
      true,
      KubernetesPVCVolumeConf(claimName = "pvcClaim2",
        storageClass = Some("gp"),
ditto.
In general, it looks good to me. Please address some comments, @prathit06 .
+1, LGTM (Pending CIs). Thank you again, @prathit06 .
Could you fix the test case failure, @prathit06 ?
[info] - Parses persistentVolumeClaim volumes correctly *** FAILED *** (29 milliseconds)
[info] KubernetesPVCVolumeConf("claimName", None, None, Some(Map())) did not equal KubernetesPVCVolumeConf("claimName", None, None, None) (KubernetesVolumeUtilsSuite.scala:58)
The remaining failure is your forked repository setting issue.
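The reported mismatch is between `Some(Map())` and `None` for a volume with no labels. One way to avoid it, shown here only as a sketch and not necessarily how the PR resolved it, is to normalize an empty label map to `None` before building the volume conf. `EmptyLabelsSketch` and `toOptionalLabels` are hypothetical names.

```scala
object EmptyLabelsSketch {
  // Treat "no labels configured" as None so that it equals the default value
  // of an Option[Map[String, String]] parameter that defaults to None.
  def toOptionalLabels(labels: Map[String, String]): Option[Map[String, String]] =
    if (labels.isEmpty) None else Some(labels)
}
```

The alternative is to keep `Some(Map())` and update the expected value in the existing test, at the cost of two different representations for "no labels".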
Let me merge this since this passes all K8s-related tests.
Merged to master for Apache Spark 4.0.0-preview2. Welcome to the Apache Spark community, @prathit06.
Thanks, @dongjoon-hyun, for your continuous support and guidance throughout the PR review.
### What changes were proposed in this pull request?

Currently when user sets `volumes.persistentVolumeClaim.[VolumeName].options.claimName=OnDemand` PVCs are created with only 1 label i.e. spark-app-selector = spark.app.id. Objective of this PR is to allow support of custom labels for onDemand PVCs

### Why are the changes needed?

Changes are needed so users can set custom labels to PVCs

### Does this PR introduce _any_ user-facing change?

It does not break any existing behaviour but adds a new feature/improvement to enable custom label additions in ondemand PVCs

### How was this patch tested?

This was tested in internal/production k8 cluster

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#48079 from prathit06/ondemand-pvc-labels.

Lead-authored-by: prathit06 <malik.prathit@gmail.com>
Co-authored-by: Prathit malik <53890994+prathit06@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Currently when user sets
volumes.persistentVolumeClaim.[VolumeName].options.claimName=OnDemand
PVCs are created with only 1 label i.e. spark-app-selector = spark.app.id.
Objective of this PR is to allow support of custom labels for onDemand PVCs
Why are the changes needed?
Changes are needed so users can set custom labels to PVCs
Does this PR introduce any user-facing change?
It does not break any existing behaviour but adds a new feature/improvement to enable custom label additions in ondemand PVCs
How was this patch tested?
This was tested in internal/production k8 cluster
Was this patch authored or co-authored using generative AI tooling?
No