Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49598][K8S] Support user-defined labels for OnDemand PVCs #48079

Closed
wants to merge 10 commits into from

Conversation

prathit06
Copy link
Contributor

@prathit06 prathit06 commented Sep 11, 2024

What changes were proposed in this pull request?

Currently when user sets
volumes.persistentVolumeClaim.[VolumeName].options.claimName=OnDemand
PVCs are created with only 1 label i.e. spark-app-selector = spark.app.id.
Objective of this PR is to allow support of custom labels for onDemand PVCs

Why are the changes needed?

Changes are needed so users can set custom labels to PVCs

Does this PR introduce any user-facing change?

It does not break any existing behaviour but adds a new feature/improvement to enable custom label additions in ondemand PVCs

How was this patch tested?

This was tested in internal/production k8 cluster

Was this patch authored or co-authored using generative AI tooling?

No

@prathit06 prathit06 changed the title Ondemand pvc labels [SPARK-49598] Added support to create custom user-defined labels for onDemand PVCs Sep 11, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making a PR, @prathit06 .

The proposed pattern options.labels is inconsistent from Apache Spark's way; spark.kubernetes.driver.label.something=true and
spark.kubernetes.executor.label.something=true. Please follow the standard way.

* @param labels labels in format : k1=v1,k2=v2
* @return Map[k1->v1, k2->v2]
*/
private def convertStringLabelsToMap(labels: Option[String]): Map[String, String] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we don't need this helper.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49598] Added support to create custom user-defined labels for onDemand PVCs [SPARK-49598] Support user-defined labels for OnDemand PVCs Sep 11, 2024
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-49598] Support user-defined labels for OnDemand PVCs [SPARK-49598][K8S] Support user-defined labels for OnDemand PVCs Sep 11, 2024
@prathit06
Copy link
Contributor Author

Thank you for making a PR, @prathit06 .

The proposed pattern options.labels is inconsistent from Apache Spark's way; spark.kubernetes.driver.label.something=true and spark.kubernetes.executor.label.something=true. Please follow the standard way.

Thanks for the review @dongjoon-hyun , will work on the provided suggestion & update here once the PR is ready for review again.

@dongjoon-hyun
Copy link
Member

Thank you.

I'm preparing Apache Spark 4.0.0-preview2 RC1 for next Monday. It would be great if we can ship your contribution together.

@github-actions github-actions bot added the DOCS label Sep 12, 2024
@prathit06 prathit06 changed the title [SPARK-49598][K8S] Support user-defined labels for OnDemand PVCs [WIP][SPARK-49598][K8S] Support user-defined labels for OnDemand PVCs Sep 12, 2024
@prathit06 prathit06 changed the title [WIP][SPARK-49598][K8S] Support user-defined labels for OnDemand PVCs [SPARK-49598][K8S] Support user-defined labels for OnDemand PVCs Sep 12, 2024
@prathit06
Copy link
Contributor Author

prathit06 commented Sep 12, 2024

Hi @dongjoon-hyun i have updated the PR with suggested changes, Please re-review
adding for reference :
following the config standard way, labels for volume is being picked in below format in this PR

spark.kubernetes.driver.volumes.label.[VolumeType].[VolumeName].[LabelName]: ["LabelValue"]

@dongjoon-hyun
Copy link
Member

Thank you for update, @prathit06 . I'll start a second-round review.

@@ -1182,6 +1182,15 @@ See the [configuration page](configuration.html) for information on Spark config
</td>
<td>2.4.0</td>
</tr>
<tr>
<td><code>spark.kubernetes.driver.volumes.label.[VolumeType].[VolumeName].[LabelName]</code></td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this looks wrong to me because label could be interpreted as [VolumeType] in the existing pattern.

Can we follow options location like you did previously? I expected the following specifically

spark.kubernetes.driver.volumes.[VolumeType].[VolumeName].label.[LabelName]

<td>(none)</td>
<td>
Configure <a href="https://kubernetes.io/docs/concepts/storage/volumes/">Kubernetes Volume</a> labels passed to the Kubernetes with <code>LabelName</code> as key having specified value, must conform with Kubernetes label format. For example,
<code>spark.kubernetes.driver.volumes.label.persistentVolumeClaim.checkpointpvc.foo=bar</code>.
Copy link
Member

@dongjoon-hyun dongjoon-hyun Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update this too.

- spark.kubernetes.driver.volumes.label.persistentVolumeClaim.checkpointpvc.foo=bar
+ spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.label.foo=bar

@@ -1218,6 +1227,15 @@ See the [configuration page](configuration.html) for information on Spark config
</td>
<td>2.4.0</td>
</tr>
<tr>
<td><code>spark.kubernetes.executor.volumes.label.[VolumeType].[VolumeName].[LabelName]</code></td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

<td>(none)</td>
<td>
Configure <a href="https://kubernetes.io/docs/concepts/storage/volumes/">Kubernetes Volume</a> labels passed to the Kubernetes with <code>LabelName</code> as key having specified value, must conform with Kubernetes label format. For example,
<code>spark.kubernetes.executor.volumes.label.persistentVolumeClaim.checkpointpvc.foo=bar</code>.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@@ -59,6 +59,37 @@ class KubernetesVolumeUtilsSuite extends SparkFunSuite {
KubernetesPVCVolumeConf("claimName"))
}

test("Parses persistentVolumeClaim volumes correctly with labels") {
Copy link
Member

@dongjoon-hyun dongjoon-hyun Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test prefix.

- test("Parses persistentVolumeClaim volumes correctly with labels") {
+ test("SPARK-49598: Parses persistentVolumeClaim volumes correctly with labels") {

labels = Map("env" -> "test", "foo" -> "bar")))
}

test("Parses persistentVolumeClaim volumes & puts labels as empty Map if not provided") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test prefix.

"",
true,
KubernetesPVCVolumeConf(MountVolumesFeatureStep.PVC_ON_DEMAND,
labels = Map("foo" -> "bar", "env" -> "test"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is supported by language, we recommend to keep the parameter order. labels is the last parameter.

@@ -86,12 +87,15 @@ private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
.replaceAll(PVC_ON_DEMAND, s"${conf.resourceNamePrefix}-driver$PVC_POSTFIX-$i")
}
if (storageClass.isDefined && size.isDefined) {
val volumeLabels = (labels ++
Map(SPARK_APP_ID_LABEL -> conf.appId)).asJava
logDebug(s"Adding $volumeLabels to $claimName PVC ")
Copy link
Member

@dongjoon-hyun dongjoon-hyun Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this.


private[spark] class MountVolumesFeatureStep(conf: KubernetesConf)
extends KubernetesFeatureConfigStep {
extends KubernetesFeatureConfigStep with Logging {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this after removing logDebug.

"",
true,
KubernetesPVCVolumeConf(MountVolumesFeatureStep.PVC_ON_DEMAND,
labels = Map("foo1" -> "bar1", "env" -> "exec-test"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

"",
true,
KubernetesPVCVolumeConf("pvcClaim1",
labels = Map("foo1" -> "bar1", "env1" -> "exec-test-1"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

"",
true,
KubernetesPVCVolumeConf("pvcClaim2",
labels = Map("foo2" -> "bar2", "env2" -> "exec-test-2"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

@@ -24,7 +24,8 @@ private[spark] case class KubernetesHostPathVolumeConf(hostPath: String)
private[spark] case class KubernetesPVCVolumeConf(
claimName: String,
storageClass: Option[String] = None,
size: Option[String] = None)
size: Option[String] = None,
labels: Map[String, String] = Map())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow the existing semantic of Option.

- labels: Map[String, String] = Map())
+ labels: Option[Map[String, String]] = None)

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finished the second round review, @prathit06 .

@prathit06
Copy link
Contributor Author

I finished the second round review, @prathit06 .

Hi @dongjoon-hyun , i have updated the PR as per the suggestions, kindly re-review

@dongjoon-hyun
Copy link
Member

Thank you. I'm starting a review now.

@@ -45,13 +45,21 @@ object KubernetesVolumeUtils {
val pathKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_PATH_KEY"
val readOnlyKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_READONLY_KEY"
val subPathKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_SUBPATH_KEY"
val labelsKey = s"$volumeType.$volumeName.label."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Change the variable name from labelsKey to labelKey to match the plural.
  • Define and use KUBERNETES_VOLUMES_LABEL_KEY at Config.scala like

val KUBERNETES_VOLUMES_OPTIONS_PATH_KEY = "options.path"

@@ -45,13 +45,21 @@ object KubernetesVolumeUtils {
val pathKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_PATH_KEY"
val readOnlyKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_READONLY_KEY"
val subPathKey = s"$volumeType.$volumeName.$KUBERNETES_VOLUMES_MOUNT_SUBPATH_KEY"
val labelsKey = s"$volumeType.$volumeName.label."

val volumeSpecificLabelsMap = properties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

volumeSpecificLabelsMap -> volumeLabelsMap

@@ -142,6 +147,7 @@ object KubernetesTestConf {
}
conf.set(key(vtype, spec.volumeName, KUBERNETES_VOLUMES_MOUNT_READONLY_KEY),
spec.mountReadOnly.toString)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this.

@@ -120,7 +120,7 @@ class MountVolumesFeatureStepSuite extends SparkFunSuite {
"/tmp",
"",
true,
KubernetesPVCVolumeConf("OnDemand")
KubernetesPVCVolumeConf(MountVolumesFeatureStep.PVC_ON_DEMAND)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't touch irrelevant line.

"",
true,
KubernetesPVCVolumeConf(claimName = MountVolumesFeatureStep.PVC_ON_DEMAND,
storageClass = Some("gp"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit. Let's use the latest storage class; "gp" -> "gp3"

"",
true,
KubernetesPVCVolumeConf(claimName = MountVolumesFeatureStep.PVC_ON_DEMAND,
storageClass = Some("gp"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gp -> gp3

"",
true,
KubernetesPVCVolumeConf(claimName = "pvcClaim1",
storageClass = Some("gp"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

"",
true,
KubernetesPVCVolumeConf(claimName = "pvcClaim2",
storageClass = Some("gp"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it looks good to me. Please address some comments, @prathit06 .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (Pending CIs). Thank you again, @prathit06 .

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix the test case failure, @prathit06 ?

[info] - Parses persistentVolumeClaim volumes correctly *** FAILED *** (29 milliseconds)
[info]   KubernetesPVCVolumeConf("claimName", None, None, Some(Map())) did not equal KubernetesPVCVolumeConf("claimName", None, None, None) (KubernetesVolumeUtilsSuite.scala:58)
[

@dongjoon-hyun
Copy link
Member

The remaining failure is your forked repository setting issue.

ERROR: failed to solve: failed to push ghcr.io/prathit06/apache-spark-ci-image:master-10836346141: unexpected status from POST request to https://ghcr.io/v2/prathit06/apache-spark-ci-image/blobs/uploads/: 403 Forbidden

@dongjoon-hyun
Copy link
Member

Let me merge this since this passes all K8s-related tests.

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 4.0.0-preview2.

Welcome to the Apache Spark community, @prathit06 .
I added you to the Apache Spark contributor group and assigned SPARK-49598 to you.
And, congratulations for your first commit.

@prathit06
Copy link
Contributor Author

Thanks @dongjoon-hyun ,for your continuous support and guidance throughout the PR review.

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?
Currently when user sets
`volumes.persistentVolumeClaim.[VolumeName].options.claimName=OnDemand`
PVCs are created with only 1 label i.e. spark-app-selector = spark.app.id.
Objective of this PR is to allow support of custom labels for onDemand PVCs

### Why are the changes needed?
Changes are needed so users can set custom labels to PVCs

### Does this PR introduce _any_ user-facing change?
It does not break any existing behaviour but adds a new feature/improvement to enable custom label additions in ondemand PVCs

### How was this patch tested?
This was tested in internal/production k8 cluster

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48079 from prathit06/ondemand-pvc-labels.

Lead-authored-by: prathit06 <malik.prathit@gmail.com>
Co-authored-by: Prathit malik <53890994+prathit06@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024
### What changes were proposed in this pull request?
Currently when user sets
`volumes.persistentVolumeClaim.[VolumeName].options.claimName=OnDemand`
PVCs are created with only 1 label i.e. spark-app-selector = spark.app.id.
Objective of this PR is to allow support of custom labels for onDemand PVCs

### Why are the changes needed?
Changes are needed so users can set custom labels to PVCs

### Does this PR introduce _any_ user-facing change?
It does not break any existing behaviour but adds a new feature/improvement to enable custom label additions in ondemand PVCs

### How was this patch tested?
This was tested in internal/production k8 cluster

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48079 from prathit06/ondemand-pvc-labels.

Lead-authored-by: prathit06 <malik.prathit@gmail.com>
Co-authored-by: Prathit malik <53890994+prathit06@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants