feat(backend): Add Semaphore and Mutex fields to Workflow CR #11370

DharmitD · 2024-11-12T14:36:24Z

Resolves #6553

Description of your changes:
This PR introduces support for Pipeline-level Semaphores and Mutexes in the KFP backend.

Changes Introduced:

- Added the ability to specify a semaphore for pipelines, which controls the number of concurrent instances of a pipeline that can run. The semaphore is configured via a fixed ConfigMap named semaphore-config. The semaphore key is provided through the pipeline configuration.
- Added mutex support for pipelines, ensuring that only one instance of the pipeline can run at a time if the specified mutex is defined. Mutex names are defined per pipeline, and each pipeline instance respects the specified mutex.
- The Workflow CR now includes a Synchronization field, where semaphore and mutex are appropriately set.
- If a pipeline has a semaphore, the backend maps the semaphore to the semaphore-config ConfigMap using the key provided by the user. Mutexes are represented by their name, ensuring mutual exclusion.
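
As a rough illustration of the mapping described above, the sketch below builds a synchronization block from a semaphore key and a mutex name. The struct types are hypothetical stand-ins that mirror only the fields shown in this PR's expected output; they are not the Argo Workflows types, and `buildSynchronization` is an illustrative name rather than a function from this change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-ins for the synchronization types; only the fields
// that appear in this PR's expected output are modeled.
type ConfigMapKeyRef struct {
	Key  string `json:"key"`
	Name string `json:"name"`
}

type Semaphore struct {
	ConfigMapKeyRef ConfigMapKeyRef `json:"configMapKeyRef"`
}

type Mutex struct {
	Name string `json:"name"`
}

type Synchronization struct {
	Mutex     *Mutex     `json:"mutex,omitempty"`
	Semaphore *Semaphore `json:"semaphore,omitempty"`
}

// buildSynchronization returns nil when neither value is set, so a
// pipeline without locking gets no synchronization block at all.
func buildSynchronization(semaphoreKey, mutexName, configMapName string) *Synchronization {
	if semaphoreKey == "" && mutexName == "" {
		return nil
	}
	s := &Synchronization{}
	if semaphoreKey != "" {
		// The user-provided key selects an entry in the named ConfigMap.
		s.Semaphore = &Semaphore{
			ConfigMapKeyRef: ConfigMapKeyRef{Key: semaphoreKey, Name: configMapName},
		}
	}
	if mutexName != "" {
		// A mutex is identified by its name alone.
		s.Mutex = &Mutex{Name: mutexName}
	}
	return s
}

func main() {
	s := buildSynchronization("semaphore", "mutex", "semaphore-config")
	out, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Using pointer fields with `omitempty` keeps an unset lock out of the serialized output, which matches the semaphore-only and mutex-only scenarios in the testing instructions.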

This PR should be merged only after #11340 gets merged.

Testing instructions

1. Build the API Server image and push it to an image registry.
2. Upload main.yaml file from here
3. In the KFP UI, check that the Pipeline Spec tab contains the following snippet:

platforms:
  kubernetes:
    pipelineConfig:
      mutexName: mutex
      semaphoreKey: semaphore

After the pipeline run is initiated, use the following command to verify that the Workflow CR has the appropriate synchronization settings:

oc get workflow -o yaml $(oc get workflow --no-headers | awk '{print $1}') | yq .spec.synchronization

The expected output should include the semaphore and mutex references:

synchronization:
    mutex:
      name: mutex
    semaphore:
      configMapKeyRef:
        key: semaphore
        name: semaphore-config

Scenarios and Argo Workflows UI Verification
- Scenario 1: Only Semaphore
  - Update the pipeline configuration to include only semaphoreKey.
  - Trigger multiple runs at the same time; the number of runs should be greater than the semaphore value we've set.
  - Check the Argo Workflows UI for the following message for the latest workflows:
    
    Waiting for kubeflow/ConfigMap/semaphore-config/semaphore lock. Lock status: 0/X
  - Verify the Workflow CR:
```
synchronization:
    semaphore:
      configMapKeyRef:
        key: semaphore
        name: semaphore-config
```
- Scenario 2: Only Mutex
  - Update the pipeline configuration to include only mutexName.
  - Trigger more than one run.
  - Check the Argo Workflows UI for the following message on the latest workflows:
    
    Waiting for kubeflow/Mutex/mutex lock. Lock status: 0/X
  - Verify the Workflow CR:
```
synchronization:
    mutex:
      name: mutex
```
- Scenario 3: Both Semaphore and Mutex
  - Use the original pipeline configuration that includes both semaphoreKey and mutexName.
  - Check the Argo Workflows UI for the messages described above. Which locking message you see depends on factors such as how many runs have been triggered and the semaphore value.
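
To make the expectation in Scenarios 1 and 2 concrete, here is a toy model of a counting lock: with a limit of X, the first X runs acquire it and later runs wait until a slot is released. A mutex is the same model with a limit of 1. This is only a sketch for reasoning about the test results; the `lock` type, the run names, and the first-come-first-served ordering are assumptions and do not describe how Argo Workflows implements or orders its locks.

```go
package main

import "fmt"

// lock is a toy counting semaphore; a mutex is the limit == 1 case.
type lock struct {
	limit   int
	holders []string
	waiting []string
}

// acquire grants a slot if one is free, otherwise queues the run.
func (l *lock) acquire(run string) bool {
	if len(l.holders) < l.limit {
		l.holders = append(l.holders, run)
		return true
	}
	l.waiting = append(l.waiting, run)
	return false
}

// release frees the run's slot and hands it to the longest-waiting run.
func (l *lock) release(run string) {
	for i, h := range l.holders {
		if h == run {
			l.holders = append(l.holders[:i], l.holders[i+1:]...)
			break
		}
	}
	if len(l.waiting) > 0 && len(l.holders) < l.limit {
		next := l.waiting[0]
		l.waiting = l.waiting[1:]
		l.holders = append(l.holders, next)
	}
}

func main() {
	// Three runs against a semaphore value of 2: the third has to wait.
	sem := &lock{limit: 2}
	for _, run := range []string{"run-1", "run-2", "run-3"} {
		fmt.Println(run, "acquired:", sem.acquire(run))
	}
	sem.release("run-1")
	fmt.Println("holders:", sem.holders, "waiting:", sem.waiting)
}
```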

Checklist:

You have signed off your commits
The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.

gregsheremeta · 2024-11-13T12:19:21Z

add fixes #6553 to the PR description

gregsheremeta · 2024-11-13T12:20:38Z

The semaphore is configured via a fixed ConfigMap named semaphore-config

We should edit the kubeflow manifest to deploy a skeleton of this configmap. You can do that in here or in a follow-up PR.

gregsheremeta · 2024-11-13T12:22:14Z

The Workflow CR now includes a Synchronization field

I would probably delete this line (and maybe edit the PR title), because that reads like things you enhanced on Workflow itself. We're just setting fields on it...

gregsheremeta · 2024-11-13T12:25:09Z

platforms:
  kubernetes:
    pipelineConfig:
      mutexName: mutex
      semaphoreKey: semaphore

The expected output should include the semaphore and mutex references:

What does Argo Workflows do when both are set?

A better verification would be to do two separate test pipelines -- one where you use mutex, and one where you use semaphore. And then in addition to verifying the Workflow yaml, also verify that multiple runs are being locked like they should be.

backend/src/apiserver/template/v2_template.go

backend/src/v2/compiler/argocompiler/argo.go

DharmitD · 2024-11-18T06:51:19Z

/hold until #11384 and #11340 get merged

rimolive · 2024-11-27T14:44:46Z

/lgtm

google-oss-prow · 2024-12-12T19:21:18Z

New changes are detected. LGTM label has been removed.

google-oss-prow · 2024-12-12T19:36:09Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign zijianjoy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

backend/OWNERS
manifests/kustomize/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

DharmitD · 2024-12-12T19:41:02Z

Update: Added an environment variable for the semaphore ConfigMap name to the API Server deployment manifest, so that users have the flexibility to set a different ConfigMap name.
If no ConfigMap name is set via this env var, "semaphore-config" is used as the default name.
cc: @gregsheremeta @rimolive

DharmitD · 2024-12-13T20:40:43Z

Some screenshots from the Argo Workflows UI, showing what the messages look like. Refer to the Scenarios and Argo Workflows UI Verification section in the PR's testing instructions to learn more about these messages.

gregsheremeta

looks good -- just a couple small things left :)

gregsheremeta · 2024-12-13T20:37:57Z

backend/src/v2/compiler/argocompiler/argo.go

@@ -28,6 +29,7 @@ import (
 	"google.golang.org/protobuf/proto"
 	"google.golang.org/protobuf/types/known/structpb"
 	k8score "k8s.io/api/core/v1"
+	v1 "k8s.io/api/core/v1"


the same import is just above this one :)

Done, removed this import and accordingly also updated the remaining code to use the k8score import instead.

gregsheremeta · 2024-12-13T20:38:47Z

backend/src/v2/compiler/argocompiler/argo.go

@@ -40,6 +42,16 @@ type Options struct {
 	// optional
 	PipelineRoot string
 	// TODO(Bobgy): add an option -- dev mode, ImagePullPolicy should only be Always in dev mode.
+	SemaphoreKey string


I don't love using this Options struct for these fields, but I'm not sure what an alternative would be. I guess it's fine for now and maybe we can figure out a cleaner way down the line.

gregsheremeta · 2024-12-13T20:40:43Z

backend/src/v2/compiler/argocompiler/argo.go

@@ -119,6 +130,28 @@ func Compile(jobArg *pipelinespec.PipelineJob, kubernetesSpecArg *pipelinespec.S
 			Entrypoint:         tmplEntrypoint,
 		},
 	}
+
+	if semaphoreKey != "" {


no need for this intermediate variable and the block above where they're set. Just use opts.SemaphoreKey

Suggested change

-	if semaphoreKey != "" {
+	if opts != nil && opts.SemaphoreKey != "" {

Done, updated to remove the variable initialization block and used opts.SemaphoreKey and opts.MutexName directly instead.
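
The `opts != nil &&` part of the suggestion matters because the options are passed as a pointer: Go evaluates `&&` left to right and stops at the first false operand, so the field is never read through a nil pointer. The sketch below shows the pattern in isolation. The `Options` struct is a trimmed, hypothetical copy holding only the two fields discussed in this review, and the helper names are illustrative.

```go
package main

import "fmt"

// Options is a trimmed, hypothetical copy of the compiler options;
// only the two fields discussed in this review are included.
type Options struct {
	SemaphoreKey string
	MutexName    string
}

// wantsSemaphore is safe to call with nil: the nil check short-circuits
// before the field access can dereference the pointer.
func wantsSemaphore(opts *Options) bool {
	return opts != nil && opts.SemaphoreKey != ""
}

// wantsMutex applies the same guard to the mutex name.
func wantsMutex(opts *Options) bool {
	return opts != nil && opts.MutexName != ""
}

func main() {
	fmt.Println(wantsSemaphore(nil), wantsMutex(nil))
	opts := &Options{SemaphoreKey: "semaphore"}
	fmt.Println(wantsSemaphore(opts), wantsMutex(opts))
}
```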

gregsheremeta · 2024-12-13T20:43:38Z

manifests/kustomize/base/pipeline/ml-pipeline-semaphore-configmap.yaml

@@ -0,0 +1,5 @@
+kind: ConfigMap


I think this lgtm, but I just had a thought. What happens if:

1. I install kfp and I get this configmap created
2. I customize it by adding keys to it
3. I upgrade to the next version of kfp

Is that later upgrade going to overwrite my customized configmap with this blank one? It'd be cool if we could test that somehow, perhaps by manually creating the configmap on a cluster, putting data in it, and then installing kfp.

Makes sense.
To resolve this, I've added a Job, semaphore-configmap-init, to the ConfigMap manifest; it runs during the Kustomize deployment process.
It checks for the existence of the semaphore-config ConfigMap using kubectl get.
If the ConfigMap does not exist, it creates one with an empty init key.
If the ConfigMap already exists, the Job skips creation and exits successfully.
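
The create-only-if-missing behaviour described above can be sketched as follows. This is a model of the logic only: the `configMapStore` interface and the in-memory `memoryStore` are hypothetical stand-ins for the cluster, the real Job uses kubectl rather than Go, and the initial contents are left empty here as a simplification. The `"semaphore": "2"` entry is made-up example data representing a user's customization.

```go
package main

import "fmt"

// configMapStore is a hypothetical, minimal view of a cluster's ConfigMaps.
type configMapStore interface {
	Get(name string) (map[string]string, bool)
	Create(name string, data map[string]string)
}

// memoryStore is an in-memory stand-in used only for this sketch.
type memoryStore map[string]map[string]string

func (m memoryStore) Get(name string) (map[string]string, bool) {
	data, ok := m[name]
	return data, ok
}

func (m memoryStore) Create(name string, data map[string]string) {
	m[name] = data
}

// ensureConfigMap creates the ConfigMap only when it is missing and
// reports whether it created anything. Existing data is never modified.
func ensureConfigMap(store configMapStore, name string) bool {
	if _, exists := store.Get(name); exists {
		return false
	}
	store.Create(name, map[string]string{})
	return true
}

func main() {
	// A ConfigMap the user has already customized survives the check.
	store := memoryStore{"semaphore-config": {"semaphore": "2"}}
	created := ensureConfigMap(store, "semaphore-config")
	fmt.Println("created:", created, "data:", store["semaphore-config"])
}
```

Because the check never writes to an existing ConfigMap, running it again on upgrade leaves user-added keys in place, which is the concern raised in the review comment.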

- Added `Semaphore` and `Mutex` fields to the Workflow Spec to support concurrency control mechanisms directly within workflows.
- Introduced a new environment variable, `SEMAPHORE_CONFIGMAP_NAME`, to the API Server deployment for managing semaphore configurations.
- Added an empty ConfigMap manifest for semaphores to facilitate initial setup and testing.

Signed-off-by: ddalvi <ddalvi@redhat.com>

google-oss-prow bot requested review from HumairAK and rimolive November 12, 2024 14:36

google-oss-prow bot added the size/M label Nov 12, 2024

DharmitD changed the title ~~feat(backend): Add Semaphore and Mutex fields to Workflow Spec~~ WIP:feat(backend): Add Semaphore and Mutex fields to Workflow Spec Nov 12, 2024

google-oss-prow bot added the do-not-merge/work-in-progress label Nov 12, 2024

DharmitD force-pushed the sem-mut-backend branch from 5c420db to 449cdda Compare November 13, 2024 05:06

gregsheremeta suggested changes Nov 13, 2024

DharmitD changed the title ~~WIP:feat(backend): Add Semaphore and Mutex fields to Workflow Spec~~ WIP:feat(backend): Add Semaphore and Mutex fields to Workflow CR Nov 13, 2024

DharmitD force-pushed the sem-mut-backend branch from 449cdda to 497bc6b Compare November 18, 2024 06:13

DharmitD changed the title ~~WIP:feat(backend): Add Semaphore and Mutex fields to Workflow CR~~ feat(backend): Add Semaphore and Mutex fields to Workflow CR Nov 18, 2024

google-oss-prow bot removed the do-not-merge/work-in-progress label Nov 18, 2024

google-oss-prow bot added the do-not-merge/hold label Nov 18, 2024

google-oss-prow bot assigned rimolive Nov 27, 2024

google-oss-prow bot added the lgtm label Nov 27, 2024

DharmitD force-pushed the sem-mut-backend branch from 497bc6b to ccc5fd0 Compare December 12, 2024 19:21

google-oss-prow bot removed the lgtm label Dec 12, 2024

google-oss-prow bot added size/L and removed size/M labels Dec 13, 2024

DharmitD force-pushed the sem-mut-backend branch 2 times, most recently from 85d4fe3 to b060e03 Compare December 13, 2024 20:07

gregsheremeta suggested changes Dec 13, 2024

gregsheremeta mentioned this pull request Dec 13, 2024

feat(api): Add SemaphoreKey and MutexName fields to proto #11384

Open

2 tasks

DharmitD force-pushed the sem-mut-backend branch from b060e03 to b5e29b3 Compare December 16, 2024 17:38
