Implement rollback for script run as pre defined stage #4743

ffjlabo · 2023-12-28T03:08:57Z

What this PR does / why we need it:

Implement rollback for script run as pre defined stage.

Which issue(s) this PR fixes:

Part of #4643

Does this PR introduce a user-facing change?: Yes

How are users affected by this change:
Users can use onRollback feature for SCRIPT_RUN stage.
They can execute any command when the deployment failed and the executed SCRIPT_RUN stage is included in the pipeline
Is this breaking change: no

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

codecov · 2023-12-28T03:12:28Z

Codecov Report

Attention: 60 lines in your changes are missing coverage. Please review.

Comparison is base (d814680) 30.80% compared to head (e1c150e) 31.01%.
Report is 13 commits behind head on master.

Files	Patch %	Lines
pkg/app/piped/executor/kubernetes/rollback.go	0.00%	38 Missing ⚠️
pkg/app/piped/planner/kubernetes/pipeline.go	8.33%	21 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4743      +/-   ##
==========================================
+ Coverage   30.80%   31.01%   +0.21%     
==========================================
  Files         221      225       +4     
  Lines       26015    26325     +310     
==========================================
+ Hits         8013     8164     +151     
- Misses      17352    17510     +158     
- Partials      650      651       +1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ffjlabo · 2023-12-28T03:19:16Z

pkg/app/piped/controller/scheduler.go

+
+		if rollbackStages, ok := s.deployment.FindRollbackStages(); ok {
 			// Update to change deployment status to ROLLING_BACK.
 			if err := s.reportDeploymentStatusChanged(ctx, model.DeploymentStatus_DEPLOYMENT_ROLLING_BACK, statusReason); err != nil {
 				return err
 			}

-			// Start running rollback stage.
-			var (
-				sig, handler = executor.NewStopSignal()
-				doneCh       = make(chan struct{})
-			)
-			go func() {
-				rbs := *stage
-				rbs.Requires = []string{lastStage.Id}
-				s.executeStage(sig, rbs, func(in executor.Input) (executor.Executor, bool) {
-					return s.executorRegistry.RollbackExecutor(s.deployment.Kind, in)
-				})
-				close(doneCh)
-			}()
-
-			select {
-			case <-ctx.Done():
-				handler.Terminate()
-				<-doneCh
-				return nil
-
-			case <-doneCh:
-				break
+			for _, stage := range rollbackStages {
+				// Start running rollback stage.
+				var (
+					sig, handler = executor.NewStopSignal()
+					doneCh       = make(chan struct{})
+				)
+				go func() {
+					rbs := *stage
+					rbs.Requires = []string{lastStage.Id}
+					s.executeStage(sig, rbs, func(in executor.Input) (executor.Executor, bool) {
+						return s.executorRegistry.RollbackExecutor(s.deployment.Kind, in)
+					})
+					close(doneCh)
+				}()
+
+				select {
+				case <-ctx.Done():
+					handler.Terminate()
+					<-doneCh
+					return nil
+
+				case <-doneCh:
+					break
+				}


📝 Changed to execute multiple kinds of rollback stages.
The changes are to do logic just like the same as before with each rollback stage.

ffjlabo · 2023-12-28T03:30:43Z

pkg/app/piped/planner/kubernetes/pipeline.go

+			if s.Name == model.StageScriptRun {
+				// Use metadata as a way to pass parameters to the stage.
+				envStr, _ := json.Marshal(s.ScriptRunStageOptions.Env)
+				metadata := map[string]string{
+					"baseStageID": out[i].Id,
+					"onRollback":  s.ScriptRunStageOptions.OnRollback,
+					"env":         string(envStr),
+				}


📝 I use metadata for the stage to recognize below on executing it.

The commands which are executed on the rollback script run stage. (onRollback, env)

The stage IDs of script run stages, which are used to check whether to rollback it. (baseStageID)

It is because I want to execute the command to rollback SCRIPT_RUN to the point where the deployment was canceled or failed.

ffjlabo · 2023-12-28T03:34:35Z

pkg/app/piped/controller/scheduler.go

+	if ps.Name == model.StageScriptRunRollback.String() {
+		baseStageID := ps.Metadata["baseStageID"]
+		if baseStageID == "" {
+			return
+		}
+
+		baseStageStatus, ok := s.stageStatuses[baseStageID]
+		if !ok {
+			return
+		}
+
+		if baseStageStatus == model.StageStatus_STAGE_NOT_STARTED_YET || baseStageStatus == model.StageStatus_STAGE_SKIPPED {
+			return
+		}
+	}


📝 Check the status of the script run stage to decide whether to rollback or not.
It is because the target stages should be already executed.

ffjlabo · 2023-12-28T03:38:54Z

pkg/app/piped/executor/kubernetes/rollback.go

+	case model.StageScriptRunRollback:
+		status = e.ensureScriptRunRollback(ctx)


📝 Added the logic to rollback executor for each application (firstly, I added on the k8s's one) because rollback stages are executed per the kind of application not per stage.

I don't have any idea to separate the rollback for application and the rollback for the stage.

ffjlabo · 2023-12-28T03:48:55Z

The PR summary

Add ROLLBACK_SCRIPT_RUN stage and enable to planbe09a10
Enable to execute multiple rollback stages 48be28e
Add script run rollback logic to k8s app 0d0ecee

The result

I tested on local env.

apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  name: canary-with-script-run
  labels:
    env: example
    team: product
  pipeline:
    stages:
      - name: K8S_CANARY_ROLLOUT
        with:
          replicas: 10%
      - name: WAIT
        with:
          duration: 10s
      - name: SCRIPT_RUN
        with:
          env:
            MSG: "execute script1"
            R_MSG: "rollback script1"
          run: |
            echo $MSG
            sleep 10
          onRollback: |
            echo $R_MSG
            sleep 10
      - name: K8S_PRIMARY_ROLLOUT
      - name: K8S_CANARY_CLEAN

some problems
I don't have any idea to separate the rollback for application and the rollback for the stage...

ffjlabo · 2023-12-28T04:53:51Z

/review

github-actions · 2023-12-28T04:54:35Z

PR Analysis

Main theme

"Implement rollback for SCRIPT_RUN stages in deployment pipelines"

PR summary

This PR introduces the capability to handle rollback for custom script run stages within Kubernetes deployment pipelines. It ensures that if a deployment fails or is canceled, any executed SCRIPT_RUN stages are rolled back in the reverse order they were applied.

Type of PR

Enhancement

PR Feedback:

General suggestions

The PR implements rollbacks for SCRIPT_RUN stages, which is an important feature to maintain consistency and ensure clean reverts of deployments when failures occur. Expanding the rollback mechanism to include custom script stages is a valuable addition that helps in managing complex deployment scenarios.

In the addition to the scheduler.go and the new rollback.go logic for Kubernetes, the changes appropriately handle stage interdependencies and pass necessary environment variables to rollback commands.

The FindRollbackStages function introduced to the Deployment model seems to correctly identify all stages that require rollbacks, including the new SCRIPT_RUN_ROLLBACK stages.

Code feedback

relevant file: pkg/app/piped/controller/scheduler.go
suggestion: |
When iterating over rollback stages, it's important to ensure that each goroutine references its own copy of the stage variable. Due to the loop variable capture problem, goroutines may refer to the same stage which leads to unexpected rollbacks. Ensure that the variable inside the goroutine is a unique copy. (important)
relevant line: |
+ go func() {
relevant file: pkg/app/piped/executor/kubernetes/rollback.go
suggestion: |
Check the error when unmarshaling the environment metadata into the env variable and handle it appropriately instead of ignoring it. If there's an error, it could indicate potential issues with the metadata format which may lead to unexpected behavior during rollback. (medium)
relevant line: |
+ _ = json.Unmarshal([]byte(envStr), &env)

Security concerns:

no

This PR doesn't seem to introduce any typical security concerns like SQL injection, XSS, CSRF, etc. However, the execution of external scripts does carry the inherent risk of executing malicious code if the scripts or the environment containing the hooks are compromised. Ensure that the rollback scripts are securely managed and reviewed to prevent such issues.

khanhtc1202 · 2024-01-29T14:07:59Z

pkg/app/piped/executor/kubernetes/rollback.go

@@ -171,3 +179,45 @@ func (e *rollbackExecutor) ensureRollback(ctx context.Context) model.StageStatus
 	}
 	return model.StageStatus_STAGE_SUCCESS
 }
+
+func (e *rollbackExecutor) ensureScriptRunRollback(ctx context.Context) model.StageStatus {
+	e.LogPersister.Infof("Runnnig commands for rollback...")


Suggested change

e.LogPersister.Infof("Runnnig commands for rollback...")

e.LogPersister.Info("Runnnig commands for rollback...")

khanhtc1202 · 2024-01-29T14:12:28Z

pkg/app/piped/executor/kubernetes/rollback.go

+	if onRollback == "" {
+		e.LogPersister.Info("No commands to run")
+		return model.StageStatus_STAGE_SUCCESS
+	}


I just want to ensure I understand this right: This means that if there is no user-defined onRollback script, this script run rollback will not do anything, right?

@ffjlabo What I mean here is that we may make this:

Add rollbackable of type bool to specify whether the current script run can be rollbacked or not, default set to false 🤔

In case of rollbackable stage, if onRollback is not set, re-run script instead or return error rollback script is missing
wdyt?

On the second thought, I got your idea.

If users did not define onRollback script, then there is nothing to run

If users defined the onRollback script, then run that script
Let's keep this idea, thanks 👍

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

khanhtc1202

I'm okay with this approach, let's rethink about making this script rollback kind independent later when we introduce app kind plugin pattern

I dismissed myself

khanhtc1202

Let's go 🚀

khanhtc1202 · 2024-02-02T04:19:59Z

@t-kikuc Please check 🙏

t-kikuc

Go 🚀

* Add ROLLBACK_SCRIPT_RUN stage and enable to plan itn Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Enable to execute multiple rollback stages Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Add script run rollback logic to k8s app Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Fix rfc Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> * Use log.Info Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com> --------- Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

ffjlabo added 4 commits December 28, 2023 11:41

Add ROLLBACK_SCRIPT_RUN stage and enable to plan itn

be09a10

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

Enable to execute multiple rollback stages

48be28e

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

Add script run rollback logic to k8s app

0d0ecee

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

Fix rfc

53401db

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

github-actions bot added the area/go label Dec 28, 2023

ffjlabo commented Dec 28, 2023

View reviewed changes

ffjlabo marked this pull request as ready for review December 28, 2023 03:45

ffjlabo requested review from nghialv, khanhtc1202, kentakozuka and t-kikuc as code owners December 28, 2023 03:45

ffjlabo added the do-not-merge/wip label Dec 28, 2023

ffjlabo removed the do-not-merge/wip label Jan 12, 2024

khanhtc1202 reviewed Jan 29, 2024

View reviewed changes

Use log.Info

e1c150e

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

ffjlabo requested a review from khanhtc1202 February 1, 2024 02:51

khanhtc1202 previously approved these changes Feb 1, 2024

View reviewed changes

khanhtc1202 approved these changes Feb 2, 2024

View reviewed changes

t-kikuc approved these changes Feb 2, 2024

View reviewed changes

khanhtc1202 merged commit ea9a453 into master Feb 2, 2024
13 of 14 checks passed

khanhtc1202 deleted the implement-rollback-for-script-run-as-pre-defined-stage branch February 2, 2024 04:40

github-actions bot mentioned this pull request Feb 6, 2024

Release v0.46.0 #4775

Merged

This was referenced Jul 2, 2024

Introduce script run stage #4643

Open

Fix the docs for the script run feature status #5018

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement rollback for script run as pre defined stage #4743

Implement rollback for script run as pre defined stage #4743

ffjlabo commented Dec 28, 2023

codecov bot commented Dec 28, 2023 •

edited

Loading

ffjlabo Dec 28, 2023

ffjlabo Dec 28, 2023

ffjlabo Dec 28, 2023

ffjlabo Dec 28, 2023 •

edited

Loading

ffjlabo commented Dec 28, 2023 •

edited

Loading

ffjlabo commented Dec 28, 2023

github-actions bot commented Dec 28, 2023

khanhtc1202 Jan 29, 2024

khanhtc1202 Jan 29, 2024

khanhtc1202 Feb 1, 2024

khanhtc1202 Feb 2, 2024

khanhtc1202 left a comment

khanhtc1202 left a comment

khanhtc1202 commented Feb 2, 2024

t-kikuc left a comment

		case model.StageScriptRunRollback:
		status = e.ensureScriptRunRollback(ctx)

	e.LogPersister.Infof("Runnnig commands for rollback...")
	e.LogPersister.Info("Runnnig commands for rollback...")

Implement rollback for script run as pre defined stage #4743

Implement rollback for script run as pre defined stage #4743

Conversation

ffjlabo commented Dec 28, 2023

codecov bot commented Dec 28, 2023 • edited Loading

Codecov Report

ffjlabo Dec 28, 2023

Choose a reason for hiding this comment

ffjlabo Dec 28, 2023

Choose a reason for hiding this comment

ffjlabo Dec 28, 2023

Choose a reason for hiding this comment

ffjlabo Dec 28, 2023 • edited Loading

Choose a reason for hiding this comment

ffjlabo commented Dec 28, 2023 • edited Loading

ffjlabo commented Dec 28, 2023

github-actions bot commented Dec 28, 2023

PR Analysis

Main theme

PR summary

Type of PR

PR Feedback:

General suggestions

Code feedback

Security concerns:

khanhtc1202 Jan 29, 2024

Choose a reason for hiding this comment

khanhtc1202 Jan 29, 2024

Choose a reason for hiding this comment

khanhtc1202 Feb 1, 2024

Choose a reason for hiding this comment

khanhtc1202 Feb 2, 2024

Choose a reason for hiding this comment

khanhtc1202 left a comment

Choose a reason for hiding this comment

khanhtc1202 left a comment

Choose a reason for hiding this comment

khanhtc1202 commented Feb 2, 2024

t-kikuc left a comment

Choose a reason for hiding this comment

codecov bot commented Dec 28, 2023 •

edited

Loading

ffjlabo Dec 28, 2023 •

edited

Loading

ffjlabo commented Dec 28, 2023 •

edited

Loading