Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement rollback for script run as pre defined stage #4743

Merged

Conversation

ffjlabo
Copy link
Member

@ffjlabo ffjlabo commented Dec 28, 2023

What this PR does / why we need it:

Implement rollback for script run as pre defined stage.

Which issue(s) this PR fixes:

Part of #4643

Does this PR introduce a user-facing change?: Yes

  • How are users affected by this change:
    Users can use onRollback feature for SCRIPT_RUN stage.
    They can execute any command when the deployment failed and the executed SCRIPT_RUN stage is included in the pipeline

  • Is this breaking change: no

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>
Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>
Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>
Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>
Copy link

codecov bot commented Dec 28, 2023

Codecov Report

Attention: 60 lines in your changes are missing coverage. Please review.

Comparison is base (d814680) 30.80% compared to head (e1c150e) 31.01%.
Report is 13 commits behind head on master.

Files Patch % Lines
pkg/app/piped/executor/kubernetes/rollback.go 0.00% 38 Missing ⚠️
pkg/app/piped/planner/kubernetes/pipeline.go 8.33% 21 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4743      +/-   ##
==========================================
+ Coverage   30.80%   31.01%   +0.21%     
==========================================
  Files         221      225       +4     
  Lines       26015    26325     +310     
==========================================
+ Hits         8013     8164     +151     
- Misses      17352    17510     +158     
- Partials      650      651       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Comment on lines +372 to +402

if rollbackStages, ok := s.deployment.FindRollbackStages(); ok {
// Update to change deployment status to ROLLING_BACK.
if err := s.reportDeploymentStatusChanged(ctx, model.DeploymentStatus_DEPLOYMENT_ROLLING_BACK, statusReason); err != nil {
return err
}

// Start running rollback stage.
var (
sig, handler = executor.NewStopSignal()
doneCh = make(chan struct{})
)
go func() {
rbs := *stage
rbs.Requires = []string{lastStage.Id}
s.executeStage(sig, rbs, func(in executor.Input) (executor.Executor, bool) {
return s.executorRegistry.RollbackExecutor(s.deployment.Kind, in)
})
close(doneCh)
}()

select {
case <-ctx.Done():
handler.Terminate()
<-doneCh
return nil

case <-doneCh:
break
for _, stage := range rollbackStages {
// Start running rollback stage.
var (
sig, handler = executor.NewStopSignal()
doneCh = make(chan struct{})
)
go func() {
rbs := *stage
rbs.Requires = []string{lastStage.Id}
s.executeStage(sig, rbs, func(in executor.Input) (executor.Executor, bool) {
return s.executorRegistry.RollbackExecutor(s.deployment.Kind, in)
})
close(doneCh)
}()

select {
case <-ctx.Done():
handler.Terminate()
<-doneCh
return nil

case <-doneCh:
break
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Changed to execute multiple kinds of rollback stages.
The changes are to do logic just like the same as before with each rollback stage.

Comment on lines +121 to +128
if s.Name == model.StageScriptRun {
// Use metadata as a way to pass parameters to the stage.
envStr, _ := json.Marshal(s.ScriptRunStageOptions.Env)
metadata := map[string]string{
"baseStageID": out[i].Id,
"onRollback": s.ScriptRunStageOptions.OnRollback,
"env": string(envStr),
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 I use metadata for the stage to recognize below on executing it.

  • The commands which are executed on the rollback script run stage. (onRollback, env)
  • The stage IDs of script run stages, which are used to check whether to rollback it. (baseStageID)
    • It is because I want to execute the command to rollback SCRIPT_RUN to the point where the deployment was canceled or failed.

Comment on lines +441 to +455
if ps.Name == model.StageScriptRunRollback.String() {
baseStageID := ps.Metadata["baseStageID"]
if baseStageID == "" {
return
}

baseStageStatus, ok := s.stageStatuses[baseStageID]
if !ok {
return
}

if baseStageStatus == model.StageStatus_STAGE_NOT_STARTED_YET || baseStageStatus == model.StageStatus_STAGE_SKIPPED {
return
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Check the status of the script run stage to decide whether to rollback or not.
It is because the target stages should be already executed.

Comment on lines +47 to +48
case model.StageScriptRunRollback:
status = e.ensureScriptRunRollback(ctx)
Copy link
Member Author

@ffjlabo ffjlabo Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 Added the logic to rollback executor for each application (firstly, I added on the k8s's one) because rollback stages are executed per the kind of application not per stage.

I don't have any idea to separate the rollback for application and the rollback for the stage.

@ffjlabo ffjlabo marked this pull request as ready for review December 28, 2023 03:45
@ffjlabo
Copy link
Member Author

ffjlabo commented Dec 28, 2023

The PR summary

  • Add ROLLBACK_SCRIPT_RUN stage and enable to planbe09a10
  • Enable to execute multiple rollback stages 48be28e
  • Add script run rollback logic to k8s app 0d0ecee

The result

I tested on local env.
PipeCD

apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  name: canary-with-script-run
  labels:
    env: example
    team: product
  pipeline:
    stages:
      - name: K8S_CANARY_ROLLOUT
        with:
          replicas: 10%
      - name: WAIT
        with:
          duration: 10s
      - name: SCRIPT_RUN
        with:
          env:
            MSG: "execute script1"
            R_MSG: "rollback script1"
          run: |
            echo $MSG
            sleep 10
          onRollback: |
            echo $R_MSG
            sleep 10
      - name: K8S_PRIMARY_ROLLOUT
      - name: K8S_CANARY_CLEAN

some problems
I don't have any idea to separate the rollback for application and the rollback for the stage...

@ffjlabo
Copy link
Member Author

ffjlabo commented Dec 28, 2023

/review

Copy link
Contributor

PR Analysis

Main theme

"Implement rollback for SCRIPT_RUN stages in deployment pipelines"

PR summary

This PR introduces the capability to handle rollback for custom script run stages within Kubernetes deployment pipelines. It ensures that if a deployment fails or is canceled, any executed SCRIPT_RUN stages are rolled back in the reverse order they were applied.

Type of PR

Enhancement

PR Feedback:

General suggestions

The PR implements rollbacks for SCRIPT_RUN stages, which is an important feature to maintain consistency and ensure clean reverts of deployments when failures occur. Expanding the rollback mechanism to include custom script stages is a valuable addition that helps in managing complex deployment scenarios.

In the addition to the scheduler.go and the new rollback.go logic for Kubernetes, the changes appropriately handle stage interdependencies and pass necessary environment variables to rollback commands.

The FindRollbackStages function introduced to the Deployment model seems to correctly identify all stages that require rollbacks, including the new SCRIPT_RUN_ROLLBACK stages.

Code feedback

  • relevant file: pkg/app/piped/controller/scheduler.go
    suggestion: |
    When iterating over rollback stages, it's important to ensure that each goroutine references its own copy of the stage variable. Due to the loop variable capture problem, goroutines may refer to the same stage which leads to unexpected rollbacks. Ensure that the variable inside the goroutine is a unique copy. (important)
    relevant line: |
    + go func() {

  • relevant file: pkg/app/piped/executor/kubernetes/rollback.go
    suggestion: |
    Check the error when unmarshaling the environment metadata into the env variable and handle it appropriately instead of ignoring it. If there's an error, it could indicate potential issues with the metadata format which may lead to unexpected behavior during rollback. (medium)
    relevant line: |
    + _ = json.Unmarshal([]byte(envStr), &env)

Security concerns:

no

This PR doesn't seem to introduce any typical security concerns like SQL injection, XSS, CSRF, etc. However, the execution of external scripts does carry the inherent risk of executing malicious code if the scripts or the environment containing the hooks are compromised. Ensure that the rollback scripts are securely managed and reviewed to prevent such issues.

@@ -171,3 +179,45 @@ func (e *rollbackExecutor) ensureRollback(ctx context.Context) model.StageStatus
}
return model.StageStatus_STAGE_SUCCESS
}

func (e *rollbackExecutor) ensureScriptRunRollback(ctx context.Context) model.StageStatus {
e.LogPersister.Infof("Runnnig commands for rollback...")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
e.LogPersister.Infof("Runnnig commands for rollback...")
e.LogPersister.Info("Runnnig commands for rollback...")

Comment on lines +192 to +195
if onRollback == "" {
e.LogPersister.Info("No commands to run")
return model.StageStatus_STAGE_SUCCESS
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just want to ensure I understand this right: This means that if there is no user-defined onRollback script, this script run rollback will not do anything, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ffjlabo What I mean here is that we may make this:

  1. Add rollbackable of type bool to specify whether the current script run can be rollbacked or not, default set to false 🤔
  2. In case of rollbackable stage, if onRollback is not set, re-run script instead or return error rollback script is missing
    wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the second thought, I got your idea.

  • If users did not define onRollback script, then there is nothing to run
  • If users defined the onRollback script, then run that script
    Let's keep this idea, thanks 👍

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>
khanhtc1202
khanhtc1202 previously approved these changes Feb 1, 2024
Copy link
Member

@khanhtc1202 khanhtc1202 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this approach, let's rethink about making this script rollback kind independent later when we introduce app kind plugin pattern

@khanhtc1202 khanhtc1202 dismissed their stale review February 1, 2024 14:11

I dismissed myself

Copy link
Member

@khanhtc1202 khanhtc1202 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go 🚀

@khanhtc1202
Copy link
Member

@t-kikuc Please check 🙏

Copy link
Member

@t-kikuc t-kikuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go 🚀

@khanhtc1202 khanhtc1202 merged commit ea9a453 into master Feb 2, 2024
13 of 14 checks passed
@khanhtc1202 khanhtc1202 deleted the implement-rollback-for-script-run-as-pre-defined-stage branch February 2, 2024 04:40
t-kikuc pushed a commit that referenced this pull request Feb 2, 2024
* Add ROLLBACK_SCRIPT_RUN stage and enable to plan itn

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Enable to execute multiple rollback stages

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Add script run rollback logic to k8s app

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Fix rfc

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Use log.Info

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

---------

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>
@github-actions github-actions bot mentioned this pull request Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants