feat(controller): Make MAX_OPERATION_TIME configurable. Close #4239 #4562
Signed-off-by: Alex Collins <alex_collins@intuit.com>
workflow/controller/operator.go
@@ -116,8 +116,21 @@ var (

 // maxOperationTime is the maximum time a workflow operation is allowed to run
 // for before requeuing the workflow onto the workqueue.
-const maxOperationTime = 10 * time.Second
-const defaultRequeueTime = maxOperationTime
+var maxOperationTime = 10 * time.Second
@jessesuen @sarabala1979 this change allows you to configure maxOperationTime. I've confirmed with a user that higher values (30s tested) prevent zombie workflows. We increased the number of workflow workers from 8 to 32 back in May. I think we should set this to 20s by default. Thoughts?
@sarabala1979 @jessesuen I've increased the default maxOperationTime (and defaultRequeueTime) to 30s.
-const maxOperationTime = 10 * time.Second
-const defaultRequeueTime = maxOperationTime
+var (
+	maxOperationTime = envutil.LookupEnvDurationOr("MAX_OPERATION_TIME", 30*time.Second)
I am fine with 30s because one of our ML customers is running a really big workflow with 2000+ dynamic steps. 30s will help this workflow parse the big JSON parameters. It is configurable, so if there is any backlog issue, we can configure it back to 10s or 20s.
…j#4239 (argoproj#4562) Signed-off-by: Alex Collins <alex_collins@intuit.com> Signed-off-by: Paul Brabban <paul.brabban@gmail.com>
Signed-off-by: github@finnesand.no <github@finnesand.no> feat(ui): Add Template/Cron workflow filter to workflow page. Closes argoproj#4532 (argoproj#4543) Signed-off-by: Tianchu Zhao <evantczhao@gmail.com> feat(executor): Auto create s3 bucket if not present. Signed-off-by: Alex Capras <alexcapras@gmail.com> Apply codegen Signed-off-by: Alex Capras <alexcapras@gmail.com> Add argo-e2e label to test wf Signed-off-by: Alex Capras <alexcapras@gmail.com> chore: Updated stress test YAML (argoproj#4569) Signed-off-by: Alex Collins <alex_collins@intuit.com> docs: Updated kubectl apply command in manifests README (argoproj#4577) Signed-off-by: Stefan Gloutnikov <stefan@gloutnikov.com> feat(controller): Make MAX_OPERATION_TIME configurable. Close argoproj#4239 (argoproj#4562) Signed-off-by: Alex Collins <alex_collins@intuit.com> docs: Fix a typo in example (argoproj#4590) Signed-off-by: Takayoshi Nishida <takayoshi.nishida@gmail.com> feat(controller): Retry transient offload errors. Resolves argoproj#4464 (argoproj#4482) Signed-off-by: Alex Collins <alex_collins@intuit.com> fix(server): use the correct name when downloading artifacts (argoproj#4579) Signed-off-by: Daniel Herman <dherman@factset.com> fix(server): serve artifacts directly from disk to support large artifacts (argoproj#4589) Signed-off-by: Daniel Herman <dherman@factset.com> fix(executor): Handle sidecar killing in a process-namespace-shared pod (argoproj#4575) Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com> docs: Add JSON schema for IDE validation (argoproj#4581) Signed-off-by: Paul Brabban <paul.brabban@gmail.com> refactor: Use polling model for workflow phase metric (argoproj#4557) Signed-off-by: Simon Behar <simbeh7@gmail.com> Addressing reviewers comments Signed-off-by: Alex Capras <alexcapras@gmail.com> Addressing reviewers comments docs: Minor typo fix (argoproj#4610) Signed-off-by: Paavo Pokkinen <paavo.pokkinen@vaimo.com> fix(controller): Prevent tasks with names starting with digit to use 
either 'depends' or 'dependencies' (argoproj#4598) Signed-off-by: terrytangyuan <terrytangyuan@gmail.com> fix(docs): Bring minio chart instructions up to date (argoproj#4586) Signed-off-by: Ranga Krishnan <ranga@bei.re> fix(executor): Fixed waitMainContainerStart returning prematurely. Closes argoproj#4599 (argoproj#4601) Signed-off-by: fsiegmund <siegmund@slb.com> feat(controller): Enhanced artifact repository ref. See argoproj#3184 (argoproj#4458) Signed-off-by: Alex Collins <alex_collins@intuit.com> fix: Null check pagination variable (argoproj#4617) Signed-off-by: Simon Behar <simbeh7@gmail.com> fix: Perform fields filtering server side (argoproj#4595) Signed-off-by: Simon Behar <simbeh7@gmail.com> fix(server): Correct webhook event payload marshalling. Fixes argoproj#4572 (argoproj#4594) Signed-off-by: Alex Collins <alex_collins@intuit.com> feat(ui): Add columns--narrower-height to AttributeRow (argoproj#4371) fix: Fix TestCleanFieldsExclude (argoproj#4625) Signed-off-by: Simon Behar <simbeh7@gmail.com> fix(argo-server): fix global variable validation error with reversed dag.tasks (argoproj#4369) Signed-off-by: chenyu.zheng <chenyu.zheng@hulu.com> fix: derive jsonschema and fix up issues, validate examples dir… (argoproj#4611) Signed-off-by: Paul Brabban <paul.brabban@gmail.com> fix(ui): Reference secrets in EnvVars. Fixes argoproj#3973 (argoproj#4419) Signed-off-by: Alejandro Tejera <aletepe@gmail.com> fix(ui): Fix Snyk issues (argoproj#4631) Signed-off-by: Alex Collins <alex_collins@intuit.com> feat(executor): More informative log when executors do not support output param from base image layer (argoproj#4620) Signed-off-by: terrytangyuan <terrytangyuan@gmail.com> Codegen patch. Signed off by alexcapras@gmail.com Codegen patch. Signed off by alexcapras@gmail.com Delete test.patch
Signed-off-by: Alex Collins alex_collins@intuit.com
Checklist:
This change is a test to see whether increasing this value improves issues like #4560.