feat(controller): Make MAX_OPERATION_TIME configurable. Close #4239 #4562

Merged
merged 6 commits into from
Nov 21, 2020


@alexec (Contributor) commented Nov 19, 2020

Signed-off-by: Alex Collins <alex_collins@intuit.com>

Checklist:

This change is to test whether increasing this value helps with issues like #4560.

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: Alex Collins <alex_collins@intuit.com>
@alexec alexec linked an issue Nov 19, 2020 that may be closed by this pull request
@alexec alexec marked this pull request as ready for review November 20, 2020 19:01
@@ -116,8 +116,21 @@ var (

 // maxOperationTime is the maximum time a workflow operation is allowed to run
 // for before requeuing the workflow onto the workqueue.
-const maxOperationTime = 10 * time.Second
-const defaultRequeueTime = maxOperationTime
+var maxOperationTime = 10 * time.Second
Contributor Author


@jessesuen @sarabala1979 this change allows you to configure maxOperationTime. I've confirmed with a user that higher values (30s tested) prevent zombie workflows. We increased the number of workflow workers from 8 to 32 back in May. I think we should set this to 20s by default. Thoughts?

@alexec alexec changed the title feat(controller): Make MAX_OPERATION_TIME configurable. feat(controller): Make MAX_OPERATION_TIME configurable. Close #4239 Nov 20, 2020
@alexec alexec added this to the v2.12 milestone Nov 20, 2020
@alexec (Contributor Author) commented Nov 20, 2020

@sarabala1979 @jessesuen I've increased the default maxOperationTime (and defaultRequeueTime) to 30s.

Signed-off-by: Alex Collins <alex_collins@intuit.com>
-const maxOperationTime = 10 * time.Second
-const defaultRequeueTime = maxOperationTime
+var (
+	maxOperationTime = envutil.LookupEnvDurationOr("MAX_OPERATION_TIME", 30*time.Second)
Member


I am fine with 30s because one of our ML customers is running a really big workflow with 2000+ dynamic steps. 30s will help this workflow parse its large JSON parameters. It is configurable, so if there is any backlog issue, we can configure it back to 10s or 20s.

@alexec alexec merged commit 15fd579 into argoproj:master Nov 21, 2020
@alexec alexec deleted the mot branch November 21, 2020 01:01
alexec added a commit that referenced this pull request Nov 21, 2020
…4562)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
brabster pushed a commit to brabster/argo that referenced this pull request Nov 24, 2020
…j#4239 (argoproj#4562)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
Signed-off-by: Paul Brabban <paul.brabban@gmail.com>
alexcapras pushed a commit to alexcapras/argo that referenced this pull request Dec 2, 2020
Signed-off-by: github@finnesand.no <github@finnesand.no>

feat(ui): Add Template/Cron workflow filter to workflow page. Closes argoproj#4532 (argoproj#4543)

Signed-off-by: Tianchu Zhao <evantczhao@gmail.com>

feat(executor): Auto create s3 bucket if not present.

Signed-off-by: Alex Capras <alexcapras@gmail.com>

Apply codegen

Signed-off-by: Alex Capras <alexcapras@gmail.com>

Add argo-e2e label to test wf

Signed-off-by: Alex Capras <alexcapras@gmail.com>

chore: Updated stress test YAML (argoproj#4569)

Signed-off-by: Alex Collins <alex_collins@intuit.com>

docs: Updated kubectl apply command in manifests README (argoproj#4577)

Signed-off-by: Stefan Gloutnikov <stefan@gloutnikov.com>

feat(controller): Make MAX_OPERATION_TIME configurable. Close argoproj#4239 (argoproj#4562)

Signed-off-by: Alex Collins <alex_collins@intuit.com>

docs: Fix a typo in example (argoproj#4590)

Signed-off-by: Takayoshi Nishida <takayoshi.nishida@gmail.com>

feat(controller): Retry transient offload errors. Resolves argoproj#4464 (argoproj#4482)

Signed-off-by: Alex Collins <alex_collins@intuit.com>

fix(server): use the correct name when downloading artifacts (argoproj#4579)

Signed-off-by: Daniel Herman <dherman@factset.com>

fix(server): serve artifacts directly from disk to support large artifacts (argoproj#4589)

Signed-off-by: Daniel Herman <dherman@factset.com>

fix(executor): Handle sidecar killing in a process-namespace-shared pod (argoproj#4575)

Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>

docs: Add JSON schema for IDE validation (argoproj#4581)

Signed-off-by: Paul Brabban <paul.brabban@gmail.com>

refactor: Use polling model for workflow phase metric (argoproj#4557)

Signed-off-by: Simon Behar <simbeh7@gmail.com>

Addressing reviewers comments

Signed-off-by: Alex Capras <alexcapras@gmail.com>

Addressing reviewers comments

docs: Minor typo fix (argoproj#4610)

Signed-off-by: Paavo Pokkinen <paavo.pokkinen@vaimo.com>

fix(controller): Prevent tasks with names starting with digit to use either 'depends' or 'dependencies' (argoproj#4598)

Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>

fix(docs): Bring minio chart instructions up to date (argoproj#4586)

Signed-off-by: Ranga Krishnan <ranga@bei.re>

fix(executor): Fixed waitMainContainerStart returning prematurely. Closes argoproj#4599 (argoproj#4601)

Signed-off-by: fsiegmund <siegmund@slb.com>

feat(controller): Enhanced artifact repository ref. See argoproj#3184 (argoproj#4458)

Signed-off-by: Alex Collins <alex_collins@intuit.com>

fix: Null check pagination variable (argoproj#4617)

Signed-off-by: Simon Behar <simbeh7@gmail.com>

fix: Perform fields filtering server side (argoproj#4595)

Signed-off-by: Simon Behar <simbeh7@gmail.com>

fix(server): Correct webhook event payload marshalling. Fixes argoproj#4572 (argoproj#4594)

Signed-off-by: Alex Collins <alex_collins@intuit.com>

feat(ui): Add columns--narrower-height to AttributeRow (argoproj#4371)

fix: Fix TestCleanFieldsExclude (argoproj#4625)

Signed-off-by: Simon Behar <simbeh7@gmail.com>

fix(argo-server): fix global variable validation error with reversed dag.tasks (argoproj#4369)

Signed-off-by: chenyu.zheng <chenyu.zheng@hulu.com>

fix: derive jsonschema and fix up issues, validate examples dir… (argoproj#4611)

Signed-off-by: Paul Brabban <paul.brabban@gmail.com>

fix(ui): Reference secrets in EnvVars. Fixes argoproj#3973  (argoproj#4419)

Signed-off-by: Alejandro Tejera <aletepe@gmail.com>

fix(ui): Fix Snyk issues (argoproj#4631)

Signed-off-by: Alex Collins <alex_collins@intuit.com>

feat(executor): More informative log when executors do not support output param from base image layer (argoproj#4620)

Signed-off-by: terrytangyuan <terrytangyuan@gmail.com>

Codegen patch. Signed off by alexcapras@gmail.com

Codegen patch. Signed off by alexcapras@gmail.com

Delete test.patch
alexec added a commit that referenced this pull request Dec 9, 2020
…4562)

Signed-off-by: Alex Collins <alex_collins@intuit.com>
@agilgur5 agilgur5 added the area/controller Controller issues, panics label Jul 13, 2024
Labels
area/controller Controller issues, panics
Development

Successfully merging this pull request may close these issues.

Increase max reconciliation time
3 participants