Feature Request: dedupe / memoize steps #1054

bryanlarsen · 2018-10-22T13:25:15Z

Feature Request

If the inputs, container, command, etc. for a workflow step are all identical to a step performed in a previous workflow, the step should be skipped and the output from the previous step used instead.
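The requested behavior can be sketched in Go. This is a minimal sketch under a few assumptions. `StepSpec` is a hypothetical, simplified step type, not Argo's real template type. An in-memory map stands in for a persistent cache. Every field that determines the step's result is hashed into a key, and a step whose key has been seen before returns the recorded output instead of running. Input artifacts are keyed by their storage location rather than their content, which only holds if artifacts at a given location are never overwritten.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"sort"
)

// StepSpec is a hypothetical, simplified view of a workflow step. It is not
// Argo's real Template type.
type StepSpec struct {
	Image      string
	Command    []string
	Args       []string
	Parameters map[string]string // input parameter name -> value
	Artifacts  map[string]string // input artifact name -> storage location (assumed immutable)
}

// writeField writes a label, an item count and length-prefixed items, so that
// different field layouts can never produce the same byte stream.
func writeField(h hash.Hash, label string, items ...string) {
	fmt.Fprintf(h, "%s:%d;", label, len(items))
	for _, it := range items {
		fmt.Fprintf(h, "%d:%s;", len(it), it)
	}
}

// sortedPairs flattens a map into key/value pairs in sorted key order, because
// Go's map iteration order is randomized.
func sortedPairs(m map[string]string) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		pairs = append(pairs, k, m[k])
	}
	return pairs
}

// memoKey hashes everything that determines the step's output.
func memoKey(s StepSpec) string {
	h := sha256.New()
	writeField(h, "image", s.Image)
	writeField(h, "command", s.Command...)
	writeField(h, "args", s.Args...)
	writeField(h, "parameters", sortedPairs(s.Parameters)...)
	writeField(h, "artifacts", sortedPairs(s.Artifacts)...)
	return hex.EncodeToString(h.Sum(nil))
}

// memoCache is an in-memory stand-in for a persistent store of past outputs.
type memoCache map[string]string

// runStep returns the recorded output of an identical earlier step if there is
// one; otherwise it runs the step and records the output.
func runStep(c memoCache, s StepSpec, run func(StepSpec) string) (string, bool) {
	k := memoKey(s)
	if cached, ok := c[k]; ok {
		return cached, true
	}
	out := run(s)
	c[k] = out
	return out, false
}

func main() {
	cache := memoCache{}
	calls := 0
	run := func(s StepSpec) string { calls++; return "output of " + s.Image }
	step := StepSpec{
		Image:      "alpine:3.8",
		Command:    []string{"sh", "-c"},
		Args:       []string{"echo hello"},
		Parameters: map[string]string{"n": "1"},
	}
	_, hit1 := runStep(cache, step, run)
	_, hit2 := runStep(cache, step, run)
	fmt.Println(hit1, hit2, calls) // false true 1
}
```

Sorting map entries and length-prefixing every string keeps the key stable across runs. It also rules out accidental collisions such as `Command: ["args"]` versus `Args: ["args"]`.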

I asked for this feature in Slack (https://argoproj.slack.com/messages/C8J6SGN12/convo/C8J6SGN12-1539958408.000100/), and Ed Lee's response was that we could add a shim to our step to do this. While possible, this is clearly suboptimal. The inputs are often tens or even hundreds of megabytes, all of which would have to be downloaded and fingerprinted just to conclude that no work is needed.

If such a feature would be welcome in Argo, we would be interested in developing it and opening a PR. Before we start, though, we'd like to know whether such a PR would be accepted. Perhaps more significantly, does Argo's architecture make this a difficult task? For instance, if something related but seemingly much more trivial, such as #990, is hard to do, then our request may be too. Are we better off just writing our own Argo-lite?

edlee2121 · 2018-10-23T23:16:03Z

This would be a fantastic feature! Would make it much simpler to implement dynamic programming workflows.

andreimc · 2018-11-04T17:58:09Z

@bryanlarsen I am doing something similar and ran into some issues (#1073). My use case was being able to retry failed steps while carrying the old successful steps over. This might help: https://github.com/argoproj/argo/blob/master/workflow/util/util.go#L326. I am using some of the utility methods there in https://github.com/kubebuild/agent/blob/master/pkg/schedulers/build_scheduler.go#L174 to do this, and it works quite well for retrying failed jobs. Only some of the metadata gets lost for DAGs; I'm not sure exactly why yet.
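The retry idea can be sketched in Go. This is a hypothetical illustration with a simplified `Phase` type, not the code in Argo's `workflow/util/util.go`. Nodes that succeeded in the previous run are carried over, and every other node (failed, errored, or never started) is scheduled to run again.

```go
package main

import (
	"fmt"
	"sort"
)

// Phase is a hypothetical, simplified node phase; Argo's real node status
// carries much more information.
type Phase string

const (
	Succeeded Phase = "Succeeded"
	Failed    Phase = "Failed"
)

// retryPlan splits a previous run's nodes into those whose results can be
// carried over and those that must run again. Only succeeded nodes are kept.
func retryPlan(prev map[string]Phase) (keep, rerun []string) {
	for name, p := range prev {
		if p == Succeeded {
			keep = append(keep, name)
		} else {
			rerun = append(rerun, name)
		}
	}
	// Sort for deterministic output, since map iteration order is random.
	sort.Strings(keep)
	sort.Strings(rerun)
	return keep, rerun
}

func main() {
	prev := map[string]Phase{"fetch": Succeeded, "train": Failed, "report": "Omitted"}
	keep, rerun := retryPlan(prev)
	fmt.Println(keep, rerun) // [fetch] [report train]
}
```

A real implementation would also have to carry over each kept node's outputs so that the re-run nodes can still consume them.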

alexlatchford · 2020-02-12T20:12:43Z

Hey @bryanlarsen, did you come to any conclusions on the viability of this in Argo? We're investigating a similar issue, albeit 18 months later!

We're looking at the cost of adopting Kubeflow (vs. Metaflow/Flyte, both of which natively support memoization), and this looks like the likely blocker (Kubeflow uses Argo under the hood for ML workflow scheduling). Allowing caching of long-running tasks (think ETL done on Spark, for example) would give our data engineers and scientists a significant speed-up, for obvious reasons, but I definitely agree it's not a trivial problem to solve!

alexec · 2020-02-12T22:12:50Z

@mukulikak ☝️

talebzeghmi · 2020-04-03T18:14:45Z

Related kubeflow/pipelines#1509

foobarbecue · 2020-05-23T23:21:17Z

Is there any way to use this "work avoidance" pattern with an artifact repository instead of a volume? I can't figure out a way to check whether an artifact exists.

alexec · 2020-05-26T15:36:53Z

See #3066

alexec · 2020-05-26T15:37:17Z

> Is there any way to use this "work avoidance" pattern with an artifact repository instead of a volume? I can't figure out a way to check whether an artifact exists.

That should be possible. I'm hoping someone will contribute an example.
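A minimal Go sketch of the work-avoidance pattern against an artifact repository follows. `ArtifactStore`, `memStore` and `avoidWork` are hypothetical names, not Argo or SDK APIs. The idea is to check for the output artifact's key before doing the work. With S3-compatible storage, the existence check would typically be a HEAD request on the object, which does not download it.

```go
package main

import "fmt"

// ArtifactStore is a hypothetical abstraction over an artifact repository.
type ArtifactStore interface {
	Exists(key string) (bool, error)
	Put(key string, data []byte) error
}

// memStore is an in-memory ArtifactStore used only for illustration.
type memStore struct {
	objs map[string][]byte
}

func (m *memStore) Exists(key string) (bool, error) {
	_, ok := m.objs[key]
	return ok, nil
}

func (m *memStore) Put(key string, data []byte) error {
	m.objs[key] = data
	return nil
}

// avoidWork runs produce only if no artifact exists at key yet, then uploads
// the result. It reports whether the work was performed and stored.
func avoidWork(s ArtifactStore, key string, produce func() ([]byte, error)) (bool, error) {
	exists, err := s.Exists(key)
	if err != nil {
		return false, err
	}
	if exists {
		return false, nil // output already present: skip the work
	}
	data, err := produce()
	if err != nil {
		return false, err
	}
	if err := s.Put(key, data); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	store := &memStore{objs: map[string][]byte{}}
	produce := func() ([]byte, error) { return []byte("result"), nil }
	did1, _ := avoidWork(store, "outputs/step-a.tgz", produce)
	did2, _ := avoidWork(store, "outputs/step-a.tgz", produce)
	fmt.Println(did1, did2) // true false
}
```

Two workflows that check the same key at the same time can both find it missing and both do the work. This is harmless as long as the step is deterministic, since the second upload just replaces identical content.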

jessesuen · 2020-06-17T23:16:30Z

Duping this to #944, which we'll be starting work on. Please send any 👍 to that issue.


jessesuen added the type/feature Feature request label Jan 22, 2019

alexec mentioned this issue Apr 24, 2020 in Step level memoization #944 (closed)

alexec mentioned this issue May 21, 2020 in docs: Document work avoidance. #3066 (merged)

jessesuen closed this as completed Jun 17, 2020

alexec added the epic: controller enhancements label Jul 24, 2020

icecoffee531 pushed a commit to icecoffee531/argo-workflows that referenced this issue Jan 5, 2022

chore: update CRD version to apiextensions.k8s.io/v1 (argoproj#1054)

f1ce0f7

Signed-off-by: Derek Wang <whynowy@gmail.com>
