Feature Request: dedupe / memoize steps #1054

Closed
bryanlarsen opened this issue Oct 22, 2018 · 9 comments
Labels
type/feature Feature request

Comments

@bryanlarsen

Feature Request

If the inputs, container, command, etc. for a workflow step are all identical to a step performed in a previous workflow, the step should be skipped and the output from the previous step used instead.
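A minimal sketch of the idea, not Argo's API: the step's defining fields are hashed into a key, and the step only runs when that key has not been seen before. The names `step_cache_key` and `run_memoized`, the dict-based cache, and the assumption that each input already has a known digest are all hypothetical choices made for illustration.

```python
import hashlib
import json
from typing import Callable, Dict, List


def step_cache_key(image: str, command: List[str], input_digests: Dict[str, str]) -> str:
    """Derive a deterministic key from everything that defines a step.

    input_digests maps an input name to a digest of its content; how those
    digests are obtained is outside this sketch (an assumption).
    """
    payload = json.dumps(
        {"image": image, "command": command, "inputs": input_digests},
        sort_keys=True,  # stable ordering so equal steps give equal keys
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_memoized(cache: Dict[str, str], key: str, run_step: Callable[[], str]) -> str:
    """Return the cached output for key, running the step only on a cache miss."""
    if key not in cache:
        cache[key] = run_step()
    return cache[key]
```

Two steps with the same image, command, and input digests produce the same key, so the second call to `run_memoized` returns the stored output without invoking `run_step`.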

I asked for this feature in Slack (https://argoproj.slack.com/messages/C8J6SGN12/convo/C8J6SGN12-1539958408.000100/) and the response from Ed Lee was that I could add a shim to our step to do this. While possible, this is definitely suboptimal: the inputs are often tens or even hundreds of megabytes, all of which would have to be downloaded and fingerprinted just to do no work.

If such a feature would be welcome in Argo, we would be interested in developing it and opening a PR, but we'd like to confirm that before we start. Perhaps more significantly, does the architecture of Argo make this a difficult task? For instance, if something slightly related but seemingly much more trivial such as #990 is hard to do, then our request may be hard too. Are we better off just writing our own Argo-lite?

@edlee2121
Contributor

This would be a fantastic feature! Would make it much simpler to implement dynamic programming workflows.

@andreimc
Contributor

andreimc commented Nov 4, 2018

@bryanlarsen I am doing something similar and ran into some issues (#1073). My use case is retrying failed steps while carrying the old successful steps over. This might help: https://github.com/argoproj/argo/blob/master/workflow/util/util.go#L326 and I use some of the util methods from there in https://github.com/kubebuild/agent/blob/master/pkg/schedulers/build_scheduler.go#L174 to do this. It works quite well for retrying failed jobs. Only some of the metadata gets lost for DAGs; I'm not sure exactly why yet.
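As a rough sketch of the "retry failed, keep successful" idea (this is not the code at either link above), one can pick the nodes that did not succeed, add everything downstream of them, and leave the rest untouched. The function name `nodes_to_rerun`, the plain-dict representation of the workflow, and the use of the string "Succeeded" as the success phase are assumptions of this example.

```python
from typing import Dict, Set


def nodes_to_rerun(phases: Dict[str, str], dependents: Dict[str, Set[str]]) -> Set[str]:
    """Return every node that did not succeed plus all nodes downstream of one.

    phases maps node name to its recorded phase; dependents maps a node name
    to the nodes that consume its output. Nodes not returned can be carried over.
    """
    rerun = {name for name, phase in phases.items() if phase != "Succeeded"}
    stack = list(rerun)
    while stack:
        node = stack.pop()
        for child in dependents.get(node, set()):
            if child not in rerun:
                # A successful node must still rerun if its upstream is rerun.
                rerun.add(child)
                stack.append(child)
    return rerun
```

Any node outside the returned set keeps its previous outputs, which is the part that gets carried over into the retried workflow.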

@jessesuen jessesuen added the type/feature Feature request label Jan 22, 2019
@alexlatchford

Hey @bryanlarsen, did you come to any conclusions on the viability of this in Argo? We're investigating a similar issue, albeit 18 months later!

We're looking at the cost of adopting Kubeflow (vs. Metaflow/Flyte, both of which natively support memoization), and it looks like this is the likely blocker (Kubeflow uses Argo under the hood for ML workflow scheduling). Caching long-running tasks (think ETL done on Spark, for example) would give us significant speed-ups in data engineer and data scientist velocity for obvious reasons, but I definitely agree it's not a trivial problem to solve!

@alexec
Contributor

alexec commented Feb 12, 2020

@mukulikak ☝️

@talebzeghmi
Contributor

Related: kubeflow/pipelines#1509

@foobarbecue
Contributor

foobarbecue commented May 23, 2020

Is there any way to do this "work avoidance" pattern if using an artifact repository as opposed to a volume? I can't figure out a way to check if an artifact exists.

@alexec
Contributor

alexec commented May 26, 2020

See #3066

@alexec
Contributor

alexec commented May 26, 2020

Is there any way to do this "work avoidance" pattern if using an artifact repository as opposed to a volume? I can't figure out a way to check if an artifact exists.

That should be possible. I'm hoping someone will contribute an example.
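One possible shape for such an example, sketched under assumptions rather than taken from Argo: the step asks the artifact store whether its output key already exists and only does the work on a miss. `ArtifactStore`, `InMemoryStore`, and `ensure_artifact` are hypothetical names; against a real object store, `exists` would presumably map to a metadata-only request so that nothing has to be downloaded.

```python
from typing import Callable, Dict, Protocol


class ArtifactStore(Protocol):
    """Hypothetical minimal interface over an artifact repository."""

    def exists(self, key: str) -> bool: ...

    def put(self, key: str, data: bytes) -> None: ...


class InMemoryStore:
    """Stand-in store so the sketch runs without any external service."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def exists(self, key: str) -> bool:
        return key in self._objects

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data


def ensure_artifact(store: ArtifactStore, key: str, produce: Callable[[], bytes]) -> bool:
    """Produce and upload the artifact only if it is missing.

    Returns True when work was done and False when it was avoided.
    """
    if store.exists(key):
        return False
    store.put(key, produce())
    return True
```

Calling `ensure_artifact` twice with the same key runs `produce` once; the second call returns `False` without doing any work.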

@jessesuen
Member

Duping this to #944, which we'll be starting work on. Please send any 👍 to that issue.

icecoffee531 pushed a commit to icecoffee531/argo-workflows that referenced this issue Jan 5, 2022
Signed-off-by: Derek Wang <whynowy@gmail.com>