Feature Request: dedupe / memoize steps #1054
Comments
This would be a fantastic feature! Would make it much simpler to implement dynamic programming workflows.
@bryanlarsen I am doing something similar and ran into some issues (#1073). My use case was retrying failed steps while carrying the old successful steps over. This might help: https://github.com/argoproj/argo/blob/master/workflow/util/util.go#L326 I am using some of those util methods here: https://github.com/kubebuild/agent/blob/master/pkg/schedulers/build_scheduler.go#L174 and it works quite well for retrying failed jobs. Only some of the metadata gets lost for DAGs; I'm not sure exactly why yet.
Hey @bryanlarsen, did you come to any conclusions on the viability of this in Argo? We're investigating a similar issue, albeit 18 months later! We're looking at the cost of adopting Kubeflow (vs. Metaflow/Flyte, both of which natively support memoization), and this looks like the likely blocker (Kubeflow uses Argo under the hood for ML workflow scheduling). Allowing caching of long-running tasks (think ETL done on Spark, for example) would give us significant speedups in data engineer and data scientist velocity for obvious reasons, but we definitely agree it's not a trivial problem to solve!
@mukulikak ☝️
Related kubeflow/pipelines#1509
Is there any way to do this "work avoidance" pattern when using an artifact repository as opposed to a volume? I can't figure out a way to check whether an artifact exists.
See #3066
That should be possible. I'm hoping someone will contribute an example.
Duping this to #944, which we'll be starting work on. Please send any 👍 to that issue.
Signed-off-by: Derek Wang <whynowy@gmail.com>
Feature Request
If the inputs, container, command, etc. for a workflow step are all identical to a step performed in a previous workflow, the step should be skipped and the output from the previous step used instead.
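To make the request concrete, here is a minimal sketch of the idea in Python. It is not Argo code and does not use any Argo API: `step_cache_key`, `run_memoized`, the field names, and the in-memory `cache` dict are all hypothetical, and it assumes a digest for each input is already known without downloading the input.

```python
import hashlib
import json


def step_cache_key(image, command, args, input_digests):
    """Return a stable fingerprint for a step definition (hypothetical helper).

    input_digests maps an input name to a digest that is assumed to be
    available already, so the input itself is never downloaded here.
    """
    payload = {
        "image": image,
        "command": list(command),
        "args": list(args),
        # Sort inputs by name so dict insertion order does not change the key.
        "inputs": sorted(input_digests.items()),
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_memoized(cache, key, run_step):
    """Return (output, was_cached); run_step is only called on a cache miss."""
    if key in cache:
        return cache[key], True
    output = run_step()
    cache[key] = output
    return output, False
```

A real implementation would need a persistent store shared across workflows instead of a dict, and would have to decide which fields (environment variables, resource limits, image digest vs. tag) belong in the key.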
I asked for this feature in Slack (https://argoproj.slack.com/messages/C8J6SGN12/convo/C8J6SGN12-1539958408.000100/) and the response from Ed Lee was that I could add a shim to our step to do this. While possible, this is definitely suboptimal: the inputs are often tens of megabytes, possibly even hundreds, and they would have to be downloaded and fingerprinted just to do no work.
If such a feature would be welcome in Argo we would be interested in developing it and opening a PR. However, before we start we'd like to know if such a PR would be welcome. Perhaps more significantly, does the architecture of Argo make this a difficult task? For instance, if something slightly related but seemingly much more trivial such as #990 is hard to do, then our request may also be. Are we better off just writing our own Argo-lite?