Cached step/task output (parameters or artifacts) can be reused by other workflows to avoid re-executing the same step, saving time and resources.
In ETL and ML use cases, some steps/tasks across workflows produce the same output when given the same input. If Argo could cache the output of those steps, it could be reused by other workflows: the cached step/task execution would be skipped and the cached output used instead.
Proposal
The template will have a cachable flag:
- name: gen-number-list
  cachable: true
  script:
    image: python:alpine3.6
    command: [python]
    source: |
      import json
      import sys
      json.dump([i for i in range(20, 31)], sys.stdout)
Create a new CRD that will hold the node status of the latest successful execution of the template.
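To make the proposal concrete, here is a minimal sketch of the lookup logic such a cache could implement. This is a hypothetical illustration, not Argo code: the cache key is a hash of the template name plus its inputs, and a plain dict stands in for the proposed CRD that records the latest successful output.

```python
import hashlib
import json

def cache_key(template_name, inputs):
    """Derive a deterministic key from the template name and its inputs."""
    payload = json.dumps({"template": template_name, "inputs": inputs},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

cache = {}  # stands in for the proposed CRD store

def run_step(template_name, inputs, execute):
    """Run a step, or skip it and return the cached output on a key hit."""
    key = cache_key(template_name, inputs)
    if key in cache:
        return cache[key]  # execution skipped, cached output reused
    output = execute(inputs)
    cache[key] = output
    return output
```

On the first call with a given template/input pair the step executes; on any later call with the same pair the execution is skipped and the stored output is returned.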
I agree it's not a bad idea, but in Argo you are responsible for the data flow: you copy the results of a step into S3 and let all depending steps copy the data back. I use Amazon EKS, and it is clearly restricted to "ReadWriteOnce" volumes (EBS), which means a volume can be mounted on one node only.
What could be technically possible is separation and aggregation of artifacts. Separation would mean copying data from one volume to many other volumes, and aggregation means copying from many volumes to one. This would allow a single step to produce results that are processed in parallel by the next step without using S3 buckets in between. With EKS there is still the restriction that an EBS volume is tied to its Availability Zone (AZ), which is a problem for aggregation when the volumes were created in different AZs.
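The separation/aggregation idea above can be sketched in a few lines. This is a hypothetical illustration using local directories in place of mounted volumes; the function names `separate` and `aggregate` are my own, chosen to match the comment's terminology.

```python
import shutil
from pathlib import Path

def separate(src: Path, dest_volumes: list) -> None:
    """Fan-out: copy one step's output directory onto many volumes so the
    next step's parallel pods can each mount their own copy."""
    for dest in dest_volumes:
        shutil.copytree(src, dest / src.name, dirs_exist_ok=True)

def aggregate(src_volumes: list, dest: Path) -> None:
    """Fan-in: collect results from many volumes onto one volume for a
    single downstream step."""
    dest.mkdir(parents=True, exist_ok=True)
    for i, src in enumerate(src_volumes):
        shutil.copytree(src, dest / f"part-{i}", dirs_exist_ok=True)
```

In a real controller the copies would run between PersistentVolumes rather than directories, which is exactly where the ReadWriteOnce and AZ restrictions mentioned above come into play.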
Similar issue #944