
Permanent storage for Argo's data #745

Closed
vicaire opened this issue Feb 15, 2018 · 7 comments


vicaire commented Feb 15, 2018

Is this a BUG REPORT or FEATURE REQUEST?: FEATURE REQUEST

It is great to be able to see the workflow history using "argo list" and to dig into more details using "argo get" and "argo logs".

If I understand correctly, however, all this data is stored in the Kubernetes key-value store, which is not intended to be permanent storage.

What solution would you recommend for exporting the workflow execution history to permanent storage? Rather than exporting all the data at once, I am looking for a solution that exports the data as it is being generated, so that it can be analysed in near real time.

Would it make sense for Argo to support this as a feature? For instance, I would provide a MySQL database, and Argo would populate it as workflows execute and make progress.

Thanks!


vicaire commented Feb 16, 2018

It looks like a Kubernetes API extension could handle reconciling the state of Argo workflows with some external storage (such as a MySQL DB, GCS, etc.):

https://github.com/kubernetes-incubator/apiserver-builder/blob/master/docs/concepts/api_building_overview.md#reconciliation

jessesuen (Member) commented

Hi @vicaire, that is correct. Long-term storage of workflows in etcd is not a very scalable approach. In fact, ideally I would like to add some GC settings/options in the controller to simply delete workflows after some time.

We have thought about long-term persistence of workflows, and have come to the conclusion that it needs to be done outside the purview of the workflow-controller. Internally, we have described something like an archiver service, which watches for completed workflows and simply dumps the workflow payload into a database/S3/etc. It could additionally perform GC on workflows it has already archived. argo-ui would have to be taught about the secondary location in order to present a unified view of workflows.
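
Purely as an illustration (not an existing Argo component), a minimal Go sketch of such an archiver might look like the following, assuming a recent client-go dynamic client; the kubeconfig path, namespace, and the archive() sink are placeholders:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

// GVR of the Argo Workflow custom resource.
var workflowGVR = schema.GroupVersionResource{
	Group:    "argoproj.io",
	Version:  "v1alpha1",
	Resource: "workflows",
}

// archive is a placeholder for the real sink (MySQL, S3, GCS, ...).
func archive(payload []byte) error {
	fmt.Printf("archiving %d bytes\n", len(payload))
	return nil
}

func main() {
	// Placeholder kubeconfig path and namespace.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Watch Workflow objects and hand the completed ones to the archiver.
	w, err := client.Resource(workflowGVR).Namespace("default").Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for event := range w.ResultChan() {
		wf, ok := event.Object.(*unstructured.Unstructured)
		if !ok {
			continue
		}
		phase, _, _ := unstructured.NestedString(wf.Object, "status", "phase")
		if phase != "Succeeded" && phase != "Failed" && phase != "Error" {
			continue // only completed workflows get archived
		}
		payload, err := json.Marshal(wf.Object)
		if err != nil {
			log.Printf("marshal %s: %v", wf.GetName(), err)
			continue
		}
		if err := archive(payload); err != nil {
			log.Printf("archive %s: %v", wf.GetName(), err)
		}
	}
}
```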

I do think there may be some controller work to do proper archiving of logs. See #454 for some thoughts around this.


vicaire commented Feb 17, 2018

Thanks Jesse. That makes sense.


joshes commented Aug 9, 2018

@jessesuen re:

In fact, ideally I would like to add some GC settings/options in the controller to simply delete workflows after some time.

Is there a ticket for this effort? I'd like to look into this as well if it's not already underway. I ended up just writing a cronjob that deletes workflows older than N days, which works, but it would be nice if it were something inherent to the system. A rough sketch of that idea is below.
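
This is not the actual cronjob described above, just a minimal Go sketch of the same cleanup idea using client-go's dynamic client; the kubeconfig path, namespace, and seven-day retention window are placeholder assumptions:

```go
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path, namespace, and retention window.
	const namespace = "default"
	maxAge := 7 * 24 * time.Hour

	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	workflows := client.Resource(schema.GroupVersionResource{
		Group:    "argoproj.io",
		Version:  "v1alpha1",
		Resource: "workflows",
	}).Namespace(namespace)

	// List all workflows and delete the ones older than the retention window.
	list, err := workflows.List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	cutoff := time.Now().Add(-maxAge)
	for _, wf := range list.Items {
		if wf.GetCreationTimestamp().Time.Before(cutoff) {
			if err := workflows.Delete(context.Background(), wf.GetName(), metav1.DeleteOptions{}); err != nil {
				log.Printf("delete %s: %v", wf.GetName(), err)
			}
		}
	}
}
```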


vicaire commented Aug 11, 2018

@jessesuen,

That looks like a nice way to do it. The libraries used to implement controllers (watch APIs, queues) should make it easy to monitor the workflows, save them and garbage collect them.

What are your thoughts on using K8s API extensions for this purpose (https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#api-server-aggregation)? It looks like one could replace the controller/CRD and take care of (optionally) persisting all the data in a DB in addition to etcd. The UI/CLI could then call this API extension and get a complete view (listing what is in both the DB and etcd) instead of just a view of what has not yet been garbage collected from etcd.

Alternatively, what about having the current CRD controller itself optionally save to permanent storage and garbage collect? (One issue seems to be that the list call would still only return what is stored in etcd.)
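
As a toy illustration of the "complete view" idea (not any existing Argo API), a merged listing might deduplicate by UID and prefer the live copy from etcd over the archived one; the WorkflowRecord type here is hypothetical:

```go
package main

import "fmt"

// WorkflowRecord is a hypothetical, minimal projection of a workflow
// used only for listing; it is not an Argo type.
type WorkflowRecord struct {
	UID   string
	Name  string
	Phase string
}

// mergeViews combines live workflows (from etcd) with archived ones
// (from the database), deduplicating by UID and preferring the live copy.
func mergeViews(live, archived []WorkflowRecord) []WorkflowRecord {
	seen := make(map[string]bool, len(live))
	merged := make([]WorkflowRecord, 0, len(live)+len(archived))
	for _, wf := range live {
		seen[wf.UID] = true
		merged = append(merged, wf)
	}
	for _, wf := range archived {
		if !seen[wf.UID] {
			merged = append(merged, wf)
		}
	}
	return merged
}

func main() {
	live := []WorkflowRecord{{UID: "a1", Name: "build-42", Phase: "Running"}}
	archived := []WorkflowRecord{
		{UID: "a1", Name: "build-42", Phase: "Running"}, // still in etcd, skipped
		{UID: "b2", Name: "build-41", Phase: "Succeeded"},
	}
	for _, wf := range mergeViews(live, archived) {
		fmt.Println(wf.Name, wf.Phase)
	}
}
```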

@edlee2121 edlee2121 added this to the V2.3 milestone Aug 29, 2018
@alexmt alexmt modified the milestones: v2.3, v2.4 Jan 25, 2019
@sarabala1979 sarabala1979 self-assigned this Apr 11, 2019
@sarabala1979 sarabala1979 added the type/feature Feature request label Apr 17, 2019
jessesuen (Member) commented

Workflow persistence is actually back on the table, and is targeted for the next release v2.4.

However, the API server work (to leverage that persistence) is scheduled for a later release, v2.5.


agnewp commented Aug 23, 2019

Hey guys, I was just playing with the idea of a workflow object TTL, with the obvious trade-off that all workflow data is essentially lost once the object gets cleaned up by the TTL controller mechanism. My question: if the workflow data does have a place to live permanently, in a database for example, are you planning to have the Prometheus metrics reflect the stats gathered in that more permanent database, or should those continue to reflect the objects currently present in the key/value store in etcd? I would like Prometheus to get its metrics from the database, but I can also see the utility of having some metrics that show the current state in etcd.
