Data passing to Katib experiments as components of kubeflow pipelines. #1846
What is the current status of integrating Katib into a Kubeflow pipeline? I've noted that issues 1846 and 1914 are along the same lines, but I wanted to know what the current best options are. I have a training pipeline with components for (a) fetching data, (b) preprocessing the data, and (c) training. I would like to do hyperparameter tuning over the training component. From a first look at the Katib documentation, it does not appear to have a native integration with Pipelines: you specify a container/command that does the training and fire up your Katib experiment, but it doesn't appear that the experiment can itself be a component without wrapping it somehow. It seems my main options are:
The first two seem overly complicated. The third approach seems the most natural, but to some extent it undermines the point of using a pipeline in the first place, since the pipeline only strings together data downloading and preprocessing. It would be good to have a pipeline where the input is some data source and the final output is the best model from a hyperparameter tuning experiment. Originally posted by @oadams in #331 (comment)
See https://github.com/kubeflow/manifests/tree/master/tests/e2e. Katib experiments are run using the Katib launcher, and the results are then used in a TFJob launch.
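To make the launcher pattern concrete, below is a hedged sketch of the kind of `Experiment` manifest a launcher component would submit. The image name, parameter names, and objective metric are placeholder assumptions, not taken from the linked e2e tests; the point is that the training step is described as a raw container/command inside `trialSpec`, which is why it cannot consume KFP artifacts directly.

```python
# Hypothetical Katib Experiment manifest, built as a plain dict.
# Placeholder assumptions: the image, command, metric, and parameter names.
experiment = {
    "apiVersion": "kubeflow.org/v1beta1",
    "kind": "Experiment",
    "metadata": {"name": "train-hp-tuning", "namespace": "kubeflow"},
    "spec": {
        "objective": {"type": "minimize", "objectiveMetricName": "loss"},
        "algorithm": {"algorithmName": "random"},
        "maxTrialCount": 12,
        "parallelTrialCount": 3,
        "parameters": [
            {
                "name": "lr",
                "parameterType": "double",
                "feasibleSpace": {"min": "0.001", "max": "0.1"},
            }
        ],
        "trialTemplate": {
            "primaryContainerName": "training",
            "trialParameters": [{"name": "learningRate", "reference": "lr"}],
            # The training step is an opaque container/command, not a KFP
            # component, so it has no access to upstream pipeline artifacts.
            "trialSpec": {
                "apiVersion": "batch/v1",
                "kind": "Job",
                "spec": {
                    "template": {
                        "spec": {
                            "containers": [
                                {
                                    "name": "training",
                                    "image": "my-registry/train:latest",
                                    "command": [
                                        "python", "train.py",
                                        "--lr=${trialParameters.learningRate}",
                                    ],
                                }
                            ],
                            "restartPolicy": "Never",
                        }
                    }
                },
            },
        },
    },
}
```

A launcher component typically serialises a manifest like this, submits it to the cluster, and waits for the experiment to complete before downstream steps (e.g. a TFJob) consume the best trial's parameters.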
Hi, this solution uses a VolumeOp, which makes the pipeline less portable. Ideally, the Katib component could get the data from the previous step in the same way that Kubeflow Pipelines implements data passing from one component to another.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
/kind feature
Describe the solution you'd like
Enable data passing into a Katib component from previous, and/or out to later, Kubeflow pipeline components without the use of persistent volumes, so that it can be used as part of a portable Kubeflow pipeline (KFP). This way, the Katib-based component maintains an isolated responsibility, namely hyperparameter tuning, leaving data loading and processing to previous components. This would allow automating the entire training flow and capitalising on other KFP features, such as caching. An example of how such a pipeline would look can be found in the attached pdf.
Currently, the only way to implement this is with some persistent volume, which hampers the portability of the pipeline. It would be beneficial to pass data directly to a Katib KFP component, ideally via the KFP v2 data passing mechanism.
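The requested pattern can be illustrated with a stdlib-only sketch. In KFP v2-style data passing, each component receives input/output *paths* from the orchestrator, which stages the files through an artifact store rather than a shared persistent volume; a Katib-backed component would read its dataset the same way. All function names and file names here are hypothetical, and a temporary directory stands in for the artifact store.

```python
# Stdlib-only illustration of the artifact-passing pattern (no kfp SDK).
# Hypothetical names throughout; a temp dir stands in for the artifact store.
import json
import tempfile
from pathlib import Path


def preprocess(raw: list, out_path: Path) -> None:
    # A preprocessing component writes its result to the output path it is
    # handed by the orchestrator, instead of to a mounted volume.
    out_path.write_text(json.dumps([x / max(raw) for x in raw]))


def tune(in_path: Path) -> dict:
    # A Katib-backed component would read the dataset from its input path the
    # same way, so the experiment's trials never mount the upstream volume.
    data = json.loads(in_path.read_text())
    return {"n_examples": len(data), "best_lr": 0.01}  # placeholder result


with tempfile.TemporaryDirectory() as store:
    artifact = Path(store) / "dataset.json"  # staged by the orchestrator
    preprocess([2.0, 4.0, 8.0], artifact)
    result = tune(artifact)
    print(result)  # {'n_examples': 3, 'best_lr': 0.01}
```

The design point is that neither function knows where the other runs: the orchestrator wires output paths to input paths, which is exactly what a volume-free Katib component would need from KFP.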
Anything else you would like to add:
Love this feature? Give it a 👍 We prioritize the features with the most 👍
KFP.pdf