
Data passing to Katib experiments as components of kubeflow pipelines. #1846

Open
Efthymios-Stathakis opened this issue Apr 11, 2022 · 5 comments


Efthymios-Stathakis commented Apr 11, 2022

/kind feature

Describe the solution you'd like
Enable data passing into a Katib component from earlier Kubeflow pipeline components, and/or out to later ones, without the use of persistent volumes, so that Katib can be used as part of a portable Kubeflow pipeline (KFP). This way, the Katib-based component keeps an isolated responsibility, namely hyperparameter tuning, leaving data loading and processing to previous components. It would also make it possible to automate the entire training flow and capitalise on other KFP features, such as caching. An example of what the desired pipeline would look like can be found in the attached PDF.

Currently, the only way to implement this is via a persistent volume, which hampers the portability of the pipeline. It would be beneficial to be able to pass data directly to a Katib KFP component, ideally following the KFP v2 data-passing mechanism.

Anything else you would like to add:

Love this feature? Give it a 👍 We prioritize the features with the most 👍
KFP.pdf


oadams commented Aug 22, 2022

What is the current status for integrating Katib into a Kubeflow pipeline?

I've noted that issues 1846 and 1914 are along the same lines, but I just wanted to know what the current best options are.

I have a training pipeline that involves components for (a) fetching data, (b) preprocessing the data, and (c) training. I would like to do hyperparameter tuning over the train component. From a first look at the Katib documentation, it appears not to have a native integration with Pipelines: you specify a container/command that does the training and fire up your Katib experiment, but the experiment cannot itself be a pipeline component without wrapping it somehow.

It seems my main options are:

  1. Have the container specified in my Katib YAML actually orchestrate the whole pipeline.
  2. Create a Pipeline component that runs Katib that runs the train script.
  3. Have my Pipeline only do the data preparation and then separately run Katib.

The first two seem overly complicated. The third approach seems the most natural, but to some extent undermines the point of using a pipeline in the first place, since the pipeline only strings together data downloading and preprocessing. It would be good to have a pipeline where the input is some data source and the final output is the best model from a hyperparameter tuning experiment.
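For context, a standalone Katib experiment today is defined roughly as in the sketch below, where the trial is just a container command with no KFP-visible inputs or outputs. The experiment name, image, script, and parameter names are illustrative, not taken from this thread:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: train-hpo                 # hypothetical experiment name
spec:
  objective:
    type: maximize
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: random
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.001"
        max: "0.1"
  trialTemplate:
    primaryContainerName: training
    trialParameters:
      - name: learningRate
        description: Learning rate for each trial
        reference: lr
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training
                image: docker.io/example/train:latest   # assumed image
                command:
                  - python
                  - train.py
                  - "--lr=${trialParameters.learningRate}"
            restartPolicy: Never
```

Because the training container here is opaque to KFP, turning this into a pipeline step requires one of the wrapping approaches listed above.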

Originally posted by @oadams in #331 (comment)

johnugeorge (Member) commented:

See this example: https://github.com/kubeflow/manifests/tree/master/tests/e2e

Katib experiments are run using the Katib launcher, and the results are then used to launch a TFJob.

Efthymios-Stathakis (Author) commented:

Hi,

This solution uses a VolumeOp, which makes the pipeline less portable. Ideally, the Katib component could get its data from the previous step in the same manner that Kubeflow Pipelines passes data from one component to another.
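Concretely, the volume-based workaround means the trial spec must hard-wire a cluster-specific PVC that an earlier pipeline step populated. A sketch of such a trial spec fragment, where the claim name and mount path are assumptions for illustration:

```yaml
# Fragment of a Katib trialSpec: the trial pod mounts a PVC written by an
# earlier pipeline step. The claimName ties the experiment to one cluster,
# which is what hurts portability.
trialSpec:
  apiVersion: batch/v1
  kind: Job
  spec:
    template:
      spec:
        containers:
          - name: training
            image: docker.io/example/train:latest   # assumed image
            command:
              - python
              - train.py
              - "--data-dir=/mnt/data"
            volumeMounts:
              - name: train-data
                mountPath: /mnt/data
        volumes:
          - name: train-data
            persistentVolumeClaim:
              claimName: pipeline-data-pvc          # assumed, cluster-specific
        restartPolicy: Never
```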

@johnugeorge johnugeorge mentioned this issue Nov 2, 2022

github-actions bot commented Sep 7, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tenzen-y (Member) commented Sep 7, 2023

/lifecycle frozen
