
Data passing to Katib experiments as components of kubeflow pipelines. #1846

Open
Efthymios-Stathakis opened this issue Apr 11, 2022 · 5 comments


Efthymios-Stathakis commented Apr 11, 2022

/kind feature

Describe the solution you'd like
Enable data passing into a Katib component from earlier Kubeflow pipeline components, and/or out to later ones, without the use of persistent volumes, so that Katib can be used as part of a portable Kubeflow pipeline (KFP). This way, the Katib-based component keeps an isolated responsibility, namely hyperparameter tuning, leaving data loading and processing to previous components. It would also make it possible to automate the entire training flow and capitalise on other KFP features, such as caching. An example of what the desired pipeline would look like can be found in the attached PDF.

Currently, the only way to implement this is via a persistent volume, which hampers the portability of the pipeline. It would be beneficial to be able to pass data directly to a Katib KFP component, ideally following the KFP v2 data-passing mechanism.

Anything else you would like to add:

Love this feature? Give it a 👍 We prioritize the features with the most 👍
KFP.pdf


oadams commented Aug 22, 2022

What is the current status for integrating Katib into a Kubeflow pipeline?

I've noted that issues 1846 and 1914 are along the same lines, but I just wanted to know what the current best options are.

I have a training pipeline that involves components for (a) fetching data, (b) preprocessing the data, and (c) training. I would like to do hyperparameter tuning over the train component. From a first look at the Katib documentation, it appears not to have a native integration with Pipelines: you specify a container/command that does the training and fire up your Katib experiment, but the experiment cannot itself be a pipeline component without wrapping it somehow.

It seems my main options are:

  1. Have the container specified in my Katib YAML actually orchestrate the whole pipeline.
  2. Create a Pipeline component that runs Katib that runs the train script.
  3. Have my Pipeline only do the data preparation and then separately run Katib.

The first two seem overly complicated. The third approach seems the most natural, but to some extent undermines the point of using a pipeline in the first place, since the pipeline only strings together data downloading and preprocessing. It would be good to have a pipeline where the input is some data source and the final output is the best model from a hyperparameter tuning experiment.
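For context, a standalone Katib experiment today is defined roughly as in the sketch below, where the trial is just a container command with no KFP-visible inputs or outputs. The experiment name, image, script, and parameter names are illustrative, not taken from this thread:

```yaml
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: train-hpo                 # hypothetical experiment name
spec:
  objective:
    type: maximize
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: random
  parameters:
    - name: lr
      parameterType: double
      feasibleSpace:
        min: "0.001"
        max: "0.1"
  trialTemplate:
    primaryContainerName: training
    trialParameters:
      - name: learningRate
        description: Learning rate for each trial
        reference: lr
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training
                image: docker.io/example/train:latest   # assumed image
                command:
                  - python
                  - train.py
                  - "--lr=${trialParameters.learningRate}"
            restartPolicy: Never
```

Because the training container here is opaque to KFP, turning this into a pipeline step requires one of the wrapping approaches listed above.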

Originally posted by @oadams in #331 (comment)

johnugeorge (Member) commented:

See this example: https://github.com/kubeflow/manifests/tree/master/tests/e2e

Katib experiments are run using the Katib launcher, and the results are then used to launch a TFJob.

Efthymios-Stathakis (Author) commented:

Hi,

This solution uses a VolumeOp, which makes the pipeline less portable. Ideally, the Katib component could get its data from the previous step in the same manner that Kubeflow Pipelines passes data from one component to another.
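Concretely, the volume-based workaround means the trial spec must hard-wire a cluster-specific PVC that an earlier pipeline step populated. A sketch of such a trial spec fragment, where the claim name and mount path are assumptions for illustration:

```yaml
# Fragment of a Katib trialSpec: the trial pod mounts a PVC written by an
# earlier pipeline step. The claimName ties the experiment to one cluster,
# which is what hurts portability.
trialSpec:
  apiVersion: batch/v1
  kind: Job
  spec:
    template:
      spec:
        containers:
          - name: training
            image: docker.io/example/train:latest   # assumed image
            command:
              - python
              - train.py
              - "--data-dir=/mnt/data"
            volumeMounts:
              - name: train-data
                mountPath: /mnt/data
        volumes:
          - name: train-data
            persistentVolumeClaim:
              claimName: pipeline-data-pvc          # assumed, cluster-specific
        restartPolicy: Never
```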

@johnugeorge johnugeorge mentioned this issue Nov 2, 2022

github-actions bot commented Sep 7, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tenzen-y (Member) commented Sep 7, 2023

/lifecycle frozen
