Kubeflow Pipeline distributed training support

kfp-dist-train contains utilities for use with Kubeflow Pipelines that let you write distributed training code directly with the Kubeflow Pipelines SDK.

Get Started

Set up a Kubeflow environment (for example, with https://github.com/alauda/kubeflow-chart).
Upload the example kfp-dist-train.ipynb to a Notebook instance, or set up local pipeline submission.
Execute the example to submit a workflow. You can configure the number of workers in the Kubeflow web UI. [Screenshot of the submitted job]
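To make the "number of workers" setting more concrete, here is a minimal sketch of how a worker count could translate into per-worker cluster settings. It assumes TensorFlow-style multi-worker training configured through a TF_CONFIG JSON value, which this README does not confirm. The function name build_tf_config, the "<job_name>-worker-<i>" host pattern, and the port are hypothetical and are not part of kfp-dist-train.

```python
import json


def build_tf_config(num_workers, index, job_name="trainer", port=2222):
    """Return a TF_CONFIG-style JSON string for worker `index` of `num_workers`.

    Assumption: host names follow a hypothetical "<job_name>-worker-<i>"
    pattern; a real deployment would use whatever addresses the cluster assigns.
    """
    if not 0 <= index < num_workers:
        raise ValueError("index must be in [0, num_workers)")
    # Every worker sees the same cluster list; only the task index differs.
    workers = [f"{job_name}-worker-{i}:{port}" for i in range(num_workers)]
    return json.dumps({
        "cluster": {"worker": workers},
        "task": {"type": "worker", "index": index},
    })


config = json.loads(build_tf_config(num_workers=3, index=1))
print(config["cluster"]["worker"])
print(config["task"])
```

Changing the worker count only changes the length of the shared cluster list, so the training code itself can stay the same for any number of workers.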

Roadmap

Support a kfpdist.component(dist=True) decorator as a wrapper around dsl.component
Support the parameter server strategy
Support PyTorch
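The planned decorator does not exist yet, so the following is only a sketch of the wrapping pattern the first roadmap item describes. To stay runnable without the Kubeflow Pipelines SDK, base_component is a hypothetical stand-in for dsl.component, and the component function, its dist flag handling, and the distributed attribute are assumptions rather than the project's design.

```python
def base_component(func=None, **options):
    """Hypothetical stand-in for dsl.component: records options on the function."""
    def wrap(f):
        f.component_options = dict(options)
        return f
    # Support both @base_component and @base_component(...).
    return wrap if func is None else wrap(func)


def component(func=None, *, dist=False, **options):
    """Hypothetical kfpdist.component: tag the function, then delegate."""
    def wrap(f):
        # Mark whether this step should run as a distributed job.
        f.distributed = dist
        # Pass all remaining options through to the wrapped decorator.
        return base_component(f, **options)
    return wrap if func is None else wrap(func)


@component(dist=True, base_image="python:3.11")
def train(epochs: int) -> str:
    return f"trained for {epochs} epochs"


print(train.distributed)
print(train.component_options)
```

Keeping the dist flag separate from the pass-through options means that any argument the underlying decorator accepts keeps working unchanged.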
