Add Katib early stopping documentation #2336

Merged

k8s-ci-robot merged 7 commits into kubeflow:master from andreyvelich:add-early-stopping-doc

Nov 13, 2020

Member

andreyvelich commented Nov 5, 2020

Blocked by: #2312.
Related: kubeflow/katib#1360.

I added documentation on using early stopping in Katib.

/assign @gaocegege @johnugeorge
/cc @8bitmp3 @RFMVasconcelos

k8s-ci-robot assigned gaocegege and johnugeorge

k8s-ci-robot added the do-not-merge/work-in-progress label

k8s-ci-robot requested review from 8bitmp3 and rui-vas

November 5, 2020 04:32

k8s-ci-robot added the size/XL label

Contributor

rui-vas commented Nov 5, 2020

This is very cool @andreyvelich !

Member Author

andreyvelich commented Nov 5, 2020

@RFMVasconcelos Thank you!
That should be the last doc PR for Katib 0.10.

andreyvelich mentioned this pull request

Kubeflow 1.2 release doc changes #2322

Closed

andreyvelich added 6 commits

November 11, 2020 16:01


          Add early stopping doc

5deee86


          Few fixes

e41e232


          Fix few spelling mistakes

8a5abbb


          Fix Early Stopping type link

6d0d5bd


          Add service account name info

dac91bd


          Modify guides

66842ed

andreyvelich force-pushed the add-early-stopping-doc branch from 3757562 to 66842ed

November 11, 2020 16:52

k8s-ci-robot added size/L and removed size/XL labels

andreyvelich changed the title ~~[WIP] Add Katib early stopping documentation~~ Add Katib early stopping documentation

k8s-ci-robot removed the do-not-merge/work-in-progress label

Member Author

andreyvelich commented Nov 11, 2020

This PR is ready.
/cc @8bitmp3 @gaocegege @johnugeorge.

@8bitmp3 I capitalised all titles to be consistent with other guides, for example Notebooks or KFP.
WDYT?

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

@@ -0,0 +1,204 @@
+              +++
+              title = "Using Early Stopping"
+              description = "How to use an early stopping in Katib experiments"

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- description = "How to use an early stopping in Katib experiments"
+ description = "How to use early stopping in Katib experiments"

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

Comment on lines 10 to 13

+              Katib experiments. Early stopping allows you to avoid overfitting when you
+              train your model during Katib experiments. It helps you to save computing
+              resources and experiment execution time by stopping the experiment's trials
+              before the training process is complete.

Contributor

8bitmp3 Nov 11, 2020

AFAIK, early stopping helps with resources and execution when the (validation) loss or some other target metric no longer improves. Let's add that here to accommodate for the users who are new to ML or aren't as proficient in ML as the others.

For example:

Suggested change

- Katib experiments. Early stopping allows you to avoid overfitting when you
- train your model during Katib experiments. It helps you to save computing
- resources and experiment execution time by stopping the experiment's trials
- before the training process is complete.
+ Katib experiments. Early stopping allows you to avoid overfitting when you
+ train your model during Katib experiments. It also helps by saving computing
+ resources and reducing experiment execution time by stopping the experiment's trials
+ when the target metric(s) no longer improves before the training process is complete.

Notice the use of "the target metric(s)"

Member Author

andreyvelich Nov 12, 2020

Agree, nice explanation!

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              resources and experiment execution time by stopping the experiment's trials
+              before the training process is complete.
+              The major advantage of using early stopping in Katib, is that you don't

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- The major advantage of using early stopping in Katib, is that you don't
+ The major advantage of using early stopping in Katib is that you don't

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              The major advantage of using early stopping in Katib, is that you don't
+              need to modify your
+              [training container package](/docs/components/katib/experiment/#packaging-your-training-code-in-a-container-image).
+              All you have to do is to change your experiment YAML file.

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- All you have to do is to change your experiment YAML file.
+ All you have to do is make necessary changes in your experiment's YAML file.

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              because early stopping algorithms need to know the sequence of reported metrics.
+              Check the
+              [`MXNet` example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/mxnet-mnist/mnist.py#L36)
+              how to add date format to your logs.

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- how to add date format to your logs.
+ to learn how to add a date format to your logs.
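
As an illustration of the point under review (early stopping needs timestamped metric logs to order the reported values), here is a minimal Python sketch. It assumes a `%(asctime)s %(levelname)-8s %(message)s` log format with an RFC 3339 style `datefmt`; the helper name, logger name, and metric name are hypothetical, and the linked MXNet example remains the reference for what Katib actually expects.

```python
import io
import logging
import time


def make_metrics_logger(stream):
    """Build a logger whose lines start with a UTC timestamp (hypothetical helper)."""
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)-8s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    # Use UTC so the trailing "Z" in the date format is accurate.
    formatter.converter = time.gmtime

    handler = logging.StreamHandler(stream)
    handler.setFormatter(formatter)

    logger = logging.getLogger("katib-metrics-sketch")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger


# Write one metric line into an in-memory buffer and show it.
buffer = io.StringIO()
log = make_metrics_logger(buffer)
log.info("Validation-accuracy=%.4f", 0.9123)
line = buffer.getvalue().strip()
print(line)
```

Each printed line begins with a timestamp, so a consumer of the log can recover the order in which the metric values were reported.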

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              As a reference, you can use the YAML file of the
+              [early stopping example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml).
+              First of all, follow the

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- First of all, follow the
+ 1. Follow the

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              First of all, follow the
+              [guide](/docs/components/katib/experiment/#configuring-the-experiment)
+              to configure your Katib experiment.
+              To apply early stopping for your experiment, specify the `.spec.earlyStopping`

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- To apply early stopping for your experiment, specify the `.spec.earlyStopping`
+ 2. Next, to apply early stopping for your experiment, specify the `.spec.earlyStopping`

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              to configure your Katib experiment.
+              To apply early stopping for your experiment, specify the `.spec.earlyStopping`
+              parameter, similar to the `.spec.algorithm`. Refer to the
+              [`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- [`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
+ [`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
+ for more information.

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated


		- `.earlyStopping.algorithmSettings`- the settings for the early stopping algorithm.

		Experiment's suggestion produces new trials. After that, the early stopping

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- Experiment's suggestion produces new trials. After that, the early stopping
+ What happens is your experiment's suggestion produces new trials. After that, the early stopping

or "will produce... will generate..."

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              ### Early stopping algorithms in detail
+              Here’s a list of the early stopping algorithms available in Katib.
+              The links lead to descriptions on this page:

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- The links lead to descriptions on this page:

This may be redundant if self-evident, I think

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated


		- [Median Stopping Rule](#median-stopping-rule)

		More algorithms are under development. You can add an early stopping algorithm

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- More algorithms are under development. You can add an early stopping algorithm
+ More algorithms are under development.
+ You can add an early stopping algorithm

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              best objective value by step `S` is worse than the median value of the running
+              averages of all completed trials' objectives reported up to step `S`.
+              To learn more about it, check [this paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- To learn more about it, check [this paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
+ To learn more about it, check [Google Vizier: A Service for Black-Box Optimization](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
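
The rule quoted in the diff above (stop a trial when its best objective value by step `S` is worse than the median of the running averages of completed trials up to step `S`) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Katib's implementation: the function names, the 1-based `step` argument, and the sample accuracy values are all hypothetical.

```python
from statistics import median


def running_average(values, step):
    """Mean of the first `step` reported objective values."""
    window = values[:step]
    return sum(window) / len(window)


def should_stop_early(current, completed, step, maximize=True):
    """Return True if the running trial should stop at `step` (1-based).

    `current` holds the running trial's reported values and `completed`
    holds one list of reported values per completed trial.
    """
    # Only completed trials that reported at least one value can be compared.
    usable = [trial for trial in completed if trial]
    observed = current[:step]
    if not usable or not observed:
        return False

    best = max(observed) if maximize else min(observed)
    threshold = median([running_average(trial, step) for trial in usable])
    # "Worse" means lower when maximizing and higher when minimizing.
    return best < threshold if maximize else best > threshold


# Hypothetical accuracy curves from three completed trials.
completed_trials = [[0.5, 0.7, 0.9], [0.4, 0.6, 0.8], [0.6, 0.8, 0.95]]
print(should_stop_early([0.3, 0.4], completed_trials, step=2))   # True
print(should_stop_early([0.5, 0.65], completed_trials, step=2))  # False
```

At step 2 the running averages of the completed trials are roughly 0.6, 0.5, and 0.7, so the median is 0.6: a trial whose best accuracy so far is 0.4 is stopped, while one that has reached 0.65 keeps running.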

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

Comment on lines 126 to 127

		You have to install [jq](https://stedolan.github.io/jq/download/),
		to run below commands.

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- You have to install [jq](https://stedolan.github.io/jq/download/),
- to run below commands.
+ First, make sure you have [jq](https://stedolan.github.io/jq/download/) installed.

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              }
+              ```
+              If you check status for the early stopped trial:

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- If you check status for the early stopped trial:
+ Check the status of the early stopped trial by running this command:

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

+              kubectl get trial median-stop-2ml8h96d -n <experiment-namespace>
+              ```
+              You should be able to view `EarlyStopped` status for the trial:

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- You should be able to view `EarlyStopped` status for the trial:
+ and you should be able to view `EarlyStopped` status for the trial:

8bitmp3 reviewed

content/en/docs/components/katib/early-stopping.md Outdated

Comment on lines 178 to 179

		As well, you can check the results on the Katib UI.
		The trial statuses on the experiment monitor page looks as follows:

Contributor

8bitmp3 Nov 11, 2020

Suggested change

- As well, you can check the results on the Katib UI.
- The trial statuses on the experiment monitor page looks as follows:
+ In addition, you can check your results on the Katib UI.
+ The trial statuses on the experiment monitor page should look as follows:

8bitmp3 suggested changes

Contributor

8bitmp3 left a comment

Thanks @andreyvelich 💯 !

AFAIK, early stopping helps with resources and execution when the (validation) loss or some other target metric no longer improves. Let's add that here to accommodate for the users who are new to ML or aren't as proficient in ML as the others.

...Early stopping allows you to avoid overfitting when you
train your model during Katib experiments. It also helps by saving computing
resources and reducing experiment execution time by stopping the experiment's trials
when the target metric(s) no longer improves before the training process is complete.

Notice the use of "the target metric(s)"

LMKWYT

Cheers


          Address comments

dfcfb43

andreyvelich commented

Member Author

andreyvelich left a comment

Thank you for the review @8bitmp3.
I've made changes.

Contributor

8bitmp3 commented Nov 12, 2020

/lgtm

/assign @animeshsingh @Bobgy

PTAL and /approve or suggest changes. Thanks!

k8s-ci-robot assigned animeshsingh, Bobgy and 8bitmp3

k8s-ci-robot added the lgtm label

Member Author

andreyvelich commented Nov 13, 2020

Thanks @8bitmp3!
/approve

Member Author

andreyvelich commented Nov 13, 2020

This PR has changes in /docs/images, so I can't /approve it.
@animeshsingh @Bobgy @joeliedtke Can you help with approval, please?

Contributor

Bobgy commented Nov 13, 2020

@andreyvelich can you make a subfolder for Katib in the images folder and add the Katib owners there?

We can merge this first
/lgtm
/approve

Contributor

k8s-ci-robot commented Nov 13, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, Bobgy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Bobgy]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved label

k8s-ci-robot merged commit 450dc73 into kubeflow:master

Member Author

andreyvelich commented Nov 13, 2020

Sure. Should we put the images in /docs/images/katib,
or is it better to put them directly in the components folder: /docs/components/katib/images?
What do you think, @Bobgy @8bitmp3 @RFMVasconcelos?

Contributor

Bobgy commented Nov 13, 2020

docs/components/katib/images will be better if that's feasible, but I feel like the doc website doesn't support it

Can you have a try?

Member Author

andreyvelich commented Nov 13, 2020

> docs/components/katib/images will be better if that's feasible, but I feel like the doc website doesn't support it
>
> Can you have a try?

I'll try.

andreyvelich mentioned this pull request

Move images under Katib directory #2354

Merged

andreyvelich deleted the add-early-stopping-doc branch

October 3, 2021 00:53

Labels

approved lgtm size/L