Add Katib early stopping documentation #2336

Merged
7 commits merged on Nov 13, 2020

Conversation

andreyvelich
Member

Blocked by: #2312.
Related: kubeflow/katib#1360.

I added documentation on using early stopping in Katib.

/assign @gaocegege @johnugeorge
/cc @8bitmp3 @RFMVasconcelos

@kubeflow-bot

This change is Reviewable

@rui-vas
Contributor

rui-vas commented Nov 5, 2020

This is very cool @andreyvelich !

@andreyvelich
Member Author

@RFMVasconcelos Thank you!
That should be the latest doc PR for Katib 0.10.

@andreyvelich andreyvelich changed the title [WIP] Add Katib early stopping documentation Add Katib early stopping documentation Nov 11, 2020
@andreyvelich
Member Author

This PR is ready.
/cc @8bitmp3 @gaocegege @johnugeorge.

@8bitmp3 I capitalise all titles to be consistent with other guides, for example Notebooks or KFP.
WDYT?

@@ -0,0 +1,204 @@
+++
title = "Using Early Stopping"
description = "How to use an early stopping in Katib experiments"
Contributor

Suggested change
description = "How to use an early stopping in Katib experiments"
description = "How to use early stopping in Katib experiments"

Comment on lines 10 to 13
Katib experiments. Early stopping allows you to avoid overfitting when you
train your model during Katib experiments. It helps you to save computing
resources and experiment execution time by stopping the experiment's trials
before the training process is complete.
Contributor

AFAIK, early stopping saves resources and execution time by stopping a trial when the (validation) loss or some other target metric no longer improves. Let's add that here to accommodate users who are new to ML or less proficient in it.

For example:

Suggested change
Katib experiments. Early stopping allows you to avoid overfitting when you
train your model during Katib experiments. It helps you to save computing
resources and experiment execution time by stopping the experiment's trials
before the training process is complete.
Katib experiments. Early stopping allows you to avoid overfitting when you
train your model during Katib experiments. It also helps by saving computing
resources and reducing experiment execution time by stopping the experiment's trials
when the target metric(s) no longer improves before the training process is complete.

Notice the use of "the target metric(s)"

Member Author

Agree, nice explanation!

resources and experiment execution time by stopping the experiment's trials
before the training process is complete.

The major advantage of using early stopping in Katib, is that you don't
Contributor

Suggested change
The major advantage of using early stopping in Katib, is that you don't
The major advantage of using early stopping in Katib is that you don't

The major advantage of using early stopping in Katib, is that you don't
need to modify your
[training container package](/docs/components/katib/experiment/#packaging-your-training-code-in-a-container-image).
All you have to do is to change your experiment YAML file.
Contributor

Suggested change
All you have to do is to change your experiment YAML file.
All you have to do is make necessary changes in your experiment's YAML file.

because early stopping algorithms need to know the sequence of reported metrics.
Check the
[`MXNet` example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/mxnet-mnist/mnist.py#L36)
how to add date format to your logs.
Contributor

Suggested change
how to add date format to your logs.
to learn how to add a date format to your logs.
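
For readers following this thread, here is a minimal sketch of what timestamped metric logging could look like in a Python training script. It assumes the standard `logging` module; the format string, the logger name, and the metric name are illustrative assumptions, not a copy of the linked MXNet example.

```python
import io
import logging
import time


def make_metric_logger(stream):
    """Return a logger whose lines start with a UTC timestamp.

    A timestamp on every line lets a collector order the reported metrics.
    """
    handler = logging.StreamHandler(stream)
    formatter = logging.Formatter(
        fmt="%(asctime)s %(levelname)-8s %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%SZ",
    )
    # Use UTC so that the trailing "Z" in the date format is accurate.
    formatter.converter = time.gmtime
    handler.setFormatter(formatter)

    logger = logging.getLogger("training-metrics")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Write one metric line to an in-memory buffer and show it.
buffer = io.StringIO()
log = make_metric_logger(buffer)
log.info("Validation-accuracy=%f", 0.912345)  # hypothetical metric name
print(buffer.getvalue(), end="")
```

In a real training container the handler would normally write to stdout or to the metrics file rather than to an in-memory buffer.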

As a reference, you can use the YAML file of the
[early stopping example](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/early-stopping/median-stop.yaml).

First of all, follow the
Contributor

Suggested change
First of all, follow the
1. Follow the

First of all, follow the
[guide](/docs/components/katib/experiment/#configuring-the-experiment)
to configure your Katib experiment.
To apply early stopping for your experiment, specify the `.spec.earlyStopping`
Contributor

Suggested change
To apply early stopping for your experiment, specify the `.spec.earlyStopping`
2. Next, to apply early stopping for your experiment, specify the `.spec.earlyStopping`

to configure your Katib experiment.
To apply early stopping for your experiment, specify the `.spec.earlyStopping`
parameter, similar to the `.spec.algorithm`. Refer to the
[`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
Contributor

Suggested change
[`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
[`EarlyStoppingSpec` type](https://github.com/kubeflow/katib/blob/master/pkg/apis/controller/common/v1beta1/common_types.go#L41-L58)
for more information.
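
To make the shape of that parameter concrete, here is a hedged sketch that builds the `earlyStopping` section as a Python dictionary mirroring the YAML. The `medianstop` algorithm name and the two settings (`min_trials_required`, `start_step`) are my recollection of the linked median-stop example, so treat them as assumptions and check them against that YAML file and the `EarlyStoppingSpec` type.

```python
import json


def early_stopping_section(algorithm_name, settings):
    """Build a dictionary shaped like the `.spec.earlyStopping` section.

    Settings are emitted as a list of name/value pairs with string values,
    which is assumed here to match the layout used by `.spec.algorithm`.
    """
    return {
        "algorithmName": algorithm_name,
        "algorithmSettings": [
            {"name": name, "value": str(value)}
            for name, value in settings.items()
        ],
    }


# Hypothetical experiment fragment: only the two sections discussed here.
experiment_spec = {
    "algorithm": {"algorithmName": "random"},
    "earlyStopping": early_stopping_section(
        "medianstop",  # assumed algorithm name
        {"min_trials_required": 3, "start_step": 5},  # assumed setting names
    ),
}

print(json.dumps(experiment_spec["earlyStopping"], indent=2))
```

The printed JSON maps one-to-one onto the YAML you would place under `spec:` in the experiment file.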


- `.earlyStopping.algorithmSettings`- the settings for the early stopping algorithm.

Experiment's suggestion produces new trials. After that, the early stopping
Contributor

Suggested change
Experiment's suggestion produces new trials. After that, the early stopping
What happens is your experiment's suggestion produces new trials. After that, the early stopping

or "will produce... will generate..."

### Early stopping algorithms in detail

Here’s a list of the early stopping algorithms available in Katib.
The links lead to descriptions on this page:
Contributor

Suggested change
The links lead to descriptions on this page:

This may be redundant if self-evident, I think


- [Median Stopping Rule](#median-stopping-rule)

More algorithms are under development. You can add an early stopping algorithm
Contributor

Suggested change
More algorithms are under development. You can add an early stopping algorithm
More algorithms are under development.
You can add an early stopping algorithm

best objective value by step `S` is worse than the median value of the running
averages of all completed trials' objectives reported up to step `S`.

To learn more about it, check [this paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
Contributor

Suggested change
To learn more about it, check [this paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
To learn more about it, check [Google Vizier: A Service for Black-Box Optimization](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf).
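
The rule quoted in the diff context above can be sketched in a few lines of Python. This is an illustrative reading of the sentence in the doc, not Katib's implementation: the function names are hypothetical, each trial is assumed to report one objective value per step, and the `maximize` flag is an assumption that decides what "worse" means.

```python
from statistics import median


def running_average(values, step):
    """Mean of the objective values reported in steps 1..step."""
    window = values[:step]
    return sum(window) / len(window)


def should_stop(trial_values, completed_trials, step, maximize=True):
    """Apply the median stopping rule to one running trial at `step`.

    The trial is stopped when its best objective value so far is worse than
    the median of the completed trials' running averages up to `step`.
    """
    reported = trial_values[:step]
    averages = [
        running_average(values, step)
        for values in completed_trials
        if values[:step]
    ]
    if step < 1 or not reported or not averages:
        return False  # nothing to compare against yet

    threshold = median(averages)
    if maximize:
        return max(reported) < threshold
    return min(reported) > threshold


# Hypothetical accuracy curves of three completed trials.
completed = [
    [0.5, 0.7, 0.9],
    [0.4, 0.6, 0.8],
    [0.3, 0.5, 0.7],
]

print(should_stop([0.2, 0.3, 0.4], completed, step=3))   # True: best 0.4 is below the median
print(should_stop([0.5, 0.65, 0.7], completed, step=3))  # False: best 0.7 is above the median
```

With these curves the running averages at step 3 are roughly 0.7, 0.6, and 0.5, so the median threshold is about 0.6.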

Comment on lines 126 to 127
You have to install [jq](https://stedolan.github.io/jq/download/),
to run below commands.
Contributor

Suggested change
You have to install [jq](https://stedolan.github.io/jq/download/),
to run below commands.
First, make sure you have [jq](https://stedolan.github.io/jq/download/) installed.

}
```

If you check status for the early stopped trial:
Contributor

Suggested change
If you check status for the early stopped trial:
Check the status of the early stopped trial by running this command:

kubectl get trial median-stop-2ml8h96d -n <experiment-namespace>
```

You should be able to view `EarlyStopped` status for the trial:
Contributor

Suggested change
You should be able to view `EarlyStopped` status for the trial:
and you should be able to view `EarlyStopped` status for the trial:
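
If you prefer to check that status programmatically, the sketch below scans a trial object for an `EarlyStopped` condition. It assumes the JSON printed by `kubectl get trial <trial-name> -n <experiment-namespace> -o json` follows the usual Kubernetes `status.conditions` layout; the sample object is hypothetical and heavily trimmed.

```python
import json


def is_early_stopped(trial):
    """Return True if the trial has an `EarlyStopped` condition set to "True"."""
    conditions = trial.get("status", {}).get("conditions", [])
    return any(
        cond.get("type") == "EarlyStopped" and cond.get("status") == "True"
        for cond in conditions
    )


# Hypothetical, trimmed trial object; real output contains many more fields.
sample_trial = json.loads("""
{
  "metadata": {"name": "median-stop-2ml8h96d"},
  "status": {
    "conditions": [
      {"type": "Created", "status": "True"},
      {"type": "Running", "status": "False"},
      {"type": "EarlyStopped", "status": "True"}
    ]
  }
}
""")

print(is_early_stopped(sample_trial))
```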

Comment on lines 178 to 179
As well, you can check the results on the Katib UI.
The trial statuses on the experiment monitor page looks as follows:
Contributor

Suggested change
As well, you can check the results on the Katib UI.
The trial statuses on the experiment monitor page looks as follows:
In addition, you can check your results on the Katib UI.
The trial statuses on the experiment monitor page should look as follows:

Contributor

@8bitmp3 8bitmp3 left a comment

Thanks @andreyvelich 💯 !

AFAIK, early stopping saves resources and execution time by stopping a trial when the (validation) loss or some other target metric no longer improves. Let's add that here to accommodate users who are new to ML or less proficient in it.

...Early stopping allows you to avoid overfitting when you
train your model during Katib experiments. It also helps by saving computing
resources and reducing experiment execution time by stopping the experiment's trials
when the target metric(s) no longer improves before the training process is complete. 

Notice the use of "the target metric(s)"

LMKWYT

Cheers

Member Author

@andreyvelich andreyvelich left a comment

Thank you for the review @8bitmp3.
I've made changes.

@8bitmp3
Contributor

8bitmp3 commented Nov 12, 2020

/lgtm

/assign @animeshsingh @Bobgy

PTAL and /approve or suggest changes. Thanks!

@andreyvelich
Member Author

Thanks @8bitmp3!
/approve

@andreyvelich
Member Author

This PR has changes in /docs/images so I can't /approve it.
@animeshsingh @Bobgy @joeliedtke Can you help with approval, please?

@Bobgy
Contributor

Bobgy commented Nov 13, 2020

@andreyvelich can you make a sub folder for katib in the images folder and add katib owners there?

We can merge this first
/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich, Bobgy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 450dc73 into kubeflow:master Nov 13, 2020
@andreyvelich
Member Author

Sure. Should we put the images in /docs/images/katib,
or is it better to put them directly in the components folder, /docs/components/katib/images?
What do you think, @Bobgy @8bitmp3 @RFMVasconcelos?

@Bobgy
Contributor

Bobgy commented Nov 13, 2020

docs/components/katib/images will be better if that's feasible, but I feel like the doc website doesn't support it

Can you have a try?

@andreyvelich
Member Author

> docs/components/katib/images will be better if that's feasible, but I feel like the doc website doesn't support it
>
> Can you have a try?

I'll try.

@andreyvelich andreyvelich deleted the add-early-stopping-doc branch October 3, 2021 00:53
9 participants