Skip to content
This repository has been archived by the owner on Jul 30, 2021. It is now read-only.

checkpointer: make grace period a flag. #813

Merged
merged 1 commit into from
Jan 2, 2018

Conversation

diegs
Copy link
Contributor

@diegs diegs commented Dec 20, 2017

This allows users to configure the checkpoint removal grace period
parameter as a flag. This may be useful for tests, among other things.

@diegs diegs self-assigned this Dec 20, 2017
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 20, 2017
@diegs
Copy link
Contributor Author

diegs commented Dec 20, 2017

coreosbot run e2e checkpointer

Copy link
Contributor

@squat squat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xiang90
Copy link
Contributor

xiang90 commented Dec 20, 2017

among other things.

what are other things?

I do not feel a flag should be added solely for testing. it is better to create a unit test if we want to get rid of the dependencies of timing.

@diegs
Copy link
Contributor Author

diegs commented Dec 20, 2017

@xiang90 it's a tunable parameter. I don't know what the optimal value is. Adding a flag helps for experimentation.

@@ -45,6 +44,9 @@ type Options struct {
RemoteRuntimeEndpoint string
// RuntimeRequestTimeout is the timeout that is used for requests to the RemoteRuntimeEndpoint.
RuntimeRequestTimeout time.Duration
// CheckpointGracePeriod is the timeout ithat is used for cleaning up checkpoints when the parent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ithat -> that

This allows users to configure the checkpoint removal grace period
parameter as a flag. This may be useful for tests, among other things.
@xiang90
Copy link
Contributor

xiang90 commented Dec 20, 2017

it's a tunable parameter. I don't know what the optimal value is. Adding a flag helps for experimentation.

To be honest, I would rather have less tunable parameters, and good default values. Also it is not clear to me that why do we have this gracePeriod at the first place by reading the code. Maybe it is for dealing with a delete then create case, so that the deleted pod will still run for a while until the created one comes up?

@diegs
Copy link
Contributor Author

diegs commented Dec 20, 2017

@xiang90 #755 is the PR where it is added. You can see the linked issues for why. This was required for us to ship 1.8. Self-hosted did not work without the change.

@xiang90
Copy link
Contributor

xiang90 commented Dec 20, 2017

If I understand it correctly setting graceperiod depends on how fast the kubelet will start the replaced pod after the API server changes the state of the pod. What would be the side effect to set it long enough for 99% of the cases, say 5 minutes instead of 1 minutes for real world cases? And what is the plan to do the experiments you mentioned? What I want to avoid is to add a flag, and either someone misuses it or no one uses it.

@aaronlevy
Copy link
Contributor

In the future I might expect there to be more of these types of configurables (like don't start checkpoint if it's older than X - to protect against stale nodes rejoining with really old workloads), or even control these types of things per-pod via annotations.

So to move this forward, @xiang90 would opening an issue to discuss experimenting with the default grace period, and maybe adding a note on the flag that it's alpha / could be removed satisfy your concerns?

@xiang90
Copy link
Contributor

xiang90 commented Dec 20, 2017

So to move this forward, @xiang90 would opening an issue to discuss experimenting with the default grace period, and maybe adding a note on the flag that it's alpha / could be removed satisfy your concerns?

will do.

@xiang90
Copy link
Contributor

xiang90 commented Dec 20, 2017

and maybe adding a note on the flag that it's alpha / could be removed satisfy your concerns?

that would be great.

@aaronlevy
Copy link
Contributor

Thinking about this a bit more, changing a flag from --alpha-foo to --foo is just as much friction to us and users as eventually deprecating foo if we move to something like annotations (or decide that there is one true grace period).

So for now, let's just merge this as is - I'm not concerned about this as a configurable for now.

Feel free to add some info into the flag description if you like, but won't block on it

Copy link
Contributor

@aaronlevy aaronlevy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@xiang90
Copy link
Contributor

xiang90 commented Jan 2, 2018

@aaronlevy Shall we get this merged? I will go ahead and create an issue for evaluating the default timeout and potentially remove the flag once we have high confidence after this PR gets merged.

Thanks.

@diegs diegs merged commit e6fdc45 into kubernetes-retired:master Jan 2, 2018
@diegs diegs deleted the cp-flag branch January 2, 2018 23:02
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants