checkpointer: make grace period a flag. #813
Conversation
coreosbot run e2e checkpointer
LGTM
What are the other things? I don't feel a flag should be added solely for testing; it is better to create a unit test if we want to remove the dependency on timing.
@xiang90 it's a tunable parameter. I don't know what the optimal value is. Adding a flag helps for experimentation.
pkg/checkpoint/checkpoint.go
```diff
@@ -45,6 +44,9 @@ type Options struct {
 	RemoteRuntimeEndpoint string
 	// RuntimeRequestTimeout is the timeout that is used for requests to the RemoteRuntimeEndpoint.
 	RuntimeRequestTimeout time.Duration
+	// CheckpointGracePeriod is the timeout ithat is used for cleaning up checkpoints when the parent
```
ithat -> that
This allows users to configure the checkpoint removal grace period parameter as a flag. This may be useful for tests, among other things.
To be honest, I would rather have fewer tunable parameters and good default values. Also, it is not clear to me from reading the code why we have this gracePeriod in the first place. Maybe it is for dealing with a delete-then-create case, so that the deleted pod will still run for a while until the created one comes up?
If I understand correctly, setting the grace period depends on how fast the kubelet will start the replacement pod after the API server changes the state of the pod. What would be the side effect of setting it long enough for 99% of real-world cases, say 5 minutes instead of 1 minute? And what is the plan for the experiments you mentioned? What I want to avoid is adding a flag that is either misused or never used.
In the future I might expect there to be more of these types of configurables (like: don't start a checkpoint if it's older than X, to protect against stale nodes rejoining with really old workloads), or even controlling these types of things per-pod via annotations. So to move this forward, @xiang90, would opening an issue to discuss experimenting with the default grace period, and maybe adding a note on the flag that it's alpha / could be removed, satisfy your concerns?
will do.
that would be great.
Thinking about this a bit more, changing a flag from […] So for now, let's just merge this as is; I'm not concerned about this as a configurable for now. Feel free to add some info to the flag description if you like, but I won't block on it.
lgtm
@aaronlevy Shall we get this merged? After this PR gets merged, I will go ahead and create an issue for evaluating the default timeout, and potentially remove the flag once we have high confidence. Thanks.