Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design: Dynamic registration of admission controllers and initializers #611

Merged
Merged
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,13 @@ details.

## Goals

* Admins are able to predict what initializers/webhooks will be applied to newly
* Admin is able to predict what initializers/webhooks will be applied to newly
created objects.

* Admin needs to be able to ensure initializers/webhooks config will be applied within some bound

* As a fallback, admin can always restart an apiserver and guarantee it sees the latest config

* Do not block the entire cluster if the intializers/webhooks are not ready
after registration.

Expand Down Expand Up @@ -112,29 +116,82 @@ const (
// AdmissionHookClientConfig contains the information to make a TLS
// connection with the webhook
type AdmissionHookClientConfig struct {
// Address of the external admission hook, could be a host string,
// a host:port pair, or a URL.
Address string
// Service is a reference to the service for this webhook. It must communicate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say-- if multiple ports are exposed, it will use 443 if available and be an error condition otherwise. If only a single port is exposed it can just use that.

// on port 443
Service ServiceReference
// CABundle is a PEM encoded CA bundle which will be used to validate webhook's server certificate.
CABundle []byte
}

// ServiceReference holds a reference to Service.legacy.k8s.io
type ServiceReference struct {
// Namespace is the namespace of the service
Namespace string
// Name is the name of the service
Name string
}
```

## Synchronization of AdmissionControlConfiguration (**optional for alpha implement**)
## Synchronization of AdmissionControlConfiguration

If the `initializer admission controller` and the `generic webhook admission
controller` watch the `AdmissionControlConfiguration` and act upon deltas, their
cached version of the configuration might be arbitrarily delayed. This makes it
impossible to predict what initializer/hooks will be applied to newly created
objects.

We considered a few ways to make the behavior of the `initializer admission
We propose two ways to make the behavior of the `initializer admission
controller` and the `generic webhook admission controller` predictable.

(I prefer #2. #1 is inefficient, #3 requires complex schema and is not intuitive)
#### 1. Do consistent read of AdmissionControlConfiguration periodically
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtek-t how do you like this approach suggested by @smarterclayton ? It should be much lighter.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's start out with this option. Seems way easier. It is not super important that these things get applied instantly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@caesarxuchao - I'm completely fine with this option.
I would even go step further, and do:

  • let's do consistent read every 1s
  • if there wasn't any successful read over last 5s (or 10s) then block everythin

1qps will not be visible and this will give us some tolerance on unsuccessful reads (which may happen from time to time).

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtek-t it sounds good. I'll incorporate it to the doc.


The `initializer admission controller` and the `generic webhook admission
controller` do a consistent read of the AdmissionControlConfiguration either 30s
after the last read, or when there is a request that needs the two controllers to
apply the configuration, whichever comes later.

If the read fails, the two admission controllers block all incoming request.

#### 2. Always do a consistent read of a smaller object

#### 1. Always do consistent read
A consistent read of the AdmissionControlConfiguration object is expensive, we
cannot do it for every incoming request.

Alternatively, we record the resource version of the AdmissionControlConfiguration
in a configmap. The apiserver that handles an update of the AdmissionControlConfiguration
updates the configmap with the updated resource version. In the HA setup, there
are multiple apiservers that update this configmap, they should only
update if the recorded resource version is lower than the local one.

The `initializer admission controller` and the `generic webhook admission
controller` do a consistent read of the configmap *everytime* before applying
the configuration to an incoming request. If the configmap has changed, then
they do a consistent read of the `AdmissionControlConfiguration`.

## What if an initializer controller/webhook is not ready after registered? (**optional for alpha implement**)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unclear to me-- is the first step going to create things fail-open or fail-closed?

(I think we should do fail-open for 1.7 and add fail-closed behavior later when this feature grows up)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as it is explicitly specified that it is fail open


This will block the entire cluster. We have a few options:

1. only allow initializers/webhooks to be created as "fail open". This could be
enforced via validation. They can upgrade themselves to "fail closed" via the
normal Update operation. A human can also update them to "fail closed" later.

2. less preferred: add readiness check to initializer and webhooks, `initializer
admission controller` and `generic webhook admission controller` only apply
those have passed readiness check. Specifically, we add `readiness` fields to
`AdmissionControllerConfiguration`; then we either create yet another
controller to probe for the readiness and update the
`AdmissionControllerConfiguration`, or ask each initializer/webhook to update
their readiness in the `AdmissionControllerConfigure`. The former is complex.
The latter is essentially the same as the first approach, except that we need
to introduce the additional concept of "readiness".


## Considered bug REJECTED synchronization mechinism:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/bug/but/


#### Rejected 1. Always do consistent read

Rejected because of inefficiency.

The `initializer admission controller` and the `generic webhook admission
controller` always do consistent read of the `AdmissionControlConfiguration`
Expand All @@ -143,16 +200,10 @@ every CREATE request. Because the two admission controllers are in the same
process as the apiserver, the latency mainly consists of the consistent read
latency of the backend storage (etcd), and the proto unmarshalling.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly expensive today.

Thinking about this... watch from within the same etcd server is effectively free. We need to guarantee that the clock time for any given ACC is within some bound (because an admission controller can be arbitrarily delayed between when the data is read from etcd and when it applies the rules, so a hard boundary is probably too hard).

The scenario of greatest concern is when an apiservers watch client is delayed relative to the mutation operations being executed, perhaps due to a network hang. If we had a single etcd watch on a set of key ranges, our minimum bound is:

The most recent admission control configuration delivered on the same watch channel as any returned event on that watch channel must be no older than X seconds after the last update operation on this server was performed, otherwise a consistent read must be performed.

That can degrade to just performing a consistent read from each api server every X seconds and delaying / rejecting requests until the read returns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Said another way, our etcd client simply has to ensure that the open watch channel from the server that also watches ACC has delivered ANY event within the time window we care about.

Copy link
Contributor

@smarterclayton smarterclayton May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if the etcd client and server preserve the guarantee that would be necessary for the optimization there though. The client has to construct a watch, and the etcd server has to guarantee a "delivered before" behavior over the connection (event on ACC key has to be delivered before another key with higher index). gRPC substreams (one for each watch) might not support that if two watches are in different sub streams because the packets for the one stream can be delayed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the api server write to the endpoints object has to track the member observed high etcd index and delay writes if other members of the cluster have higher observed indices (if the watch cache value is higher than the read value of the GET on endpoints)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like the upper bound and the degrading thing. Will make e2e tests slow though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So from first principles:

  1. admin must be able to know when config is applied
  2. admin can always restart an apiserver and guarantee it sees the latest config (fallback)
  3. admin needs to be able to ensure config eventually applies within some bound

Those are good goals for an alpha api. To just do 3, could we:

  1. do a consistent read every 30s
  2. if the consistent read fails, block the admission plugin (fail) until it succeeds

Tests would have to get clever about verifying the config is working. If we're concerned, we can simply add a watch on the config objects and always use the latest, but the consistent read must succeed every 30s. That would allow tests to work most of the time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is 2 an explicit goal? That sounds very easy to achieve.

Sure. I'll add this option to the doc, and let's see if we are on the same page.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just mean as - it's the thing that they have to do today, and it's worked so far. So it's ok to say "you must restart your masters to guarantee the latest config applied", which is probably reasonable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smarterclayton - got it. Thanks a lot for explanation. That makes perfect sense to me.
I added a potential improvement for it in #611 (comment)


#### 2. Optimized version of #1, do consistent read of a smaller object

Instead of having the two controllers do consistent read of the entire
`AdmissionControlConfiguration` object, we let the registry store the
resourceVersion of the `AdmissionControlConfiguration` (perhaps in a configMap),
and let the two controllers always do consistent read of the resourceVersion and
only read the entire `AdmissionControlConfiguration` if the local version is
lower than the stored one.
#### Rejected 2. Don't synchronize, but report what is the cached version

#### 3. Don't synchronize, but report what is the cached version
Rejected because it violates Goal 2 on the time bound.

The main goal is *NOT* to always apply the latest
`AdmissionControlConfiguration`, but to make it predictable what
Expand All @@ -165,50 +216,3 @@ observedGeneration and predict if all the initializer/hooks listed in the
In the HA setup, the `observedGeneration` reported by of every apiserver's
`initializer admission controller` and `generic webhook admission controller`
are different, so the API needs to record multiple `observedGeneration`.

A tentative schema:

```golang
Type AdmissionControlConfiguration struct {
...
// Generation is set by the registry
Geneartion int64
// ObserverdGenerations is set by `initializer admission controller` and
// `generic webhook admission controller` in each apiserver.
ObserverdGenerations []ObservedGenerationByServer
}

type ObservedGenerationByServer struct {
// Address of this server
// This can be a hostname, hostname:port, IP or IP:port
Server string
// The entity that reports the observedGeneration
AdmissionController AdmissionControllerType
ObservedGeneration int64
}

type AdmissionControllerType string

const (
InitializerAdmissionController AdmissionControllerType = "initializer admission controller"
GenericWebhookAdmissionController AdmissionControllerType = "generic webhook admission controller"
)
```

## What if an initializer controller/webhook is not ready after registered? (**optional for alpha implement**)

This will block the entire cluster. We have a few options:

1. only allow initializers/webhooks to be created as "fail open". They can
upgrade themselves to "fail closed" via the normal Update operation. A human
can also update them to "fail closed" later.

2. less preferred: add readiness check to initializer and webhooks, `initializer
admission controller` and `generic webhook admission controller` only apply
those have passed readiness check. Specifically, we add `readiness` fields to
`AdmissionControllerConfiguration`; then we either create yet another
controller to probe for the readiness and update the
`AdmissionControllerConfiguration`, or ask each initializer/webhook to update
their readiness in the `AdmissionControllerConfigure`. The former is complex.
The latter is essentially the same as the first approach, except that we need
to introduce the additional concept of "readiness".