Skip to content

Commit

Permalink
Proposal for handling late-initialization in ACK.
Browse files Browse the repository at this point in the history
  • Loading branch information
vijtrip2 committed Jun 28, 2021
1 parent 7bc7bb2 commit 892971e
Show file tree
Hide file tree
Showing 3 changed files with 278 additions and 0 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
<mxfile host="Electron" modified="2021-06-23T20:57:36.076Z" agent="5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/12.9.9 Chrome/80.0.3987.163 Electron/8.2.1 Safari/537.36" etag="88_X_xZvr2moexoWVV37" version="12.9.9" type="device"><diagram name="Page-1" id="e7e014a7-5840-1c2e-5031-d8a46d1fe8dd">7Vxbc6M2FP41nmkf4kEIcXm0nezutNvpTtLObh8JKLa6GCjIcdxfXwkkAxKOWcfYNHYmk6CDEOIcfecqGMHZ8uVj5qeL35IQRyPTCF9G8HZkmo5rs7+csCkJyDVKwjwjYUkCFeGB/IsFUXZbkRDnjY40SSJK0iYxSOIYB7RB87MsWTe7PSVR866pP8ca4SHwI536lYR0IajA9qoTnzCZL8StXVM88KMffJ9nySoW9xuZ8Kn4KU8vfTmWeNB84YfJukaCdyM4y5KElkfLlxmOOGsl28rrPuw4u513hmPa5YJfbsB6Yjt/W+7kc7iZm5NPd96NJYZ59qMVls9RzJZuJIfYxFN+uFpGn8kTjkjMWlN2Y0oCkvrF3dm5u5gSuuFncEaWmOKM0SNxwZeKNl0vCMUPqR/wQddsUTHagi4j1gLskAma+uySbNuOIj/NyWMxH4NRMhysspw843ucl+uJU58xn5AfTSIyjxmNJnzgnN2HxPM/eOMW2owS+Y84mm6FN0uiJCseVIoPTpMV5ZOebZccH1/nt+QduzF+qZEE/z/ihD1ytmFd5FlHrAWBFSia62rhWYK0qC05JPv5Yq3PtyNX8mYHQuQ/In5TE/+HaPXCKLNbbR2wh2QTmvqCvwHjQyHPJyawB9ENtAhiScKQn5Sy+Iyf+IPBinIvnrUg0Sz5jqVU4qRYbFlCfepXKyBNSEwLZqAp+2XsmRljNEJsmjPWBlWb/fLuGWXCZEOzhcVvhP2crnFOjyRW1BSrpUnVbZGq1CPHFyq8Yvp0mIaK8KHdDdNylRxf/JYmfkyD8ArnzhKFTYl6Z4azfYXz6eCMLNSEM+oIZ9ST+FEH8ddE04TSrSJrBpxFMk9iP6pLW2fb6wvxBwxjg5eOzkvQwksA+uKl7bwbXppmN2Z6vfHS3c9LpgOecSi4ieNwwuM1bmsiP89J0FQq+IXQb/x4bIrWXxzWY9OzRfv2ReC8aGxqjZoEdiiCV6WQJ6sswB0WDg4bwaQurJo0UJuaELQMRz5l2rGhpNpEJO7wha/Eai0oOmrbliOUzyMuqoeEyjim0xzHUcahfjbHVBuHidHf1LoJnHSeLkKvzspGrbOqFmt5/2rpbvl/+Gp29UDo/6oZoHduNWvpanYy+5UHlcxvyJivwPhx9Ua7OiRmU7xn9kalBnsNKPs0/U6u7NXDMl1VKqbXOsJBKWzgNkMKAJSMTleNbSmxCbCVgY6ksrdOrjLhXpUw0pORYPzr6pE5+PxaP02jzS7FUVterSB/IlGkkDRds1O3tIVDRWRSODdHijy2iWIJdD05CIw2qPeVHEQdbOLBUB8IMKGSkQUqN7sCEyFvjF4fagc0j4YePe1njgsOsXA9IExi7xY6ljE06Og5OI3715zNscTvKEbx/DkbpIl/8vVhF/6uPq8mUKDgWZPnaX3eDmmjgw3hXldW6pKBWEykWky1ONnVYjqqi6kO1Le91BNYkNvLWYZ9yoylUUCWsUtEJe/WetqK9dyayrr5bFOg/ZlPr0fA7Q0yBwY4qMSO5qGxo4bcjrHjsQAnb1eTqlUB7l7g7PcVTVf0gtBmnRttdtseIDuipTsRN+Rg/7Pi+5Wm1eanm6Bk94Sv9PnjT0V2l/sP7G/9GIGfq8vZ0Zz/R1z+f6ZhqXDvt6rWkBN4zGRXn11uPKQ4aD37WJxlPs0q56xkfGD+bkhoErd2D3j3SRxzL4jwPmUHdln5xGWv97sI1YBpqwrOtwiPkGyQFaQbYwwbJSTHsY5QQupsP+xB2Q9keWNHKakYB5oQ6Bhjc89YfVsRPc1hX26SsG0L4WlTHXaHVEd/wdHQ8vyeIh50KNCgmk7UhuobZ3oOw7nQdKLZZh1PC7IOm4DeWyp+m+N5eypeG6pv7LTFr6fxqN3Co85JPG94uHl5SVArwBsh93aWJMbcZaYLnHHHm/BGnPC+Cz+eF6S4PD+qJ0MuzGFGrrLJoi1HclKVINFyLTGcZJf3wCoMjh6zfyS0iKHTZBcMr4WGrViHUVhwjhD07uTCft95WPZfeY/CO9D6A7V2dFrT7+jhqcFNconOdJUv3q+RtJucd85uIvUQRmO+sIklx+ULoOB1dkeFGtyrLUvLdQO66k9VVRbm7kuSkyJFCW+zsm/b5tN+y7FKzgG2hENOm950+5KrHg15KsaMGK8bDquRF7njC8EePD/4+iyaD9NkWYemVYGrRqzaUH2bLb0YDowLSqsCVZLw3PDpswQ+EPio2VLLhIfhB9oaftShesaPNH91/IBLSZeq4DHPDR63w1ct9hYTYzaTb4JLRaOsJQLDkoSqmFi0NvXW8cqJssSxN56T7zANBN3QRmqt0LIOBDhCllrD1MbqG+F6YAfMxvaFS9kvhlxL1bZWC+ItJHudBvN91iEHAil1B5dlH4gnx9IMpjpU33DSY3MAKzhdym4wpLpAbXFb24rqD0cXUGp0rOaLz9ahyUYNkCfONrp6kA2soeype6+YVbcHoLPvl3Y7fIuh6+Y5uR+u9HY9G41Ot3NuYB4skq/USUFbyhDd41N7bKtjMeZ6p1UWekrhWjM84GtvCvpbsuEnLSJ6eurh+g2/N3/D7/xi1XMY18+4veUzbueXqKlJ9PpZlIMDJ3No0tXzQ9d3ft/wzm+PAmXN6rPLpXdVfdoa3v0H</diagram></mxfile>
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
277 changes: 277 additions & 0 deletions docs/design/proposals/late-initialization/late-initialization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,277 @@
# ACK Handling Server Side Defaults

## Problem

During reconciliation, ACK service controller consumes the custom resource spec (i.e. desired state) from “etcd” and
then creates/updates the AWS resource to match that desired state. ACK service controller uses the custom resource spec
fields to construct Create/Update AWS API request.

However, some AWS API’s behavior does not completely rely on the fields present in Create/Update AWS API request alone.
In the backend, these AWS APIs either add new fields with default values or update the value of field present in request
with newly calculated value which represents the latest state of the resource in AWS. AWS APIs return these default or
updated values with Create/Update API output and Read/List/Describe API output after resource creation.

There are two main problems arise due to this behavior of AWS APIs,

a) When an AWS API does not strictly use the property value as is in the backend, but updates it, the desired spec of
the K8s resource in the “etcd” does not correctly represent the actual state of resource in AWS. This is a problem because
for K8s user, the spec of custom resource is the source of truth for state of K8s resource.

b) When a default field is populated during AWS resource creation, then during reconciliation, there will always be delta
present between desired and latest K8s representation of the custom resource and an Update AWS API call be made during
reconciliation. This update call may or may not have any side-effects on AWS resource and currently service teams either
place custom code to avoid this OR let the calls happen if it does not alter state of the resource.

> Problem (b) can be viewed as a variation of Problem (a), where server changes the property value from “null” to something.
This is worth noting because solving Problem (a) in a generic and consistent manner will provide the solution for
Problem (b) as well.


## Solutions

To compare the solutions for above problem, we will use the following facets:

1. Consistency with K8s conventions
2. Consistency across ACK service controllers
3. GitOps compatibility
4. User experience for custom resource lifecycle

✅ → doable with no/some efforts

❌ → not achievable

### Solution 1

In this solution, the status fields will include all the fields which are CreateOutput shape members. Current status
fields are the CreateOutput shape members which are not present in CreateInput shape (spec fields).

During reconciliation, the Read(One|Many),Update request inputs will be created from desired state but giving precedence
to status fields over spec in case of conflict.


1. *Consistency with K8s conventions*
1. In K8s convention, the spec fields are not duplicated into status fields. This will confuse k8s users when they
describe their custom resources.
2. *Consistency across ACK service controllers*
1. There should be no complexity in following this approach across all ACK service controllers.
3. *GitOps compatibility*
1. There should be no complexity in implementing GitOps for ACK custom resources.
2. Approach mentioned in low level design below should work.
4. *User experience for custom resource lifecycle*
1. Applying updates to custom resource can become confusing for the user.
2. To update custom resource spec, K8s user will have to check resource status because status will have most updated
value of all the fields.

Other Cons :

* The number of status fields will be very large and there will be a lot of duplication in custom resource description


### Solution 2

This solution builds on top of “Solution 1”, with the assumption that “Problem” mentioned above is not very common
amongst AWS APIs.

In this solution, service teams can specify the spec fields which will be duplicated in both spec and status of custom
resource using generator.yaml


1. *Consistency with K8s conventions*
1. In K8s convention, the spec fields are not duplicated into status fields. This will confuse k8s users when they
describe their custom resources.
2. *Consistency across ACK service controllers*
1. Since all AWS APIs do not have this “Problem”, behavior across different service controllers will not seem consistent.
3. *GitOps compatibility*
1. There should be no complexity in implementing GitOps for ACK custom resources.
2. Approach mentioned in low level design below should work.
4. *User experience for custom resource lifecycle*
1. Applying updates to custom resource can become confusing for the user.
2. To update custom resource spec, K8s user will have to check resource status because status will have most updated
value for some of the fields.



### Solution 3 (Preferred Solution)

In this solution, If a default value is added or existing value is updated as part of Create/Update API output, ACK service controller will update the spec/status of the custom resource accordingly. These updates will be persisted in custom resource’s Annotation to refer in future reconciliations.


1. *Consistency with K8s conventions*
1. In K8s convention, the spec fields are not duplicated into status fields.
2. However we will be updating the desired state of custom resource from ACK service controller, which is not really the recommended k8s way.
2. *Consistency across ACK service controllers*
1. All ACK service controllers will be consistent. Spec fields in each custom resource will represent the desired state of the cluster.
3. *GitOps compatibility*
1. There should be no complexity in implementing GitOps for ACK custom resources.
2. Approach mentioned in low level design below should work.
4. *User experience for custom resource lifecycle*
1. Applying updates to custom resource will not be confusing for the user. Users will update the spec and service controller will create Update request based on new spec.


> I personally prefer “Solution 3” provided it gives a consistent GitOps behavior.
> The biggest selling point for “Solution 3” is that it is consistent with the K8s convention where desired state is
> defined by spec of the custom resource and also custom resource spec in “etcd” is consistent with AWS resource.
> To make the Update resource user experience better, In addition to the Status’s condition, we will also add annotation
> capturing the changes which were made to desired spec. [More details in Low Level Design]

## High Level Design

![High Level Design](./images/late_initialization.png)
[source](./images/late_initialization.drawio)


## POC

I did a small POC to determine how GitOps(Fluxcd) behaves when desired spec is updated by the controller.

Setup:

* A basic “Guestbook” [operator](https://github.com/vijtrip2/guestbook/blob/main/controllers/guestbook_controller.go) from
kubebuilder getting started guide.
* “Guestbook” [manifest](https://github.com/vijtrip2/flux-get-started/blob/master/guestbook/guestbook-sample.yaml) being
deployed using fluxcd.
* Controller adds a defaultValue in spec on each run, updates an existing value in spec and updates the annotation on each
reconcile loop.

Observations:

* During the start of each reconciliation, the spec and status are reset to values present in the git repository.
* Annotations are not reset, if a value is updated in annotations, it is present for future reconciliations.
* Changes made after reconciliation are not reflected back into the git repository. “Fluxcd” does not support this.

The Low level design below is determined from the POC done above.

## Low Level Design

To implement the flow described in high level design, we will walk through reconciliation of an example resource through
following steps and layout in detail how these steps will be achieved by ACK controller.
We assume that Fluxcd is used to apply desired state of custom resource.

1. Assume there is a custom resource with following spec and status fields.

“fieldA” and “fieldD” are typical spec and status field respectively. “fieldA” is provided by K8s user and “fieldD”
is updated in status based on latest observation.

“defaultFieldB” if not provided by K8s user, gets updated to “valueB” by reconciler.

overrideFieldC‘s is modified by reconciler whenever provided by the k8s user. Ex: “-xyz” gets appended to it’s value.

```yaml
Spec:
fieldA string
defaultFieldB string
overrideFieldC string
Status:
fieldD string
```
and the manifest file(desired state) looks like,
```yaml
Spec:
fieldA: valueA
overrideFieldC: valueC
```
2. When this resource is reconciled by the ACK service controller, we expect the CreateOutput to have following value
for these fields. fieldA → “valueA“, defaultFieldB → “valueB“, overrideFieldC → “valueC-xyz“, fieldD → ”valueD“
Based on this result, ACK service controller will persist the following object in etcd:
Annotations:
```yaml
ack/spec-overrides:
“{
\“defaultFieldB\”:
{
\“serverOverride\”:\“valueB\”
},
\“overrideFieldC\”:
{
\“userInput\”:\“valueC\”,
\“serverOverride\”:\“valueC-xyz\”
}
}”
Spec:
fieldA: valueA
defaultFieldB: valueB
overrideFieldC: valueC-xyz
Status:
fieldD: valueD
```
> Note: the annotation above has single string value. It is been broken into multiple lines to showcase the structure better.
> Note: Fields whose value are defaulted, do not contain “userInput” and that is how they are distinguished.
> Note: The change will be made in the sdkCreate and sdkUpdate methods, in the code-generator repository. Whenever there is an update required for spec field, based on the Create/Update output shape, we will record it in the Annotations, so that during next reconciliation, this context is preserved.
> Note: sdkCreate will marshall the spec field and use string comparison to determine whether spec needs update or not. This will prevent from runtime panic of incomparable fields while using “==” or DeepEqual() method.
3. During the next reconciliation of manifest mentioned in #1, the desired state will be constructed not just from spec
in etcd but also the Annotations. and the desired object will look like the object persisted in etcd in #2
After reading the desired state from etcd, the reconciler code will go through all the spec fields and see whether
they exist in the spec-override annotation map. If a corresponding entry exists, reconciler will marshall current
desired value of spec field and compare with “userInput” in annotation override,
* If there is no difference between “userInput” and marshalled desiredValue, reconciler will update the desired state
with serverOverride from spec-override annotation.
* If the marshalled string does not match “userInput”, it means there was some change in manifest by the user, and
serverOverride will from spec-override map will not be used for desired state.
In this case, There will be no delta between desired and latest object, And no AWS updates will be triggered.
> Note: The change will be made again in code-generator, inside “ResourceFromRuntimeObject” method of resourceDescriptor
> to take into account the Annotations when constructing the spec.
4. Now let’s assume that User updates the manifest and provides the value for both “defaultFieldB” and “overrideFieldC”.
The manifest will look like
```yaml
Spec:
fieldA: valueA
defaultFieldB: userValueB
overrideFieldC: userValueC
```
Now, during reconciliation, when the desired object is being constructed by parsing Annotations, reconciler will see
that defaultFieldB is not absent and remove the annotation for defaultFieldB. Reconciler will do the same for “overrideFieldC”
since userValueC is not equal to userInput i.e. “valueC”
5. At this time, the desired object looks different from latest object in AWS and an Update will happen. The result of
UpdateOutput will contain following fields, fieldA → “valueA”, defaultFieldB → “userValueB” and overrideFieldC → “userValueC-xyz”
During sdkUpdate, Annotations will be updated again but only for the overrideFieldC since that is only field different
from current state in AWS.
The object persisted in etcd will look like
```yaml
Annotations:
ack/spec-overrides:
“{
\“overrideFieldC\”:
{
\“userInput\”:\“userValueC\”,
\“serverOverride\”:\“userValueC-xyz\”
}
}”
Spec:
fieldA: valueA
defaultFieldB: userValueB
overrideFieldC: userValueC-xyz
Status:
fieldD: valueD
```
And that is how reconciler will handle simple case of default values and server overrides of desired state.
## Discussion Points
* Whenever “ack/spec-overrides” Annotation is present, should we also update the status to have a condition that indicated that ACK controller has overriden some spec fields?

0 comments on commit 892971e

Please sign in to comment.