
fix(k8s): correctly resolve manifests when build is set #4846

Merged
merged 30 commits into main from fix/build-flag-on-deploy-action
Sep 21, 2023

Conversation

vvagaytsev
Collaborator

@vvagaytsev vvagaytsev commented Jul 18, 2023

What this PR does / why we need it:

Which issue(s) this PR fixes:

Possibly fixes #4505
Fixes #4506
Fixes #4931

Special notes for your reviewer:
There are many commits, but they are incremental. I recommend reviewing them commit-by-commit.

Most of the code changes are refactoring.

@vvagaytsev vvagaytsev force-pushed the fix/build-flag-on-deploy-action branch from cbe6d34 to 0ff268c Compare July 18, 2023 12:14
Comment on lines 233 to 242
if (state !== "outdated") {
state = resolveResourceStatuses(log, statuses)
}
Collaborator Author

This quickfix avoids resetting the previously calculated "outdated" status to "ready".

That can happen because some k8s resource kinds do not have handlers and return the "ready" status by default. Before fa1bb4d we generated manifests from the action config and called compareDeployedResources.

Now we rely on the status handlers defined in objHandlers. The lack of a status handler for the Secret kind of k8s resource caused the regression described in #4811.

The proper fix would probably be to implement a status handler for the Secret k8s resource; see 0ff268c.

That requires extending the StatusHandlerParams interface and doing some extra calculations in the Secret handler, because we need to compare the Secret manifest generated from the local config against the deployed resource.

So, it looks like we have to preserve the old scenario to detect Secret updates (sketched below):

  1. Make the manifest out of the action config
  2. Get deployed resource
  3. Compare the manifest against the deployed resource
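
A minimal sketch of those three steps (the helper names and types here are hypothetical, not Garden's actual API):

```typescript
import { isEqual } from "lodash"

// Simplified shape of the pieces we need for the comparison.
interface KubernetesResource {
  kind: string
  metadata: { name: string; namespace?: string }
  data?: Record<string, string>
}

async function isSecretOutdated(
  renderManifest: () => Promise<KubernetesResource>, // 1. make the manifest out of the action config
  readDeployed: (manifest: KubernetesResource) => Promise<KubernetesResource | null> // 2. get the deployed resource
): Promise<boolean> {
  const manifest = await renderManifest()
  const deployed = await readDeployed(manifest)
  if (!deployed) {
    return true // nothing deployed yet, so the Deploy is outdated
  }
  // 3. compare the manifest against the deployed resource
  return !isEqual(manifest.data, deployed.data)
}
```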

@edvald @garden-io/core wdyt?

Collaborator Author
@vvagaytsev vvagaytsev Jul 24, 2023

@edvald @thsig any ideas on this? Maybe we should retain the quickfix above, but also introduce a status handler for the Secret kind of k8s resource.

Collaborator Author

@edvald can you take a look at this, please?

Collaborator

So, it looks like we have to preserve the old scenario to detect Secret updates:

  1. Make the manifest out of the action config
  2. Get deployed resource
  3. Compare the manifest against the deployed resource

Unless I'm missing some finer points here, it seems to me that a simpler way to preserve the old comparison semantics would be to simply use compareDeployedResources to compare manifestMetadata with the deployed resources.

This way, we wouldn't have to worry about certain resource types (e.g. Secret) returning a misleading ready. Should be robust and simple to implement.

Collaborator

Would also be great to add an integ test case that verifies that updating a secret to change its value does indeed make the Deploy status outdated.

Collaborator Author

I amended an existing one in 27996da

@vvagaytsev vvagaytsev changed the title Fix/build flag on deploy action fix(k8s): correctly resolve manifests when build is set Jul 18, 2023
@vvagaytsev vvagaytsev force-pushed the fix/build-flag-on-deploy-action branch from 0ff268c to d370df6 Compare August 14, 2023 10:41
@edvald edvald force-pushed the fix/build-flag-on-deploy-action branch from 94d961d to d428b40 Compare August 14, 2023 16:33
@vvagaytsev vvagaytsev force-pushed the fix/build-flag-on-deploy-action branch 5 times, most recently from b0c532d to 1feed0f Compare August 16, 2023 15:56
@vvagaytsev vvagaytsev marked this pull request as ready for review August 16, 2023 16:06
@vvagaytsev
Collaborator Author

I need a pre-review for this; I'll add some tests tomorrow.

@vvagaytsev vvagaytsev requested review from edvald and a team August 17, 2023 08:12
@vvagaytsev
Collaborator Author

@edvald @thsig can you take a look, please?

Collaborator
@thsig thsig left a comment

Good stuff!

See my comment about going back to using compareDeployedResources (but passing in the manifests from the metadata ConfigMap for comparing against the deployed resources).

If we think that's a good idea, we could either merge this and implement that approach in a follow-up PR, or do it in this one before merging.

@@ -128,7 +132,143 @@ describe("kubernetes-type handlers", () => {
}
})

describe("getServiceStatus", () => {
describe("getKubernetesDeployStatus", () => {
Collaborator

Let's add a test case for deleteKubernetesDeploy that verifies that the metadata ConfigMap (which we've started deploying as of this PR for status checking purposes) is deleted along with the other resources.

I expect that's already the behaviour, but it's important that the ConfigMap is indeed deleted (otherwise a status check performed after the delete would incorrectly conclude that the other resources are still deployed).
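
Such a test case might look roughly like this (a sketch under assumed names — the handler parameters, `api` wrapper, and ConfigMap lookup are illustrative, not the actual test code):

```typescript
it("should delete the metadata ConfigMap along with the other resources", async () => {
  // Deploy first, then exercise the handler under test.
  await kubernetesDeploy(deployParams) // assumed setup helper
  await deleteKubernetesDeploy(deleteParams)

  // The metadata ConfigMap must be gone; otherwise a status check performed
  // after the delete would wrongly report the other resources as deployed.
  const remaining = await api.core
    .readNamespacedConfigMap(metadataConfigMapName, namespace)
    .catch(() => null) // a 404 rejection means it was deleted
  expect(remaining).to.equal(null)
})
```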

Collaborator Author

Oh, we have no calls to the deleteKubernetesDeploy handler in our tests at all. We should definitely add some tests for that handler to ensure that no dirty state is left in the target runtime.

Collaborator Author

Ah, we do have some tests that run via the DeleteDeployTask.

Collaborator Author

Added a test case in adf847a


@vvagaytsev vvagaytsev force-pushed the fix/build-flag-on-deploy-action branch 4 times, most recently from 64d9789 to c7c1256 Compare August 23, 2023 15:56
// TODO: compare the `resource` against the manifest generated from the action?
return { state: "ready", resource }
},

Collaborator

I'm not sure we should have this in here. This is going to break certain cases for sure, and I'm unclear on why we need a specific handler for the Secret kind.

Collaborator

This is essentially the same as the default handler used in resolveResourceStatus, which also always returns a ready state.

https://github.com/garden-io/garden/pull/4846/files/c7c125601e326ad10649255994b0e2de0977376c#diff-24f998c4e8f6dfb59d6ab3922dcc7a3ce8a9cdab363234e51f794537b36611f9R219

The objHandlers map looks like it was designed for waiting for the rollout status of a deploy that's in progress. For that use case, it makes sense to return ready when it's not clear how to wait for the resource to fully come up.

However, that does mean that ready states can be returned for outdated resources when these handlers are repurposed for status checking.

I didn't closely read the original PR for this, but are we sure we want to move away from compareDeployedResources? In an earlier comment, I suggested simply passing the manifests from the metadata ConfigMap to compareDeployedResources (for comparing against the remotely deployed ones).

Or is that approach not viable for some reason?
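
To illustrate the concern, here's a simplified sketch (not Garden's exact code) of how a missing kind-specific handler falls back to ready:

```typescript
type ResourceState = "ready" | "deploying" | "outdated" | "missing"

interface K8sResource {
  kind: string
}

type StatusHandler = (resource: K8sResource) => Promise<{ state: ResourceState }>

// Rollout-aware handlers exist for workload kinds...
const objHandlers: Record<string, StatusHandler> = {
  Deployment: async () => ({ state: "deploying" }), // e.g. waits on rollout progress
}

async function resolveResourceStatus(resource: K8sResource) {
  const handler = objHandlers[resource.kind]
  // ...but any kind without a handler (Secret, ConfigMap, ...) falls back to
  // "ready". That is fine when waiting for a rollout to finish, but misleading
  // when asking "is the deployed resource up to date?".
  return handler ? handler(resource) : { state: "ready" as const }
}
```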

Collaborator Author

@thsig right, the current Secret handler stub behaves exactly the same as the old code.

The quickfix was made to ensure that we never discard the "outdated" state coming from the version hash comparison.

So, if the version hashes differ, we never reach any objHandlers. It looks like for the kubernetes type we only allow sub-paths for files and kustomize config options. Thus, in theory, we should not get into a case where secrets are stored in external files that are not taken into account in the version hash calculation.

We relaxed the path validation requirements for the helm type in #3445, but not for the kubernetes type.

So, it should be safe to rely on the quickfix, because it ensures that the "outdated" state never triggers any objHandler invocation.

Or am I missing something? Can you see any way to change secrets without the action's version hash changing?
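
In other words, the guard works roughly like this (a sketch of the control flow; the hash variable names are illustrative):

```typescript
// The version-hash comparison runs first and wins:
let state = deployedVersionHash === actionVersionHash ? "ready" : "outdated"

if (state !== "outdated") {
  // Per-kind handlers are only consulted when the version hashes match, so a
  // defaulted "ready" can never overwrite an already-detected "outdated".
  state = resolveResourceStatuses(log, statuses)
}
```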

Collaborator
@thsig thsig Aug 28, 2023

I think the TL;DR is that if we have concerns about the implementation, we should write a few more integ test cases to express & verify our expectations for how this logic should handle the various cases we're discussing here.

CC @shumailxyz @vvagaytsev

Collaborator Author
@vvagaytsev vvagaytsev Sep 7, 2023

Actually, it's not only a Secret resource that can reproduce the original issue. The bug was the ability to overwrite an already calculated outdated state with ready. And the fix above seems to be enough.

We do not need a dedicated objHandler for the Secret resource. I removed that commit from the changeset.

Also, I refactored the code in getKubernetesDeployStatus to have less local mutable state and to be more linear and straightforward.

Now I only need to add a test case that covers the original bug. We have one called "should return outdated status when metadata ConfigMap has different version", but it doesn't hit the code block that examines the remote resources, because in the test we use an empty metadata object in the metadata ConfigMap.

I'm going to try to change the test data to have something in the ConfigMap metadata.

Collaborator Author

Fixed an existing test in 27996da

edvald
edvald previously approved these changes Aug 25, 2023
@thsig thsig added this pull request to the merge queue Aug 28, 2023
@thsig thsig removed this pull request from the merge queue due to a manual request Aug 28, 2023
@stefreak
Member

Who is working on the necessary changes in the Garden Cloud AEC "pause" feature? This definitely impacts Garden Cloud, and that should be considered before merging or releasing this. See also #4516 (comment)

Local mode checks were moved to `getForwardablePorts` in #5022.
The function `getForwardablePorts` does not perform any potentially unsafe operations,
and it is not wrapped in try/catch elsewhere in the code.
Split individual resource status retrieval and overall state calculation.
Minimized mutability of the local vars.
More straightforward and linear processing flow.
@vvagaytsev vvagaytsev force-pushed the fix/build-flag-on-deploy-action branch from 0287c83 to 93aa94a Compare September 20, 2023 11:54
The value has already been calculated as an immutable value.
Got rid of the old-fashioned term "service" and replaced it with "deploy".
Contributor
@shumailxyz shumailxyz left a comment

Awesome, thanks @vvagaytsev. Let's get this merged 💯

@vvagaytsev vvagaytsev added this pull request to the merge queue Sep 21, 2023
Merged via the queue into main with commit 6c737a9 Sep 21, 2023
2 checks passed
@vvagaytsev vvagaytsev deleted the fix/build-flag-on-deploy-action branch September 21, 2023 09:26
vvagaytsev added a commit that referenced this pull request Sep 21, 2023
This is a follow-up quickfix for #4846.
It reads a specific annotation and checks if it has a special value set by Cloud.
This needs to be changed to rely on a more generic and reliable way of comparing k8s resources.
vvagaytsev added a commit that referenced this pull request Sep 25, 2023
This is a follow-up quickfix for #4846.
It reads a specific annotation and checks if it has a special value set by Cloud.
This needs to be changed to rely on a more generic and reliable way of comparing k8s resources.
vvagaytsev added a commit that referenced this pull request Sep 25, 2023
This is a follow-up quickfix for #4846.
It reads a specific annotation and checks if it has a special value set by Cloud.
This needs to be changed to rely on a more generic and reliable way of comparing k8s resources.
github-merge-queue bot pushed a commit that referenced this pull request Sep 25, 2023
* fix(k8s): handle AEC-paused resources properly

This is a follow-up quickfix for #4846.
It reads a specific annotation and checks if it has a special value set by Cloud.
This needs to be changed to rely on a more generic and reliable way of comparing k8s resources.

* refactor(k8s): explicit typing

* refactor(k8s): extract local variable

To avoid repeatedly creating the same immutable string.

* chore: add TODO comment

* refactor(k8s): introduce named constant for `manifest-hash` annotation key

* refactor(k8s): use helper to construct the annotation key

* chore: helper function to check sha256 validity

* refactor: move hash-helper function to a dedicated module

* fix(k8s): smarter manifest hash checking

Treat any non-sha256 value as an "outdated" marker for k8s resources.
This covers the current AEC behavior on the Cloud side.
It also allows users to modify k8s resources manually. (See the sketch after this commit message.)

* test: add integration test

* test: fix integration test to restore the valid cluster state

* fix: re-create the `RegExp` object before every `test` call

Because a `RegExp` created with the `g` (or `y`) flag is stateful: it tracks `lastIndex` across `test` calls.

* test: use separate module config in integ test

To avoid unwanted modifications of the remote resources used by the other tests.

* chore: typo fix

Co-authored-by: Shumail Mohyuddin <shumailmohyuddin@gmail.com>

---------

Co-authored-by: Shumail Mohyuddin <shumailmohyuddin@gmail.com>
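
A sketch of the sha256-validity helper and the RegExp pitfall mentioned in the commit list above (illustrative, not the exact code from these commits):

```typescript
// A sha256 hex digest is exactly 64 hex characters.
function isSha256(value: string): boolean {
  // A RegExp created with the `g` (or `y`) flag keeps `lastIndex` between
  // `test` calls, so a shared instance can return false for valid input on
  // every other call. Re-creating it per call (or dropping `g`) avoids that.
  const sha256Regex = /^[a-f0-9]{64}$/
  return sha256Regex.test(value)
}

// Per the commit above: any annotation value that is not a valid sha256
// digest is treated as an "outdated" marker for the resource.
function isManifestOutdated(annotationValue: string): boolean {
  return !isSha256(annotationValue)
}
```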
github-merge-queue bot pushed a commit that referenced this pull request Mar 1, 2024

**What this PR does / why we need it**:
For checking the rollout status of workloads we were using
`spec.replicas` and not `status.replicas`. This means that during a
rollout we would not catch the case where there are currently more
`replicas` than `updated` replicas. If we compare `spec.replicas` to
`updated`, it is possible that both have the same number, even though the
`updated` replicas have not passed their readiness probes yet, which is the
trigger for the old pods to be terminated. Only once `status.replicas` and
`status.updated` match are we truly done. This led to Garden not waiting
for deployment rollouts in some cases, which could lead to further errors
when trying to run execs or tests in non-running pods, or to pods being
canceled mid-execution due to being terminated as old pods.

This is also how kubectl implements it; see
[here](https://github.com/kubernetes/kubernetes/blob/331ced5606fe32d8beb77c974b28dc12f7795012/staging/src/k8s.io/kubectl/pkg/polymorphichelpers/rollout_status.go#L83).
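
A minimal sketch of the corrected check (field names follow the Kubernetes Deployment API; the helper itself is illustrative, not Garden's exact code):

```typescript
interface DeploymentStatus {
  observedGeneration?: number
  replicas?: number          // total pods, including old ones still terminating
  updatedReplicas?: number   // pods already on the latest revision
  availableReplicas?: number // updated pods that passed their readiness probes
}

function isRolloutComplete(
  generation: number,
  specReplicas: number,
  status: DeploymentStatus
): boolean {
  const updated = status.updatedReplicas ?? 0
  const total = status.replicas ?? 0
  const available = status.availableReplicas ?? 0
  return (
    (status.observedGeneration ?? 0) >= generation && // controller has seen the latest spec
    updated >= specReplicas && // every desired replica is on the new revision
    total <= updated && // no old-revision pods left (status.replicas vs status.updated)
    available >= updated // new pods have passed their readiness probes
  )
}
```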

This is an old bug, but it only surfaced recently, after #4846, where we
removed `compareDeployedResources` from the status checks for kubernetes
type actions.

**Which issue(s) this PR fixes**:

Fixes #

**Special notes for your reviewer**: