Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache the error from the last asynchronous reconciliation #391

Merged
merged 1 commit into from
Apr 25, 2024

Conversation

ulucinar
Copy link
Collaborator

@ulucinar ulucinar commented Apr 24, 2024

Description of your changes

Related issue: crossplane-contrib/provider-upjet-aws#1164, crossplane-contrib/provider-upjet-azure#684

This PR proposes a set of changes to:

  • Cache the error from the last asynchronous reconciliation to return it in the next asynchronous reconciliation for the Terraform plugin SDK & framework based external clients.
  • Set the "Synced" status condition to "False" in the async CallbackFn to immediately update it when the async operation fails.
  • Set the "Synced" status condition to "True" when the async operation succeeds, or when the external client's Observe call reveals an up-to-date external resource which is not scheduled for deletion.

These changes result in the Synced status condition to being set to False when the asynchronously executing reconciliation operation encounters an error. Previously, we used to set a LastAsyncOperation to signal an error that the last asynchronous operation has encountered but we did not set the Synced status condition to False, which is a violation of the XRM contract.

The same issue exists also for the Terraform CLI-based asynchronous external client but we will address that client in a follow-up PR.

I have:

  • Read and followed Upjet's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Tested by updating a Policy.iam resource with an invalid policy document as described in crossplane-contrib/provider-upjet-aws#1164. Without the proposed changes here, here's a log of the registered Synced status condition values after each reconciliation when a bad policy document is specified. Initially, the policy document is a valid one:

True
True
True
True
True
True
True
True
True
True
True
True
True

Here, the first reconciliation is done on a valid document, and the remaining ones are done on an invalid document (the ones on invalid document are exponentially backed-off).

And with the proposed changes, here are the logs for the same scenario:

True
True
False
False
False
False
False
False
False
False
False
False
False
False

So, after the asynchronous reconciliation encounters an error, the Synced condition is set to False and it's stable, i.e., it does not oscillate between the True & False states as long as the desired policy document stays invalid (with a bad syntax). When it's set to a valid document, the Synced condition is stably set to True (stays at the state True).

And here's the full status.conditions in error state:

status:
...
  conditions:
  - lastTransitionTime: "2024-04-24T20:39:29Z"
    reason: Available
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-04-24T21:19:43Z"
    message: "update failed: async update failed: failed to update the resource: [{0
      updating IAM Policy (arn:aws:iam::...:policy/test-policy): MalformedPolicyDocument:
      Syntax errors in policy.\n\tstatus code: 400, request id: 761851df-687a-4c66-984f-d0cd7b212b96
      \ []}]"
    reason: ReconcileError
    status: "False"
    type: Synced
  - lastTransitionTime: "2024-04-24T21:19:43Z"
    message: "async update failed: failed to update the resource: [{0 updating IAM
      Policy (arn:aws:iam::...:policy/test-policy): MalformedPolicyDocument:
      Syntax errors in policy.\n\tstatus code: 400, request id: 761851df-687a-4c66-984f-d0cd7b212b96
      \ []}]"
    reason: AsyncUpdateFailure
    status: "False"
    type: LastAsyncOperation

A test similar to the one described above was conducted for SecurityGroupIngressRule resource, using the example manifest.

@ulucinar ulucinar marked this pull request as draft April 24, 2024 20:33
…t in

the next asynchronous reconciliation for the Terraform plugin SDK &
framework based external clients.

- Set the "Synced" status condition to "False" in the async CallbackFn
  to immediately update it when the async operation fails.
- Set the "Synced" status condition to "True" when the async operation
  succeeds, or when the external client's Observe call reveals an
  up-to-date external resource which is not scheduled for deletion.

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
@ulucinar ulucinar marked this pull request as ready for review April 24, 2024 21:26
Comment on lines +32 to +34
errXPReconcileCreate = "create failed"
errXPReconcileUpdate = "update failed"
errXPReconcileDelete = "delete failed"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've copied the error messages from crossplane-runtime to prevent flip-flops in the error messages (when the managed reconciler sets the error and when the upjet runtime sets it).

Copy link
Member

@sergenyalcin sergenyalcin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ulucinar LGTM!

@ulucinar ulucinar merged commit 577bfa7 into crossplane:main Apr 25, 2024
6 checks passed
@ulucinar ulucinar deleted the fix-sync-state branch April 25, 2024 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants