Explicitly Queue a Reconcile Request if a Shared Provider has Expired #241

ulucinar · 2023-07-27T08:59:49Z

Description of your changes

Fixes #192

We currently return an error if an external client requests service from a terraform.SharedProvider that has already expired. As a result, the MR being reconciled by that external client acquires a Synced: False status condition. This is a temporary situation that is resolved once the expired shared provider is drained and the terraform.SharedProviderScheduler replaces it.

However, until the shared provider is replaced, MRs entering the Synced: False state are confusing. Before the handler.EventHandler implementation from #231, we had to return an error to the managed reconciler to have a reconciliation request requeued promptly. We can now explicitly requeue an exponentially backed-off reconciliation request, so that the external client retries to get service from the provider in a timely manner, without waiting for the poll period (10 min by default for the official providers) and without forcing the MR into the Synced: False state.

The external client will retry scheduling a shared runner up to 20 times without erroring, backing off exponentially after each failure. After these initial 20 retries, it will report the error to the managed reconciler and keep trying.

I have:

Read and followed Crossplane's contribution process.
Run make reviewable to ensure this PR is ready for review.
Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Tested via crossplane-contrib/provider-upjet-aws#805.

sergenyalcin

Thanks @ulucinar, LGTM. I left a nit comment for further discussion.

sergenyalcin · 2023-07-27T16:31:01Z

pkg/controller/external.go

+		return managed.ExternalObservation{
+			ResourceExists:   true,
+			ResourceUpToDate: true,


nit: I do not have a strong opinion, but I think the case of a resource that has never been scheduled and that of a resource that has been scheduled and reconciled several times (at least within the application) can be evaluated differently.

So, we may consider returning an error for a resource that has never been scheduled.

ulucinar · 2023-07-28

Thank you @sergenyalcin for considering this. I agree with your point, but I also feel a bit uncomfortable with continuing to produce unsync'ed MRs, even for certain cases, since this change is a UX change.

As we have discussed, I also feel uncomfortable returning a direct success from the external client's Create function to the managed reconciler, but the situation is similar to what we do in the async mode of operation, i.e., we already return an immediate success by just starting a goroutine that may very well fail to initiate a creation request for the external resource.

I propose we continue with suppressing the unsync'ed state of MRs, and if folks complain about prolonged wait times, then we already have a solution for this: if you don't want to wait, please give more memory to the native provider and increase the provider-ttl command-line parameter. Btw, this PR is not expected to prolong any wait times on its own. There is no change in the wait times when a runner has expired; we should just be preventing the MRs from being unsync'ed when the runner has expired and new scheduling requests cannot be admitted until the old one is drained and replaced. (One caveat regarding these statements is the max delay configured for the rate limiter: if it's higher than the managed reconciler's, then wait times may change. The same goes for any other aspect of the rate limiter.)

Maybe we will just end up implementing a scheduler that will not deny requests coming to an expired runner but will fork a new one as needed at the cost of running multiple runners in parallel and increasing the memory footprint. Let's try the approach implemented here first as we already have the provider-ttl parameter.

sergenyalcin

Thanks @ulucinar LGTM!

ulucinar requested a review from sergenyalcin as a code owner July 27, 2023 08:59

ulucinar force-pushed the fix-192 branch from e7343b3 to 4d50ffc Compare July 27, 2023 12:25

sergenyalcin approved these changes Jul 27, 2023

ulucinar force-pushed the fix-192 branch 2 times, most recently from 5451bc0 to 0950921 Compare July 31, 2023 18:01

ulucinar force-pushed the fix-192 branch from c7e0735 to 343bf11 Compare August 1, 2023 13:07

ulucinar requested a review from sergenyalcin August 1, 2023 13:49

Use an EventHandler with the controller.external to retry on scheduling errors

7a9116f

- The external client will requeue at most 20 times before reporting an error

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>

ulucinar force-pushed the fix-192 branch from 343bf11 to 7a9116f Compare August 1, 2023 14:12

sergenyalcin approved these changes Aug 1, 2023

ulucinar merged commit 06bdecc into crossplane:main Aug 1, 2023
4 checks passed

ulucinar deleted the fix-192 branch August 1, 2023 14:23

This was referenced Aug 1, 2023

Explicitly queue a reconcile request if a shared provider has expired crossplane-contrib/provider-upjet-gcp#346

Merged

Explicitly queue a reconcile request if a shared provider has expired crossplane-contrib/provider-upjet-azure#501

Merged

jeanduplessis mentioned this pull request Aug 4, 2023

Transient error "native provider reuse budget has been exceeded" crossplane-contrib/provider-upjet-gcp#348

Closed

thekaleidoscope mentioned this pull request Aug 7, 2023

0.32.1 Cannot schedule native Terraform provider process: native provider reuse budget has been exceeded crossplane-contrib/provider-upjet-aws#669

Closed
