feat: turn healthCheckHttpClient timeout from 500ms to 3s #1321

batleforc · 2024-09-16T20:57:00Z

What does this PR do?

It changes the timeout of the health check client from 500ms to 3s.

What issues does this PR fix or reference?

Linked to eclipse-che/che#23067

Is it tested? How?

In progress

PR Checklist

E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
- v8-devworkspace-operator-e2e: DevWorkspace e2e test
- v8-che-happy-path: Happy path for verification integration with Che

Signed-off-by: Max batleforc <maxleriche.60@gmail.com>

openshift-ci · 2024-09-16T20:57:06Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: batleforc
Once this PR has been reviewed and has the lgtm label, please assign aobuchow for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2024-09-16T20:57:11Z

Hi @batleforc. Thanks for your PR.

I'm waiting for a devfile member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

batleforc · 2024-09-16T21:23:39Z

Set up in a working environment, together with the work from the linked PR.

AObuchow

@batleforc Thank you for the PR :)

I assume you're submitting this PR with the hopes of getting it into upstream DWO (and Che) rather than just making changes for your own testing, correct?

Rather than change the default timeout as part of your PR, my gut instinct is we should instead expose a configuration option in the DevWorkspaceOperatorConfig that would allow users to customize the healthCheckHttpClient timeout. Then you could configure the timeout to your desired value from there.

If you're okay with reworking your PR to do this, let me know and I can help guide you further.

AObuchow · 2024-09-24T20:55:58Z

controllers/workspace/http.go

@@ -70,7 +70,7 @@ func setupHttpClients(k8s client.Client, logger logr.Logger) {
 	}
 	healthCheckHttpClient = &http.Client{
 		Transport: healthCheckTransport,
-		Timeout:   500 * time.Millisecond,
+		Timeout:   3 * 500 * time.Millisecond,


This would actually be 1500 ms (1.5s) instead of 3s.

Without a doubt I checked out a stash one entry too high; the build I deployed for testing was set to a hard-coded 3s.

AObuchow · 2024-09-24T20:58:33Z

/ok-to-test

batleforc · 2024-09-24T21:18:08Z

Hi @AObuchow,
This PR is part of an issue on the Che side where I have a problem with a slow CNI.
Your gut instinct is the same as mine: the end goal would be to have it merged with the option to set this value, but I was conflicted about how to keep it consistent between the Che Operator and the DevWorkspace Operator.
I'm totally okay with reworking this PR, and if you have time I'm waiting for your guidance.

dkwon17 · 2024-09-30T19:54:40Z

Instead of increasing the timeout, what about returning a RetryError when the health check fails here, and handling it with checkDWError? That way, another reconcile request would be created if the attempt to ping the health endpoint fails.

@batleforc does that work for your use case?

We do something similar when waiting for the workspace deployment to be ready:

devworkspace-operator/pkg/provision/workspace/deployment.go

Line 91 in 0055cb6

return &dwerrors.RetryError{Message: "Deployment is not ready"}

devworkspace-operator/controllers/workspace/devworkspace_controller.go

Lines 485 to 489 in 0055cb6

    
    if shouldReturn, reconcileResult, reconcileErr := r.checkDWError(workspace, err, "Error creating DevWorkspace deployment", metrics.DetermineProvisioningFailureReason(err.Error()), reqLogger, &reconcileStatus); shouldReturn {
    	reqLogger.Info("Waiting on deployment to be ready")
    	reconcileStatus.setConditionFalse(conditions.DeploymentReady, "Waiting for workspace deployment")
    	return reconcileResult, reconcileErr
    }

batleforc · 2024-09-30T20:27:23Z

I think that could fix the problem. It totally answers my case @dkwon17, and could remove the need to change the Che operator source code.
What I really need is to not wait five more minutes when the IDE is already up but the CNI took too long to broadcast the IP of the Pod (it annoys me and kind of irritates the user).

batleforc · 2024-09-30T20:28:12Z

And your approach wouldn't block the operator on this action, and would potentially free it up for other actions.

feat: turn healthCheckHttpClient timeout from 500ms to 3s

8e30852

Signed-off-by: Max batleforc <maxleriche.60@gmail.com>

batleforc requested review from AObuchow, dkwon17 and ibuziuk as code owners September 16, 2024 20:57

openshift-ci bot added the needs-ok-to-test label Sep 16, 2024

AObuchow reviewed Sep 24, 2024

openshift-ci bot added ok-to-test and removed needs-ok-to-test labels Sep 24, 2024
