Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: turn healthCheckHttpClient timeout from 500ms to 3s #1321

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

batleforc
Copy link

@batleforc batleforc commented Sep 16, 2024

What does this PR do?

It change the timeout of the healthcheck client from 500ms to 3s

What issues does this PR fix or reference?

Linked to eclipse-che/che#23067

Is it tested? How?

In progress

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

Signed-off-by: Max batleforc <maxleriche.60@gmail.com>
Copy link

openshift-ci bot commented Sep 16, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: batleforc
Once this PR has been reviewed and has the lgtm label, please assign aobuchow for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

openshift-ci bot commented Sep 16, 2024

Hi @batleforc. Thanks for your PR.

I'm waiting for a devfile member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@batleforc
Copy link
Author

Setup in a working environment, with the linked PR work

Copy link
Collaborator

@AObuchow AObuchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@batleforc Thank you for the PR :)

I assume you're submitting this PR with the hopes of getting it into upstream DWO (and Che) rather than just making changes for your own testing, correct?

Rather than change the default timeout as part of your PR, my gut instinct is we should instead expose a configuration option in the DevWorkspaceOperatorConfig that would allow users to customize the healthCheckHttpClient timeout. Then you could configure the timeout to your desired value from there.

If you're okay with reworking your PR to do this, let me know and I can help guide you further.

@@ -70,7 +70,7 @@ func setupHttpClients(k8s client.Client, logger logr.Logger) {
}
healthCheckHttpClient = &http.Client{
Transport: healthCheckTransport,
Timeout: 500 * time.Millisecond,
Timeout: 3 * 500 * time.Millisecond,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would actually be 1500 ms (1.5s) instead of 3s.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without a doubt a checkout of a stash to high, the test that I deployed was set to a hard coded 3s

@AObuchow
Copy link
Collaborator

/ok-to-test

@batleforc
Copy link
Author

HI @AObuchow,
This PR is part of an issue in the Che Side where I have a problem with a slow CNI.
Your gut instinct are the same as mine, the end goal would be to have it merged with a possibility to set this value but i was conflicted on how to have it match between the Che Operator and the DevWorkspace Operator.
I'm totally okay on reworking this PR and if you have time I'm waiting for your guidance.

@dkwon17
Copy link
Collaborator

dkwon17 commented Sep 30, 2024

Instead of increasing the timeout, what about returning a RetryError when the health check fails here , and handle it with checkDWError? This is so that another reconcile request would be created if the attempt to ping the health endpoint fails.

@batleforc does that work for your use case?

We do something similar when waiting for the workspace deployment to be ready:

return &dwerrors.RetryError{Message: "Deployment is not ready"}

if shouldReturn, reconcileResult, reconcileErr := r.checkDWError(workspace, err, "Error creating DevWorkspace deployment", metrics.DetermineProvisioningFailureReason(err.Error()), reqLogger, &reconcileStatus); shouldReturn {
reqLogger.Info("Waiting on deployment to be ready")
reconcileStatus.setConditionFalse(conditions.DeploymentReady, "Waiting for workspace deployment")
return reconcileResult, reconcileErr
}

@batleforc
Copy link
Author

I think that it could fix the problem, totally answer my case @dkwon17, and could remove the need of changing the Che operator source code.
What I really need is to not wait for Five more minutes when the IDE is already up but the CNI took too long to broadcast the IP of the Pod (it annoys me and kind of irritate the user).

@batleforc
Copy link
Author

batleforc commented Sep 30, 2024

And your answer wouldn't lock the operator on this action and potentially unlock the process for future action

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants