-
Notifications
You must be signed in to change notification settings - Fork 508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unstable OpenShift builds after bumping Kubernetes Client to 6.4 #2024
Comments
/cc @shawkins for awareness |
Is this the debug issue? |
I'm out of context now, can't recall what the debug issue is. If you mean the one about the logs, no it's not the same. In this one we're getting some sort of timeouts when sending the binary file to OpenShift. I'm preparing a demo for tomorrow that requires other things from our SNAPSHOT and couldn't further investigate the issue 😓 I'll probably spend some time on this on Monday. |
The TimeoutException stacktrace:
|
It seems that the OperationSupport#waitForResult call is using the global config instead of the one provided by the context at BuildConfigOperationsImpl#submitToApiServer. Probably introduced by fabric8io/kubernetes-client#4678 |
@manusa there are too many paths for timeout handling... there was a change to the OperationSupport to enforce the readTimeout client side as okhttp in particular was prone to enqueing a request and not enforcing the timeout while enqueued. Rather than updating the httpclient to expose it's config, given the other changes we've made, it would make more sense to move the enforcement of this out of OperationSupport and into the Standard httpclient logic handling of sendAsync. |
I'm working on the fix, it really doesn't make sense that the |
It's no different than the other paths that call sendAsync - it's a blocking operation that is expected to complete within some timeout. The issue is that waitForResult is referencing the default timeout value - and that's a problem we could have with any of the other code paths (none of which are currently affected) if we don't move (or remove) the OperationSupport enforcement. You could make the case for removal if we believe that the changes to the okhttp threading defaults are sufficient for addressing this and that it's unlikely to be a problem with the other httpclients. |
I'm not sure there is a specific timeout value in this case. What should we default this value to? This is an OpenShift S2i build, I'm not aware of any restrictions in terms of timeout for these scenarios. And I would rather not put myself in the position of imposing one arbitrary value. IMO the timeout should happen server-side. |
It defaults to 0, but is settable by the user here https://github.com/fabric8io/kubernetes-client/blob/1d3e263cc6216f4a3adad6b677539186179f41c2/openshift-client/src/main/java/io/fabric8/openshift/client/dsl/internal/build/BuildConfigOperationsImpl.java#L244 It's then set on the httpclient https://github.com/fabric8io/kubernetes-client/blob/1d3e263cc6216f4a3adad6b677539186179f41c2/openshift-client/src/main/java/io/fabric8/openshift/client/dsl/internal/build/BuildConfigOperationsImpl.java#L260 and in the following line. Setting the readTimeout overlaps with the behavior of RequestConfig.getRequestTimeout.
Just to make sure we're on the same page, all the places that we're manipulating the httpclient reade/write timeouts the expectation it that it will be enforced by the httpclient. In particular the write timeout is not enforced by JDK, it is enforced as the sum of read and write timeouts by jetty (which may confuse the meaning of either being a 0/indefinite value), and as an idle write timeout on vertx (that seems incorrect at first glance). Given the state of things, I'd definitely be in favor of getting rid of this write timeout from the HttpClient. I'm not sure about how consistently the readtimeout is treated - we know it won't be enforced by okhttp while the request is enqueued. |
OK, thought that these values were not applicable from this scenario. Then
This statement refers exclusively to the S2I binary build scenario_, more specifically to the case where the user doesn't provide a timeout._ Meaning that our client-side read/write timeouts should default to 0 -run indefinitely-.
I'm not even sure what I think that to make things consistent
+1 |
Prior to optional the enforcement of the timeout waitForResult was simply the central method to make calls blocking so that the same exception handling did not end up in every blocking location.
Prior to adding the waitForResult check, which more closely matches the full request cycle, it was treated as the default read timeout. |
Describe the bug
After the recent upgrade to Kubernetes Client 6.4.0 (#1988) we're getting random errors when using the OpenShift Build goal/task:
Eclipse JKube version
1.11-SNAPSHOT
Component
OpenShift Maven Plugin
Steps to reproduce
Perform OpenShift Builds with SNAPSHOT version, especially for big images.
Expected behavior
Builds should complete successfully
Tasks
The text was updated successfully, but these errors were encountered: