java.net.SocketTimeoutException for logWatch() #5202

Closed
cmdjulian opened this issue Jun 4, 2023 · 7 comments · Fixed by #5206
cmdjulian commented Jun 4, 2023

Describe the bug

In v6.6.2 the following code used to work fine:

import io.fabric8.kubernetes.client.ConfigBuilder
import io.fabric8.kubernetes.client.KubernetesClientBuilder
import java.util.UUID

private val client = KubernetesClientBuilder()
    .withConfig(ConfigBuilder().withNamespace("default").build())
    .build()

override fun logReader(jobId: UUID): JobLogReader {
    val jobPod = jobPod(jobId)

    // when pod is ready, construct log reader, pod-name=job-name-{some-hash}
    val inputStream = client.pods().withName(jobPod.metadata.name).watchLog().output
    return object : JobLogReader {
        val reader = inputStream.bufferedReader()

        override fun readLine(): String? = reader.readLine()

        override fun close() {
            reader.close()
        }
    }
}

logReader(job.jobId).use { reader ->
    generateSequence(reader::readLine).forEach(::println)
}

Since version 6.7.0 I get an exception. The exception always occurs quickly, even when the underlying pod has not terminated yet (I used a sleep 3600 to simulate a long-running pod). The pod streams logs continuously, and I can see them being consumed.

Fabric8 Kubernetes Client version

6.7.0

Steps to reproduce

  1. Create a logWatch()
  2. Feed its output into a BufferedReader
  3. See the error after a few seconds, regardless of whether the pod is done (a minimal reproducer sketch follows)
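
For reference, a minimal self-contained sketch of these steps, assuming a pod named my-pod in the default namespace that keeps streaming logs for longer than ~10 seconds:

    import io.fabric8.kubernetes.client.KubernetesClientBuilder

    fun main() {
        KubernetesClientBuilder().build().use { client ->
            // step 1: create the log watch; step 2: wrap its output in a BufferedReader
            client.pods().inNamespace("default").withName("my-pod")
                .watchLog().output.bufferedReader().use { reader ->
                    // step 3: on 6.7.0 with okhttp this loop fails after ~10 seconds with
                    // java.io.IOException: channel already closed with exception
                    generateSequence(reader::readLine).forEach(::println)
                }
        }
    }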

Expected behavior

The old behavior: upon pod termination, the stream should be closed gracefully.

Runtime

other (please specify in additional context)

Kubernetes API Server version

1.25.3@latest

Environment

Linux

Fabric8 Kubernetes Client Logs

java.io.IOException: channel already closed with exception
	at io.fabric8.kubernetes.client.http.HttpClientReadableByteChannel.read(HttpClientReadableByteChannel.java:112) ~[kubernetes-client-api-6.7.0.jar:na]
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:65) ~[na:na]
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:107) ~[na:na]
	at java.base/sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:101) ~[na:na]
	at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:270) ~[na:na]
	at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:313) ~[na:na]
	at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:188) ~[na:na]
	at java.base/java.io.InputStreamReader.read(InputStreamReader.java:177) ~[na:na]
	at java.base/java.io.BufferedReader.fill(BufferedReader.java:162) ~[na:na]
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:329) ~[na:na]
	at java.base/java.io.BufferedReader.readLine(BufferedReader.java:396) ~[na:na]
	at de.etalytics.scripts.kubernetes.job.JobServiceOnKubernetesImpl$logReader$1.readLine(JobServiceOnKubernetesImpl.kt:63) ~[main/:na]
	at de.etalytics.scripts.engine.logs.JobLogStreamingService$streamLogs$1$1.invoke(JobLogStreamingService.kt:28) ~[main/:na]
	at de.etalytics.scripts.engine.logs.JobLogStreamingService$streamLogs$1$1.invoke(JobLogStreamingService.kt:28) ~[main/:na]
	at kotlin.sequences.SequencesKt__SequencesKt$generateSequence$1.invoke(Sequences.kt:641) ~[kotlin-stdlib-1.8.21.jar:1.8.21-release-380(1.8.21)]
	at kotlin.sequences.GeneratorSequence$iterator$1.calcNext(Sequences.kt:591) ~[kotlin-stdlib-1.8.21.jar:1.8.21-release-380(1.8.21)]
	at kotlin.sequences.GeneratorSequence$iterator$1.hasNext(Sequences.kt:609) ~[kotlin-stdlib-1.8.21.jar:1.8.21-release-380(1.8.21)]
	at kotlin.sequences.TransformingSequence$iterator$1.hasNext(Sequences.kt:214) ~[kotlin-stdlib-1.8.21.jar:1.8.21-release-380(1.8.21)]
	at de.etalytics.scripts.engine.logs.JobLogStreamingService.streamLogs(JobLogStreamingService.kt:48) ~[main/:na]
	at de.etalytics.scripts.engine.job.JobRunner.streamLogs$lambda$0(JobRunner.kt:43) ~[main/:na]
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
	at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
Caused by: java.net.SocketTimeoutException: timeout
	at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.kt:675) ~[okhttp-4.10.0.jar:na]
	at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.kt:684) ~[okhttp-4.10.0.jar:na]
	at okhttp3.internal.http2.Http2Stream$FramingSource.read(Http2Stream.kt:379) ~[okhttp-4.10.0.jar:na]
	at okhttp3.internal.connection.Exchange$ResponseBodySource.read(Exchange.kt:276) ~[okhttp-4.10.0.jar:na]
	at okio.RealBufferedSource.exhausted(RealBufferedSource.kt:197) ~[okio-jvm-3.0.0.jar:na]
	at io.fabric8.kubernetes.client.okhttp.OkHttpClientImpl$OkHttpAsyncBody.doConsume(OkHttpClientImpl.java:131) ~[kubernetes-httpclient-okhttp-6.7.0.jar:na]
	... 3 common frames omitted

Additional context

I'm using k3d

cmdjulian (Author)

I also tried setting the requestTimeout to 0 in the client builder but this didn't solve my problem either.
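
In code, the attempt looked roughly like this (a sketch assuming the same client setup as above; withRequestTimeout(0) means no request timeout):

    // attempted workaround: request timeout of 0 (= unlimited). This did not
    // help, because okhttp's own read timeout still applies (see below).
    private val client = KubernetesClientBuilder()
        .withConfig(
            ConfigBuilder()
                .withNamespace("default")
                .withRequestTimeout(0)
                .build()
        )
        .build()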

shawkins (Contributor) commented Jun 4, 2023

> I also tried setting the requestTimeout to 0 in the client builder but this didn't solve my problem either.

I think the issue is that all of the other clients default their timeouts to 0/unlimited, but okhttp does not, so it's still enforcing its default of 10 seconds. We'll need to change the OkHttpClientFactory to account for this.

The workaround in the meantime is to do it on your own:

    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;
    import io.fabric8.kubernetes.client.okhttp.OkHttpClientFactory;
    import okhttp3.OkHttpClient.Builder;
    import java.time.Duration;

    KubernetesClient otherClient = new KubernetesClientBuilder().withHttpClientFactory(new OkHttpClientFactory() {
      @Override
      protected void additionalConfig(Builder builder) {
        builder.readTimeout(Duration.ZERO); // disable okhttp's default 10s read timeout
      }
    }).build();

For okhttp this affects HTTP watches as well. #4949 also removed the code where the caller was setting the readTimeout to 0 for websocket requests, but that does not actually seem to matter to okhttp, as the read/write timeouts seem to affect only regular HTTP requests.

cmdjulian (Author)

Thanks for the fast reply. As the OkHttp dependency is not included with implementation scope, I first had to add it with implementation("io.fabric8:kubernetes-httpclient-okhttp").

But this seems to have worked! :)
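
For anyone adding the dependency, a minimal Gradle (Kotlin DSL) sketch; the 6.7.0 version is an assumption and should match your kubernetes-client version:

    // build.gradle.kts: pull in the okhttp-based HttpClient implementation
    dependencies {
        implementation("io.fabric8:kubernetes-client:6.7.0")
        implementation("io.fabric8:kubernetes-httpclient-okhttp:6.7.0")
    }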

shawkins added a commit to shawkins/kubernetes-client that referenced this issue Jun 5, 2023
shawkins self-assigned this Jun 5, 2023
shawkins linked a pull request (#5206) that will close this issue Jun 5, 2023
manusa added the bug label Jun 5, 2023
manusa added this to the 6.7.1 milestone Jun 5, 2023
shawkins added a commit to shawkins/kubernetes-client that referenced this issue Jun 5, 2023
manusa (Member) commented Jun 5, 2023

The new behavior (#4911) is not working out very well.

I'd really like a way to ensure that long-running operations such as builds, tail logs, and so on have an ensured timeout of 0.

The fixing PR seems to be failing the e2e tests; probably the no-timeout assumption is having unwanted side effects.

shawkins (Contributor) commented Jun 5, 2023

> The new behavior is not working out very well.

That's a bit premature.

> I'd really like a way to ensure that long-running operations such as builds, tail logs, and so on have an ensured timeout of 0.

You can also entertain setting it explicitly to 0 for HTTP watches and log watches, but after this fix those cases are covered as well. As mentioned on the issue, that was unnecessary for the websocket operations.

> The fixing PR seems to be failing the e2e tests; probably the no-timeout assumption is having unwanted side effects.

That was related to the thread pool changes in the tests, not the timeout.

manusa (Member) commented Jun 5, 2023

> That was related to the thread pool changes in the tests, not the timeout.

OK, good to know. I hadn't checked the PR changes and was worried that just changing the timeouts had made everything else go south.

> You can also entertain setting it explicitly to 0 for HTTP watches and log watches,

We probably want to do this too, just to make sure that these requests enforce no-timeout behavior.

> As mentioned on the issue, that was unnecessary for the websocket operations.

Log watch is HTTP. I was actually dealing with this same issue downstream and was about to report it when I saw this one 😅

manusa (Member) commented Jun 5, 2023

We'll need to release a 6.7.1 to fix these problems, though.

manusa pushed a commit to shawkins/kubernetes-client that referenced this issue Jun 6, 2023
manusa pushed a commit to shawkins/kubernetes-client that referenced this issue Jun 6, 2023