Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HttpClient leak messages #3533

Closed
travispeloton opened this issue Dec 6, 2024 · 10 comments
Closed

HttpClient leak messages #3533

travispeloton opened this issue Dec 6, 2024 · 10 comments
Assignees
Labels
for/stackoverflow Questions are best asked on SO or Gitter

Comments

@travispeloton
Copy link

travispeloton commented Dec 6, 2024

A couple of our projects frequently log LEAK messages, and they have references to the HttpClient in most cases. A sample of the stacktraces are listed below, but we are unsure how to proceed in the investigation.

Your Environment

  • Kotlin 2.x
  • Spring WebFlux 6.2.0
  • reactor-netty-core 1.2.0
  • reactor-core 3.7.0
  • netty 4.1.115.Final
  • JVM version (java -version): 21
  • OS and version (eg. uname -a): eclipse-temurin:21-jdk
@travispeloton travispeloton added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels Dec 6, 2024
@violetagg
Copy link
Member

@travispeloton Please provide at least the code which shows how you use HttpClient. Please enable and provided the extended logging as described here https://projectreactor.io/docs/netty/release/reference/appendices.html#faq.memory-leaks

@violetagg violetagg self-assigned this Dec 9, 2024
@violetagg violetagg added for/user-attention This issue needs user attention (feedback, rework, etc...) and removed status/need-triage A new issue that still need to be evaluated as a whole labels Dec 9, 2024
@travispeloton
Copy link
Author

@violetagg Extended logging has already been enabled. You may need to scroll down in the attached document to see stacktraces with that extended logging.

@violetagg
Copy link
Member

Ooo right, sorry! ... and the code that shows how you use it? Because the logs show that the data was sent to the consumer?

#2:
	Hint: [ef19db85-10, L:/10.11.29.192:57854 - R:publish-api-internal.comms.prod.k8s.pelotime.com/10.11.6.208:443] Receiver reactor.core.publisher.FluxMap$MapSubscriber will handle the message from this point
	io.netty.handler.codec.http.DefaultHttpContent.touch(DefaultHttpContent.java:86)
	io.netty.handler.codec.http.DefaultLastHttpContent.touch(DefaultLastHttpContent.java:123)
	io.netty.handler.codec.http.DefaultLastHttpContent.touch(DefaultLastHttpContent.java:30)
	reactor.netty.channel.FluxReceive.onInboundNext(FluxReceive.java:375)

@travispeloton
Copy link
Author

travispeloton commented Dec 9, 2024

We use the following HttpClient configuration,

        // https://projectreactor.io/docs/netty/snapshot/reference/index.html
        var httpClient = HttpClient.create(
            ConnectionProvider.builder(name)
                .maxConnections(connectionMaxCount)
                .maxIdleTime(Duration.ofMillis(connectionRandomMaxIdleTimeMillis))
                .maxLifeTime(Duration.ofMillis(connectionRandomMaxLifeTimeMillis))
                .pendingAcquireMaxCount(connectionMaxAwaitRequests)
                .pendingAcquireTimeout(Duration.ofMillis(connectionAcquireTimeoutMillis))
                .evictInBackground(Duration.ofMillis(connectionEvictionTimeoutMillis))
                // use lifo leasing strategy, because we are using keep-alive timeouts below
                .lifo()
                .build()
        )
            .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, connectTimeoutMillis)
            .option(ChannelOption.SO_KEEPALIVE, true)
            .option(ChannelOption.TCP_NODELAY, true)
            .responseTimeout(Duration.ofMillis(responseTimeoutMillis))
            .metrics(metricsEnabled, Function { it })
            .resolver {
                // require the use of fully-qualified-domain-names to minimize hostname resolution latency
                it.searchDomains(emptyList())
                    .build()
            }

            httpClient = httpClient
                // first keep-alive probe after X seconds of being idle
                .option(EpollChannelOption.TCP_KEEPIDLE, 60)
                // repeat probe every Y seconds
                .option(EpollChannelOption.TCP_KEEPINTVL, 20)
                // maximum number of failed probes before the connection is dropped
                .option(EpollChannelOption.TCP_KEEPCNT, 5)

        return WebClient.builder()
            .clientConnector(ReactorClientHttpConnector(httpClient))
            .defaultHeader(HttpHeaders.USER_AGENT, userAgent)
            .codecs { it.defaultCodecs().maxInMemorySize(maxResponsePayloadBytes) }
            .build()

And provide the following retry policy,

            Retry.backoff(maxAttempts, minBackoff)
                .filter {
                    if (it is ServiceUnavailable) {
                        true
                    } else {
                        when (it.cause) {
                            is ReadTimeoutException -> {
                                true
                            }

                            is UnknownHostException -> {
                                true
                            }

                            else -> false
                        }
                    }
                }.onRetryExhaustedThrow(fun(_: RetryBackoffSpec, retrySignal: Retry.RetrySignal): Throwable {
                    throw retrySignal.failure()
                })

An example request using org.springframework.web.reactive.function.client.WebClient,

            webClient.post()
                .uri(uri)
                .accept(MediaType.APPLICATION_JSON)
                .header(HttpHeaders.AUTHORIZATION, authorizationHeaderValue)
                .bodyValue(Request())
                .retrieve()
                .bodyToMono(Response::class.java)
                .retryWhen(
                    buildRetryPolicy(
                        uri,
                        responseTimeoutRetryCount,
                        responseTimeoutRetryBackoff,
                    )
                )
                .awaitSingle()

@travispeloton
Copy link
Author

I'm realizing that for that URI in the logs that we are doing the following,

.bodyToMono(DataBuffer::class.java)

However, we're not consuming the response. I remember reading that there is a way to properly handle this situation. Trying to find an example.

@violetagg
Copy link
Member

@travispeloton That's the issue - you are holding the buffer and never release it! You can use toBodilessEntity
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/reactive/function/client/WebClient.ResponseSpec.html

@violetagg violetagg added for/stackoverflow Questions are best asked on SO or Gitter and removed type/bug A general bug labels Dec 9, 2024
@travispeloton
Copy link
Author

right, I see that now; let me update my code and see if I'm left with any more leak messages

thanks!

@travispeloton
Copy link
Author

travispeloton commented Dec 9, 2024

@violetagg if we are reading HTTP client metrics (e.g., response status code) via Micrometer, do we always need to use

.retrieve()
.toBodilessEntity()

vs

.awaitExchange { response -> response.releaseBody() }

Question is, if we do the second one, will we not see those response code metrics in Micrometer?

@violetagg
Copy link
Member

@travispeloton
Copy link
Author

^ the above fixed my issue in one of my services. Closing. Thanks @violetagg !

@violetagg violetagg removed the for/user-attention This issue needs user attention (feedback, rework, etc...) label Dec 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
for/stackoverflow Questions are best asked on SO or Gitter
Projects
None yet
Development

No branches or pull requests

2 participants