HttpClient fails to recover after multiple ReadTimeoutException #8727
Comments
FYI. Same behavior on macOS and Linux.
I did a variant of this project using the Jetty 11.x HttpClient in async mode, under a /load2 controller. It got the same performance but no breakages.
@yawkat can you investigate?
Okay, from what I can tell this is because the Netty channel pool tries to acquire all these connections, and they just get stuck in the queue in SimpleChannelPool. The connection requests are cancelled because of the timeout, but SimpleChannelPool does not handle this, so it takes a while to work through all the timeouts. Unfortunately I don't see a way to hook SimpleChannelPool to fix this issue. There would need to be a listener on the acquire promise that cancels the connectChannel future if the acquire promise is cancelled, but that is all on the Netty side.

With Micronaut 4 and the changes in #8100 this is fixed. It needs to be reevaluated with #8179, since that PR affects request cancellation, but I think it should still work fine, because for this issue the requests are cancelled before filters ever run.
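For illustration, here is a minimal sketch of the missing hook described above. It is pseudocode for logic that would have to live inside Netty's SimpleChannelPool itself (the variable names are assumptions); it is not something the Micronaut client can add from the outside today:

```java
// Hypothetical sketch, not real Netty code: the point where SimpleChannelPool
// opens a new connection to satisfy an acquire() call.
ChannelFuture connectFuture = bootstrap.connect(); // roughly what connectChannel(bs) does today
acquirePromise.addListener(future -> {
    // If the caller cancelled the acquire (for example because the request
    // already hit its read timeout), abandon the pending connect instead of
    // letting it sit in the pool's queue.
    if (future.isCancelled()) {
        connectFuture.cancel(false);
    }
});
```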
Thanks. Ouch. I assume there is no plan to back-port the fixes from 4.x to 3.x.
Unfortunately the changes in 4.0 that fix this issue are pretty wide-reaching and not compatible enough for a minor release. You can try adding a low acquire timeout. It seems that this makes recovery a bit faster.
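As an illustration of that suggestion, something along these lines in the reproducer's application.yml should lower the pool acquire timeout. The exact property path is an assumption based on the 3.x HttpClientConfiguration, so double-check it against the docs:

```yaml
# application.yml - assumed property path for the default client's connection pool
micronaut:
  http:
    client:
      pool:
        acquire-timeout: 1s
```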
@yawkat and @graemerocher With a high connect timeout, calls to our application fill up the heap because calls through the HTTP client take so long. This actually (inadvertently) works to our advantage: within a few minutes we run out of memory and the instance is replaced by our cloud provider.

I have tried a low timeout (1.5s) as well as a circuit breaker. The application no longer crashes, BUT the HTTP client does not recover (in a test environment the HTTP client did not recover despite waiting 4 days). So a lack of resilience actually protects us.

Going forward I am going to work with @lordbuddha and use the Jetty HTTP client with resilience4j (retry, circuit breaker and time limiter).

FYI - while I can replicate this with an insane number of calls in a test environment, we have only seen this occur in our production environment twice in the last year. Load on our production system is not at extreme levels (about 60k requests/min distributed to 3 quad-core servers).

Thanks
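For what it's worth, here is a rough sketch of the direction described above: a Jetty HttpClient call wrapped in resilience4j retry and circuit breaker decorators. All names and settings are assumptions, the call is blocking for brevity (the actual plan uses async mode), and the time limiter is omitted because it requires a CompletionStage-based call:

```java
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.decorators.Decorators;
import io.github.resilience4j.retry.Retry;
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.client.api.ContentResponse;

import java.util.concurrent.Callable;

public class ResilientBackendCall {

    public static void main(String[] args) throws Exception {
        HttpClient httpClient = new HttpClient();
        httpClient.start();

        // Default policies for brevity; these would be tuned for real traffic.
        CircuitBreaker circuitBreaker = CircuitBreaker.ofDefaults("httpstat");
        Retry retry = Retry.ofDefaults("httpstat");

        // Wrap the blocking Jetty call so failures are retried and trip the breaker.
        Callable<ContentResponse> decorated = Decorators
                .ofCallable(() -> httpClient.GET("https://httpstat.us/200"))
                .withCircuitBreaker(circuitBreaker)
                .withRetry(retry)
                .decorate();

        System.out.println(decorated.call().getStatus());
        httpClient.stop();
    }
}
```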
Expected Behavior
If HttpClient gets a high number of ReadTimeoutExceptions, then after the underlying service has recovered or the volume of calls has gone down, HttpClient should invoke the underlying service again.
Actual Behaviour
Once HttpClient experiences a high volume of ReadTimeoutExceptions, it fails to establish connections and serve further requests. To recover, the application needs to be restarted.
Steps To Reproduce
The sample application has a single endpoint on a controller that accepts the number of concurrent calls to be made to a simple endpoint (a GET on https://httpstat.us/200); a rough sketch of this setup is shown after the steps below.
1. Clone the repo.
2. Start the application: ./gradlew run
3. Check the backend service directly: curl https://httpstat.us/200 -k
4. Check a single call to the backend via HttpClient: curl http://localhost:8080/load
5. Stress HttpClient (this typically times out in under a minute; increase the attempts if it doesn't): curl "http://localhost:8080/load?attempts=500000"
6. Retry a single attempt as per step 4 and observe that it results in a ReadTimeoutException.
7. Retry step 3 to confirm that direct calls to the underlying service are working.
Please note that once HttpClient starts failing, no TCP activity to httpstat.us is observed via Wireshark.
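For reference, here is a hypothetical reconstruction of what the reproducer's controller and declarative client roughly look like. The class names and the Mono get() signature come from the logs below; the bodies are assumptions, and the actual code is in the example application linked at the bottom:

```java
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.annotation.QueryValue;
import io.micronaut.http.client.annotation.Client;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Declarative client for the backend used in the stress test.
@Client("https://httpstat.us")
interface HttpStatusClient {
    @Get("/200")
    Mono<String> get();
}

@Controller("/load")
class LoadController {

    private final HttpStatusClient client;

    LoadController(HttpStatusClient client) {
        this.client = client;
    }

    @Get
    Mono<String> load(@QueryValue(defaultValue = "1") int attempts) {
        // Fan out `attempts` concurrent calls through the Micronaut HttpClient;
        // under heavy load these pile up in the connection pool and eventually
        // fail with ReadTimeoutException.
        return Flux.range(0, attempts)
                .flatMap(i -> client.get())
                .then(Mono.just("done"));
    }
}
```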
Logs once it starts failing
13:37:41.658 [default-nioEventLoopGroup-1-16] TRACE i.m.h.server.netty.NettyHttpServer - Server localhost:8080 Received Request: GET /load?attempts=1
13:37:41.658 [default-nioEventLoopGroup-1-16] DEBUG i.m.h.s.netty.RoutingInBoundHandler - Request GET /load?attempts=1
13:37:41.658 [default-nioEventLoopGroup-1-16] TRACE i.m.h.s.netty.RoutingInBoundHandler - Matched route GET - /load to controller class com.example.LoadController
13:37:41.658 [default-nioEventLoopGroup-1-16] TRACE i.m.http.server.cors.CorsFilter - Http Header Origin not present. Proceeding with the request.
13:37:41.658 [default-nioEventLoopGroup-1-16] TRACE i.m.aop.chain.InterceptorChain - Intercepted method [Mono get()] invocation on target: com.example.HttpStatusClient$Intercepted@5c8eee0f
13:37:41.658 [default-nioEventLoopGroup-1-16] TRACE i.m.aop.chain.InterceptorChain - Proceeded to next interceptor [io.micronaut.retry.intercept.RecoveryInterceptor@478ee483] in chain for method invocation: Mono get()
13:37:41.659 [default-nioEventLoopGroup-1-16] TRACE i.m.aop.chain.InterceptorChain - Proceeded to next interceptor [io.micronaut.http.client.interceptor.HttpClientIntroductionAdvice@4189d70b] in chain for method invocation: Mono get()
13:37:41.659 [default-nioEventLoopGroup-1-16] DEBUG i.m.c.e.ApplicationEventPublisher - Publishing event: io.micronaut.http.context.event.HttpRequestReceivedEvent[source=GET /load?attempts=1]
13:37:44.167 [parallel-6] ERROR i.m.http.server.RouteExecutor - Unexpected error occurred: Read Timeout
io.micronaut.http.client.exceptions.ReadTimeoutException: Read Timeout
    at io.micronaut.http.client.exceptions.ReadTimeoutException.<clinit>(ReadTimeoutException.java:26)
    at io.micronaut.http.client.netty.DefaultHttpClient$BaseHttpResponseHandler.exceptionCaught(DefaultHttpClient.java:2040)
    at io.micronaut.http.client.netty.DefaultHttpClient$FullHttpResponseHandler.exceptionCaught(DefaultHttpClient.java:2292)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325)
    at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317)
    at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireExceptionCaught(CombinedChannelDuplexHandler.java:424)
    at io.netty.channel.ChannelHandlerAdapter.exceptionCaught(ChannelHandlerAdapter.java:92)
    at io.netty.channel.CombinedChannelDuplexHandler$1.fireExceptionCaught(CombinedChannelDuplexHandler.java:145)
    at io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:143)
    at io.netty.channel.CombinedChannelDuplexHandler.exceptionCaught(CombinedChannelDuplexHandler.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)
    at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:325)
    at io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:317)
    at io.netty.handler.timeout.ReadTimeoutHandler.readTimedOut(ReadTimeoutHandler.java:98)
    at io.netty.handler.timeout.ReadTimeoutHandler.channelIdle(ReadTimeoutHandler.java:90)
    at io.netty.handler.timeout.IdleStateHandler$ReaderIdleTimeoutTask.run(IdleStateHandler.java:503)
    at io.netty.handler.timeout.IdleStateHandler$AbstractIdleTask.run(IdleStateHandler.java:475)
    at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
    at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
    at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:833)
13:37:44.174 [parallel-6] TRACE i.m.h.s.netty.RoutingInBoundHandler - Encoding emitted response object [Internal Server Error] using codec: io.micronaut.json.codec.JsonMediaTypeCodec@4167d97b
13:37:44.175 [parallel-6] DEBUG i.m.h.s.netty.RoutingInBoundHandler - Response 500 - GET /load?attempts=1
13:37:44.177 [default-nioEventLoopGroup-1-16] DEBUG i.m.c.e.ApplicationEventPublisher - Publishing event: io.micronaut.http.context.event.HttpRequestTerminatedEvent[source=GET /load?attempts=1]
13:37:44.177 [default-nioEventLoopGroup-1-16] TRACE i.m.c.e.ApplicationEventPublisher - Established event listeners [io.micronaut.runtime.http.scope.RequestCustomScope@17593fa] for event: io.micronaut.http.context.event.HttpRequestTerminatedEvent[source=GET /load?attempts=1]
13:37:44.177 [default-nioEventLoopGroup-1-16] TRACE i.m.c.e.ApplicationEventPublisher - Invoking event listener [io.micronaut.runtime.http.scope.RequestCustomScope@17593fa] for event: io.micronaut.http.context.event.HttpRequestTerminatedEvent[source=GET /load?attempts=1]
Environment Information
Windows 11
openjdk 17.0.2 2022-01-18
OpenJDK Runtime Environment (build 17.0.2+8-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-86, mixed mode, sharing)
Example Application
https://github.com/rbhambhra/httpclient
Version
3.8.3