'acquired' channels not decremented in some timeout scenarios #9448

Closed
dhofftgt opened this issue Jun 14, 2023 · 5 comments
@dhofftgt
Contributor

Expected Behavior

Acquired count should reflect channels in use in all cases.

Actual Behaviour

In scenarios where the client has a fixed channel pool and the pool is backed up, we see that the 'acquired' channel tracking can 'leak': the count is incremented but never comes back down, causing all subsequent requests to time out while waiting for a channel.

Steps To Reproduce

See example application:

The linked project reproduces an issue where the 'acquired' channel count maintained by the client's FixedChannelPool is not decremented when the Netty client threads are unresponsive (this seems to correlate with io.micronaut.http.client.exceptions.ReadTimeoutException being thrown).

The MicronautAcquireLeakTest reproduces the issue under the following conditions:

  • max connections is set for the client
  • the @Client returns a future with a POJO deserialized from JSON (a sketch of such a client follows this list)
  • the Netty client thread is unresponsive or backed up, causing io.micronaut.http.client.exceptions.ReadTimeoutException to be thrown by DefaultHttpClient (I blocked the thread in the thenApply on a future returned by the @Client; note it is odd that this runs on the Netty thread rather than the I/O executor)
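A minimal sketch of the kind of declarative client these conditions describe; the interface, path, and POJO names are hypothetical placeholders, and the real code lives in the linked example project:

```java
import io.micronaut.core.annotation.Introspected;
import io.micronaut.http.annotation.Get;
import io.micronaut.http.client.annotation.Client;
import java.util.concurrent.CompletableFuture;

// Hypothetical POJO deserialized from the JSON response body.
@Introspected
class DemoResponse {
    public String message;
}

// Declarative client of the shape described above: it returns a future that
// completes with a POJO deserialized from JSON. The connection limit and read
// timeout would be capped via configuration, e.g.
// micronaut.http.client.pool.max-connections and micronaut.http.client.read-timeout.
@Client("/demo")
interface DemoClient {

    @Get("/item")
    CompletableFuture<DemoResponse> fetch();
}
```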

The test executes like this:

  • The test fires off a batch of HTTP client requests (see the test sketch after this list)
  • a callback on each response CompletableFuture blocks on a latch, causing the Netty client threads to hang and become unresponsive
  • on subsequent requests DefaultHttpClient gives up 1 second after the Netty read timeout and throws io.micronaut.http.client.exceptions.ReadTimeoutException
  • we wait 5 seconds to make sure everything has timed out, then release the latch so the client becomes responsive again
  • we wait for the client to be idle by checking that pendingTasks() is 0
  • acquiredChannels should be 0, but it is not
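
A rough sketch of the shape of such a test, reusing the hypothetical DemoClient from the sketch above; the request count and timings are illustrative only, and the actual test is MicronautAcquireLeakTest in the linked project:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;

class AcquireLeakReproSketch {

    void reproduceAcquireLeak(DemoClient demoClient) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        List<CompletableFuture<DemoResponse>> requests = new ArrayList<>();

        for (int i = 0; i < 20; i++) {
            requests.add(demoClient.fetch().thenApply(pojo -> {
                try {
                    // Deliberately blocks the Netty event-loop thread so that later
                    // requests hit the client read timeout (ReadTimeoutException).
                    latch.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return pojo;
            }));
        }

        Thread.sleep(5_000); // let the later requests fail with ReadTimeoutException
        latch.countDown();   // release the latch so the event loop becomes responsive again
        // ...then wait until pendingTasks() is 0 and assert that acquiredChannels is 0
    }
}
```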

Environment Information

Mac M1, macOS 13.4
Micronaut version 3.9.3
openjdk 11.0.18 2023-01-17 LTS
OpenJDK Runtime Environment Zulu11.62+17-CA (build 11.0.18+10-LTS)
OpenJDK 64-Bit Server VM Zulu11.62+17-CA (build 11.0.18+10-LTS, mixed mode)

Example Application

https://github.com/dhofftgt/micronaut-acquire-leak

Version

3.9.3

yawkat added a commit that referenced this issue Jun 15, 2023
When there was a read timeout (i.e. timeout handled by DefaultHttpClient) while the Mono<PoolHandle> from ConnectionManager was not yet complete, the PoolHandle would be dropped silently.

This patch handles cancellation of the Mono properly, releasing the pool handle.

This is already fixed by the ConnectionManager rework in 4.0.

Fixes #9448
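
The mechanism the commit message describes boils down to: if the subscriber cancels the acquisition Mono before it completes, the pool handle that eventually arrives must still be released. Below is a minimal, self-contained Reactor sketch of that general pattern, not the actual Micronaut patch; PooledConnection stands in for Micronaut's internal PoolHandle, and the timings are made up for illustration.

```java
import reactor.core.Disposable;
import reactor.core.publisher.Mono;

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelReleaseSketch {

    static class PooledConnection {
        void release() {
            System.out.println("released back to the pool");
        }
    }

    public static void main(String[] args) throws Exception {
        // Pretend the pool hands out a connection two seconds from now.
        CompletableFuture<PooledConnection> slowAcquire =
                CompletableFuture.supplyAsync(PooledConnection::new,
                        CompletableFuture.delayedExecutor(2, TimeUnit.SECONDS));

        Mono<PooledConnection> acquire = Mono.create(sink -> {
            AtomicBoolean cancelled = new AtomicBoolean();
            // Remember that the subscriber gave up (e.g. because a read timeout fired).
            sink.onCancel(() -> cancelled.set(true));
            slowAcquire.whenComplete((conn, error) -> {
                if (error != null) {
                    sink.error(error);
                } else if (cancelled.get()) {
                    // Nobody is waiting any more: release the connection instead of
                    // dropping it silently, otherwise the pool's 'acquired' counter
                    // never comes back down.
                    conn.release();
                } else {
                    sink.success(conn);
                }
            });
        });

        Disposable subscription = acquire.subscribe(conn -> System.out.println("got a connection"));

        Thread.sleep(500);      // the client-side read timeout fires first...
        subscription.dispose(); // ...and the caller cancels the acquisition
        Thread.sleep(3000);     // "released back to the pool" is printed
    }
}
```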
@yawkat
Member

yawkat commented Jun 15, 2023

thanks for the report, i've made a pr to fix it.

please don't block the event loop like in your example though, netty does not like it :)

@dhofftgt
Contributor Author

Awesome! Thanks! Haha, yes. Blocking purely to create the conditions that reproduce the problem.

yawkat added a commit that referenced this issue Jun 16, 2023
When there was a read timeout (i.e. timeout handled by DefaultHttpClient) while the Mono<PoolHandle> from ConnectionManager was not yet complete, the PoolHandle would be dropped silently.

This patch handles cancellation of the Mono properly, releasing the pool handle.

This is already fixed by the ConnectionManager rework in 4.0.

Fixes #9448

Co-authored-by: Sergio del Amo <sergio.delamo@softamo.com>
@yawkat yawkat closed this as completed Jun 16, 2023
@stepanv
Contributor

stepanv commented Jun 16, 2023

That's perfect, I just wanted to report this problem.
@yawkat when can we expect this fix to be released? Thx!

@dhofftgt
Contributor Author

dhofftgt commented Jul 5, 2023

@yawkat I am also wondering when this will be in a release. We are currently holding off upgrading until this fix is available.

@yawkat
Member

yawkat commented Jul 10, 2023

I'm not sure; 4.0 will probably be released before then, and 4.0 never had this issue.
