Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Direct memory leak of V2 protocol ReadResponse #3527

Closed
horizonzy opened this issue Oct 13, 2022 · 2 comments
Closed

Direct memory leak of V2 protocol ReadResponse #3527

horizonzy opened this issue Oct 13, 2022 · 2 comments
Labels

Comments

@horizonzy
Copy link
Member

horizonzy commented Oct 13, 2022

BUG REPORT
Recently, in our customer product, we found that the bookie direct memory increases slowly over time, finally it throws netty allocate direct memory failed exception, then bookie restart.

Dump info
So we dump the heap and found that there are some unreachable ReadResponse, and the data property ByteBuf refCnt is 2, and the ByteBuf is also unreachable, so there is no chance to release the ByteBuf, it will be deleted after JVM GC.
image

See the picture, we can think the memory leak already happen.

Log info
In the bookie log, we found there are many disconnect info like:

"Oct 12, 2022 @ 07:47:28.673","","2022-10-12T07:47:28,672+0000 [bookie-io-1-6] INFO  org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x111df957, L:/xx.xx.xx.xx:3181 ! R:/xx.xx.xx.xx:44396]"

The Root Case
The memory leak occurred in an edge scene.
In the normal case, we get the data from BookieImpl and wrap it in ReadResponse, and then write the ReadResponse to the netty channel, there is our netty handler to handle the ReadResponse.

The ByteBufList is implemented interce ReferenceCounted, so after netty flush it, netty will release it by invoke io.netty.util.ReferenceCountUtil#release(java.lang.Object)

    public static boolean release(Object msg, int decrement) {
        ObjectUtil.checkPositive(decrement, "decrement");
        if (msg instanceof ReferenceCounted) {
            return ((ReferenceCounted) msg).release(decrement);
        }
        return false;
    }

In the io.netty.util.ReferenceCountUtil#release(java.lang.Object), it will check if the msg is implemented ReferenceCounted. If not, do nothing.

There is an important netty mechanism you should know, after a connection disconnect, there are two things that happen.

  1. all the handlers will be removed from the ChannelPipeline after channelUnregistered.
    https://github.com/netty/netty/blob/7971075bbe9f5509c8b20c0e702ec2affb37d76e/transport/src/main/java/io/netty/channel/DefaultChannelPipeline.java#L1387-L1394
    At line_1392, it will remove all the handlers, only leaving two netty default contextHandler in the ChannelPipeline, HeadContext and TailContext

  2. Make AbstractChannel.AbstractUnsafe#outboundBuffer = null.
    https://github.com/netty/netty/blob/7971075bbe9f5509c8b20c0e702ec2affb37d76e/transport/src/main/java/io/netty/channel/AbstractChannel.java#L703

So we know, after the connection disconnect, the channel only left HeadContext and TailContext, and the AbstractChannel.AbstractUnsafe#outboundBuffer is null. If the bookie still write ReadResponse to netty channel, the ResponseEncoder didn't work(already removed), the HeadContext will write ReadResponse to AbstractUnsafe#outboundBuffer directly.
https://github.com/netty/netty/blob/7971075bbe9f5509c8b20c0e702ec2affb37d76e/transport/src/main/java/io/netty/channel/AbstractChannel.java#L847-L864

see line_851, the outboundBuffer is null, it will release ReadResponse, but the ReadResponse didn't implement ReferenceCounted, so it didn't release the ReadResponse.data. Memory leak!!!

To Reproduce

Steps to reproduce the behavior:

  1. Start bookies.
  2. Start a bookkeeper client using V2 protocol.
  3. use the bookkeeper client to read entries.
  4. In the reading process, kill -9 bookkeeper client program.
  5. Bookie server will memory leak.
@michaeljmarshall
Copy link
Member

Thanks for the thorough analysis @horizonzy

@lhotari
Copy link
Member

lhotari commented Oct 14, 2022

Good work @horizonzy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants