The BK data path is very efficient when processing large entries, and in these cases it is generally able to saturate the disk and network I/O.
By contrast, when handling a large number of very small entries, the per-entry overhead introduces several inefficiencies that make the CPU the bottleneck.
There is some low-hanging fruit to tackle to improve performance:
Reduce contention in message passing
Reduce contention in the journal & force-write queues
Improve the OrderedExecutor performance
Reduce the number of buffers allocated per entry written/read
For each entry being written in a ledger we are using 4 ByteBuf instances:
The entry payload (this gets passed in to BK client)
The checksum
The serialized `AddRequest`
The 4-byte size header
These buffers are passed to Netty, which performs a gathering writev, so no copy is needed; it does, however, still have to process every one of the buffers.
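To make the writev mechanism concrete, here is a minimal sketch using plain `java.nio` rather than Netty (the class and method names are illustrative, not BK code). A `GatheringByteChannel` mirrors the writev(2) call Netty issues under the hood: the kernel consumes all buffers in one syscall with no user-space copy, but each buffer still has to be allocated, tracked, and released individually.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class GatheringWriteDemo {

    // Write header + payload as separate buffers in a single gathering write,
    // analogous to how Netty flushes a list of ByteBufs with writev.
    static long gatherWrite(Path file, ByteBuffer header, ByteBuffer payload)
            throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            return ch.write(new ByteBuffer[] { header, payload });
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("gather", ".bin");
        ByteBuffer header = ByteBuffer.allocate(4).putInt(5);
        header.flip();
        ByteBuffer payload = ByteBuffer.wrap("hello".getBytes());
        long written = gatherWrite(tmp, header, payload);
        System.out.println("bytes written: " + written); // 4 + 5 = 9
        Files.delete(tmp);
    }
}
```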
Allocating and managing all these buffers is expensive. There is overhead in:
Reference counting
The Recycler used to obtain ByteBuf instances and return them to the pool
The ByteBuf pool arena handling allocations/deallocations
Inter-thread synchronization: these buffers are typically allocated in one thread and deallocated in a different thread
To make matters worse, while the checksum is computed only once, the AddRequest is serialized each time we write it on a connection.
E.g., with write-quorum=3 we end up using (2 * 3) + 1 = 7 ByteBuf instances per entry.
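One way to avoid the repeated serialization is to serialize once and give each connection a cheap view over the same bytes. The sketch below uses `java.nio`'s `duplicate()` (which shares backing memory but has independent position/limit) in place of Netty's retained duplicates; the names and the idea of a cached "AddRequest" buffer here are illustrative assumptions, not the BK client API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CachedRequestDemo {

    // Hypothetical one-time serialization of an "AddRequest":
    // a 4-byte length prefix followed by the body.
    static ByteBuffer serializeOnce(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + body.length);
        buf.putInt(body.length).put(body).flip();
        return buf.asReadOnlyBuffer();
    }

    public static void main(String[] args) {
        ByteBuffer cached = serializeOnce("entry-0001");
        int writeQuorum = 3;
        for (int i = 0; i < writeQuorum; i++) {
            // Each connection gets its own position/limit over shared bytes,
            // instead of a fresh serialization plus a new buffer.
            ByteBuffer view = cached.duplicate();
            System.out.println("connection " + i + " sends "
                    + view.remaining() + " bytes");
        }
    }
}
```

The per-connection cost drops to one small view object; the serialized bytes themselves are produced exactly once.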
Finally, while for big entries it is very important to avoid copying the payload, for small entries the overhead of maintaining the ByteBufList is greater than the cost of simply copying the payload into a single buffer.
To address that, we should:
If the entry is big -> keep using ByteBufList, with one buffer for all the headers and a second buffer referencing the payload, with no copy.
If the entry is small -> allocate a single buffer containing all the headers and copy the payload into it.
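The two-case strategy above can be sketched as follows, again with `java.nio` buffers standing in for Netty's ByteBuf/ByteBufList. The threshold value is an assumption for illustration; the actual cutoff would need to be measured.

```java
import java.nio.ByteBuffer;
import java.util.List;

public class EntryFraming {

    static final int COPY_THRESHOLD = 4096; // hypothetical cutoff

    // Returns the buffers to hand to the transport for one entry.
    static List<ByteBuffer> frame(ByteBuffer headers, ByteBuffer payload) {
        if (payload.remaining() <= COPY_THRESHOLD) {
            // Small entry: one allocation, copy headers + payload into it.
            ByteBuffer single = ByteBuffer
                    .allocate(headers.remaining() + payload.remaining());
            single.put(headers.duplicate()).put(payload.duplicate()).flip();
            return List.of(single);
        }
        // Big entry: keep two buffers, referencing the payload with no copy.
        return List.of(headers, payload);
    }

    public static void main(String[] args) {
        ByteBuffer hdr = ByteBuffer.wrap(new byte[32]);
        System.out.println(frame(hdr, ByteBuffer.wrap(new byte[100])).size());  // 1
        System.out.println(frame(hdr, ByteBuffer.wrap(new byte[8192])).size()); // 2
    }
}
```

For small entries this trades one memcpy for the refcounting, recycling, and cross-thread release overhead of the extra buffers, which is the favorable trade the proposal describes.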