segment_appender: truncate in addition to allocate
In the segment_appender we adaptively increase the file size of log segments using `ss::file::allocate`. This, however, doesn't immediately do what's intended. Internally it calls `fallocate` with the `FALLOC_FL_KEEP_SIZE|FALLOC_FL_ZERO_RANGE` flags. The former means the logical file size is not extended. Extending the logical file size, however, seems to have been the intention of 7d77067.

We already handle zero bytes at the end fine either way, as we are writing zero-padded 4k chunks anyway.

Besides this side effect there are other reasons why we want to update the logical file size immediately.

First, as long as the logical file size is not extended up front, every write needs to update the file size, which makes fsync more expensive.

Second, seastar considers XFS an append-challenged file system. This means that seastar will avoid having size-changing and non-size-changing operations outstanding at the same time. They will be queued internally in the ioqueue. Because of how `allocate` works above, all our writes will always be considered "appending", as we never update the logical file size. To optimize the queued operations seastar employs certain "optimizations" in `append_challenged_posix_file_impl::optimize_queue`. Because all our operations are appending, this causes a continuous stream of implicit `ftruncate` syscalls of about 100/s per shard.

```
root:/tmp# perf trace -t 11990 -s -e ftruncate -- sleep 5

 Summary of events:

 redpanda (11990), 1122 events, 100.0%

   syscall            calls  errors  total       min       avg       max       stddev
                                     (msec)    (msec)    (msec)    (msec)        (%)
   --------------- --------  ------ -------- --------- --------- ---------     ------
   ftruncate            561      0    32.056     0.013     0.057     0.265      1.85%
```

To avoid the two aforementioned issues this patch adds an explicit `truncate` after our `allocate` call. Note that we still need both, as `ftruncate` alone doesn't preallocate blocks, so it's a lot less performant on its own.

In a medium IOPS OMB workload we see a drop in p99 producer latency from ~14ms to ~10ms.
Threaded fallbacks and hence steal time are down by about 5% and 10% respectively.

Further, we can also see the `fsync` time distribution slightly improved:

Before:

```
root:/tmp# xfsdist-bpfcc 5 1
Tracing XFS operation latency... Hit Ctrl-C to end.

14:07:37:

operation = b'fsync'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 55       |                                        |
         4 -> 7          : 11872    |*****                                   |
         8 -> 15         : 38206    |*****************                       |
        16 -> 31         : 18405    |********                                |
        32 -> 63         : 11474    |*****                                   |
        64 -> 127        : 29074    |*************                           |
       128 -> 255        : 88607    |****************************************|
       256 -> 511        : 27403    |************                            |
       512 -> 1023       : 557      |                                        |
      1024 -> 2047       : 28       |                                        |
      2048 -> 4095       : 23       |                                        |
```

After:

```
root:/tmp# xfsdist-bpfcc 5 1
Tracing XFS operation latency... Hit Ctrl-C to end.

13:57:45:

operation = b'fsync'
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 56       |                                        |
         4 -> 7          : 15472    |*******                                 |
         8 -> 15         : 45440    |***********************                 |
        16 -> 31         : 22825    |***********                             |
        32 -> 63         : 13378    |******                                  |
        64 -> 127        : 31979    |****************                        |
       128 -> 255        : 77529    |****************************************|
       256 -> 511        : 18116    |*********                               |
       512 -> 1023       : 250      |                                        |
      1024 -> 2047       : 1        |                                        |
```

Further, we can also observe the logical file size adapting during a run:

Before:

```
root:/tmp# for i in {0..5} ; do ll /var/lib/redpanda/data/kafka/test-topic-zUv2_Hs-0000/0_32/ | grep log ; sleep 1 ; done
-rw-r--r-- 1 redpanda redpanda 37502976 May 21 10:10 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 39337984 May 21 10:10 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 41267200 May 21 10:10 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 43479040 May 21 10:10 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 45416448 May 21 10:10 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 47005696 May 21 10:10 0-1-v1.log
```

After:

```
root:/tmp# for i in {0..5} ; do ll /var/lib/redpanda/data/kafka/test-topic-Yk-n3mY-0000/0_35/ | grep log ; sleep 1 ; done
-rw-r--r-- 1 redpanda redpanda 67108864 May 21 10:22 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 67108864 May 21 10:22 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 67108864 May 21 10:22 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 67108864 May 21 10:22 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 67108864 May 21 10:22 0-1-v1.log
-rw-r--r-- 1 redpanda redpanda 67108864 May 21 10:22 0-1-v1.log
```
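
For illustration only, the size behaviour described above can be reproduced outside of seastar with plain Linux syscalls. This is a minimal sketch and not the segment_appender code. The helper names `logical_size` and `preallocate_then_truncate` and the `/tmp` scratch file are made up for this example. It passes only `FALLOC_FL_KEEP_SIZE` (the commit message says seastar also passes `FALLOC_FL_ZERO_RANGE`, which is left out here because not every file system accepts it).

```cpp
// Linux-only sketch: fallocate(KEEP_SIZE) reserves blocks but leaves the
// logical size (st_size) alone; a following ftruncate publishes the new size.
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstdlib>

// Logical sizes observed at the two interesting points (-1 on error).
struct sizes {
    off_t after_allocate;
    off_t after_truncate;
};

// Hypothetical helper: logical file size as reported by fstat.
inline off_t logical_size(int fd) {
    struct stat st{};
    if (fstat(fd, &st) != 0) {
        return -1;
    }
    return st.st_size;
}

// Hypothetical helper: preallocate `target` bytes, then extend the logical
// size to match, recording st_size after each step.
inline sizes preallocate_then_truncate(off_t target) {
    char path[] = "/tmp/segment_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        return {-1, -1};
    }
    unlink(path); // scratch file disappears once fd is closed

    // Reserve blocks without changing st_size. The result is ignored on
    // purpose: some file systems reject fallocate, and with KEEP_SIZE the
    // logical size stays untouched whether or not the call succeeds.
    (void)fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, target);

    sizes s{};
    s.after_allocate = logical_size(fd);

    // The explicit truncate makes the logical size match the reservation.
    if (ftruncate(fd, target) != 0) {
        close(fd);
        return {s.after_allocate, -1};
    }
    s.after_truncate = logical_size(fd);

    close(fd);
    return s;
}
```

Calling `preallocate_then_truncate(1 << 20)` is expected to report a logical size of 0 after the `fallocate` step and 1048576 after the `ftruncate` step, which mirrors the before/after `ll` listings above in miniature.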