Replies: 9 comments
-
Hi and thanks a lot for the thorough analysis! I'd definitely be interested in trying to improve performance for your use case. Off the top of my head, here are a few things that might be worth playing with:
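For illustration, this is the kind of mount-time tuning that can be played with; the values below are placeholders, and `-o perfmon=fuse` assumes a build with the performance monitor enabled:

```bash
# Mount with a larger block cache, more worker threads, verbose logging and
# (if compiled in) FUSE performance counters. Values are placeholders only.
dwarfs image.dwarfs /mnt/logs \
  -o cachesize=2g \
  -o workers=4 \
  -o debuglevel=info \
  -o perfmon=fuse
```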
This information can be very helpful in understanding where the driver is wasting time; the most interesting numbers are the cache hit/miss statistics. I think this is what happens as you increase both block size and image size: with a fixed cache size, fewer (and larger) blocks fit into the block cache, so cache misses become both more frequent and more expensive.
-
Thanks for the detailed answer and good leads!
-
FWIW, I've downloaded all 2022/2023 logs from SEC's EDGAR, extracted the files, and split each file every 1000 lines, ultimately resulting in 2,075,627 files and almost 200 GiB of data in total. I compressed this using:
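A sketch of what such an invocation might look like; the paths, compression level and block size are assumptions (`-S 26` selects 2**26-byte, i.e. 64 MiB, blocks), and `--order=path` matches the "path ordered image" referred to below:

```bash
# Build a DwarFS image from the pre-split EDGAR logs.
# Paths, -l and -S values are assumptions, not the exact command used.
mkdwarfs -i edgar-split/ -o edgar.dwarfs -l 7 -S 26 --order=path
```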
I chose path ordering for this image (more on that below). This resulted in a 16 GiB image file, and I can read data from the mounted image at around 3 GiB/s:
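Measured with something along these lines, fanning the reads out over 16 parallel processes (the mount point and the use of xargs are assumptions):

```bash
# Dump every file through 16 parallel cat processes and discard the output.
time find /mnt/edgar -type f -print0 | xargs -0 -P16 -n64 cat > /dev/null
```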
Note that this uses 16 parallel readers. Another interesting data point: searching all logs using ripgrep took 110 seconds, so that's about 1.9 GiB/s. I'm not entirely sure how ripgrep determines the order in which to scan files, but it looks somewhat random to me. I also experimented with increasing the number of workers for the FUSE driver to 8 (`-o workers=8`).
-
I've also built another DwarFS image, but this time with similarity ordering instead of path ordering.
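Roughly, the only difference between the two builds is the ordering option; the remaining flags mirror the earlier sketch and are just as much assumptions:

```bash
# Identical inputs and options, only the file ordering differs.
mkdwarfs -i edgar-split/ -o edgar-path.dwarfs       -l 7 -S 26 --order=path
mkdwarfs -i edgar-split/ -o edgar-similarity.dwarfs -l 7 -S 26 --order=similarity
```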
The similarity-ordered image is almost 5% bigger. More importantly, as suspected, the sequential read rate absolutely tanks: it's almost 200 times slower than with the path-ordered image. And, as suspected, it shows in the cache metrics: the miss rate for the similarity-ordered image is at 34%, whereas for the path-ordered image it is at 0.1%. I also used a small subset (all logs from Q2/2022, 17.24 GiB in 177,598 files) to test different compression algorithms, and the results were a bit of a surprise.
It comes at a cost, though: compression takes considerably longer.
The squashfs image was built using:
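Presumably something along these lines; the compressor, compression level and block size here are assumptions:

```bash
# Example mksquashfs invocation; compressor, level and block size are guesses.
mksquashfs logs-q2-2022/ logs-q2-2022.squashfs -comp zstd -Xcompression-level 19 -b 1M
```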
The "(10k)" and "(100k)" DwarFS images were built from the exact same data, but with files split into 10k lines and 100k lines each instead of 1k lines. So each file is 10x / 100x larger. This was mainly done to demonstrate the overhead of accessing many small files compared to fewer large files when only using a single process/thread to read the data. |
-
I've also played around with the sample log files archive you've linked to; again, the results are in line with the tests above.
If you don't mind trading a significant amount of time for 2-3% better compression, there are options for that as well.
-
With all that being said, there's still room for improvement. Internally, DwarFS is able to read data from the file system at more than 10 GiB/s on my system, which is faster than the SSDs the image is stored on. So at least in theory, it should be possible to dump the contents of a DwarFS image faster than if the data were stored raw on disk. There's definitely some overhead due to the FUSE abstraction, but that's likely only a problem when accessing small files. I'm just working on adding a sequential access detector that can trigger prefetches of file system blocks if it detects that data is accessed sequentially. In my early tests I'm seeing roughly twice the throughput for sequential access patterns because reads will stall much less frequently:
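The kind of single-reader dump behind these numbers might look like this (the mount point and exact command are assumptions):

```bash
# Read every file sequentially with a single reader and discard the data.
time find /mnt/logs -type f -exec cat {} + > /dev/null
```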
That's 17 GiB of data in 17,890 files, so about 1.5 GiB/s. When using ripgrep instead, the results are pretty much unchanged (the small difference is likely noise). That's because its access pattern doesn't trigger the detector. Nonetheless, ripgrep scans the data at 3.5 GiB/s as it's running multi-threaded.
-
Thanks a lot for your analysis, you've done all the work for me; sorry, I should have provided a larger set of data. I've still performed the analysis on my side too and updated the results, which are overall consistent with yours. I'm just not completely convinced by the compression-time trade-off, but anyway I got the answer I came for. Your idea of prefetching is interesting though; I guess sequential access is quite common, and the gain will come for free for users :).
-
Cool, glad to see it's performing nicely now.
I guess that depends on how often you compress / read. But it's definitely good to know which trade-offs can be made.
You're welcome!
This feature should be enabled by default in the upcoming v0.9.9, which is going to be released quite soon to fix #217. If you want to give it a try before the release, you can grab a build from the work branch in the meantime; e.g. this build already has support for sequential read detection, and dwarfs-0.9.8-17-g306eaaf178-Linux-x86_64-clang.tar.zst or dwarfs-universal-0.9.8-17-g306eaaf178-Linux-x86_64-clang would be the release builds for x86_64.
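Trying it out might look roughly like this (the directory layout inside the tarball and the mount point are assumptions):

```bash
# Unpack the downloaded prerelease artifact and mount an image with it.
tar xf dwarfs-0.9.8-17-g306eaaf178-Linux-x86_64-clang.tar.zst
./dwarfs-0.9.8-17-g306eaaf178-Linux-x86_64-clang/bin/dwarfs image.dwarfs /mnt/logs
```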
-
I'll move this to discussions as I think this may be worthwhile for future reference.
-
I need to store logs that are generated every day, are fairly large (around 400 MB per day, i.e. almost 150 GB per year), and consist of a rather high number of files (around 8k per day, i.e. almost 3M per year).
So I played around a bit with DwarFS, which seemed to offer compression ratios as good as tar.xz while keeping read speeds as good as SquashFS. However, when I used large block sizes (>= 2**26) for a good compression ratio with fairly large archives (1 month of logs, i.e. 13 GB uncompressed, 125 MB compressed, 240k files), reading became awfully slow, similar to reading directly from the tar.xz archive with archivemount: more than 30 minutes to read one day of logs, instead of less than 4 s with SquashFS or with DwarFS at a smaller block size / archive size.
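For context, mkdwarfs takes the block size as a power of two, so the two regimes compared here differ roughly like this (paths and all other options are placeholders):

```bash
# -S gives the block size as log2: -S 20 = 1 MiB blocks, -S 26 = 64 MiB blocks.
mkdwarfs -i logs-1month/ -o logs-small-blocks.dwarfs -S 20
mkdwarfs -i logs-1month/ -o logs-large-blocks.dwarfs -S 26
```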
So I was a bit disappointed to realize that, in the end, I could not get both a good compression ratio and a reasonable read speed at the same time. I made a fairly extensive benchmark to get a better view and to make sure there was no suitable sweet spot for my use case. It is available here if you need more detail: https://github.com/cyril42e/dwarfs-scalability/tree/master
Still, I am curious whether this is expected, fixable, or whether there is simply no way around it?
What gives me a little hope that it could be fixable is that DwarFS shows an increase in read time with archive size that SquashFS does not show with a similar block size / compression ratio (of course, SquashFS limits the block size to 2**20 = 1 MB).