-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store gateway mmap removal: proof of concept: replace bufio.Reader
with buffered reader optimised for our use case
#3798
Conversation
…reading files. Specifically, compared to bufio.Reader, bufferingFileReader has one behaviour that dramatically improves performance for our use case: when we skip backwards in a file to an earlier position, if the buffer already contains the contents of the file at that position, we don't discard the buffer (bufio.Reader always discards the buffer after any Reset() call). The performance improvement is dramatic: name old time/op new time/op delta NewStreamBinaryReader/1Names1Values-10 122µs ± 1% 120µs ± 3% ~ (p=0.310 n=5+5) NewStreamBinaryReader/1Names10Values-10 122µs ± 1% 119µs ± 1% -2.27% (p=0.016 n=5+5) NewStreamBinaryReader/1Names100Values-10 132µs ± 2% 129µs ± 1% -2.87% (p=0.008 n=5+5) NewStreamBinaryReader/1Names500Values-10 158µs ± 1% 155µs ± 2% -1.85% (p=0.008 n=5+5) NewStreamBinaryReader/1Names1000Values-10 193µs ± 2% 183µs ± 2% -5.27% (p=0.008 n=5+5) NewStreamBinaryReader/1Names5000Values-10 495µs ± 1% 471µs ± 0% -4.81% (p=0.016 n=5+4) NewStreamBinaryReader/20Names1Values-10 169µs ± 2% 134µs ± 2% -20.53% (p=0.008 n=5+5) NewStreamBinaryReader/20Names10Values-10 200µs ± 3% 145µs ± 2% -27.64% (p=0.008 n=5+5) NewStreamBinaryReader/20Names100Values-10 353µs ± 0% 242µs ± 2% -31.60% (p=0.016 n=4+5) NewStreamBinaryReader/20Names500Values-10 907µs ± 4% 724µs ± 4% -20.17% (p=0.008 n=5+5) NewStreamBinaryReader/20Names1000Values-10 1.53ms ± 1% 1.43ms ± 4% -6.88% (p=0.016 n=4+5) NewStreamBinaryReader/20Names5000Values-10 6.28ms ± 1% 5.86ms ± 1% -6.64% (p=0.008 n=5+5) NewStreamBinaryReader/50Names1Values-10 274µs ± 1% 162µs ± 1% -41.09% (p=0.008 n=5+5) NewStreamBinaryReader/50Names10Values-10 385µs ± 5% 191µs ± 1% -50.39% (p=0.008 n=5+5) NewStreamBinaryReader/50Names100Values-10 741µs ± 8% 457µs ± 4% -38.30% (p=0.008 n=5+5) NewStreamBinaryReader/50Names500Values-10 2.09ms ± 3% 1.63ms ± 3% -22.24% (p=0.008 n=5+5) NewStreamBinaryReader/50Names1000Values-10 3.62ms ± 1% 3.30ms ± 2% -9.01% (p=0.008 n=5+5) NewStreamBinaryReader/50Names5000Values-10 15.5ms ± 1% 14.5ms ± 1% -6.49% (p=0.008 n=5+5) NewStreamBinaryReader/100Names1Values-10 537µs ± 1% 200µs ± 3% -62.67% (p=0.008 n=5+5) NewStreamBinaryReader/100Names10Values-10 742µs ± 3% 254µs ± 2% -65.70% (p=0.008 n=5+5) NewStreamBinaryReader/100Names100Values-10 1.38ms ± 6% 0.79ms ± 3% -42.49% (p=0.008 n=5+5) NewStreamBinaryReader/100Names500Values-10 3.92ms ± 2% 3.12ms ± 2% -20.37% (p=0.008 n=5+5) NewStreamBinaryReader/100Names1000Values-10 6.93ms ± 1% 6.31ms ± 2% -8.86% (p=0.008 n=5+5) NewStreamBinaryReader/100Names5000Values-10 31.1ms ± 1% 28.8ms ± 1% -7.46% (p=0.008 n=5+5) NewStreamBinaryReader/200Names1Values-10 1.16ms ± 3% 0.28ms ± 7% -75.88% (p=0.008 n=5+5) NewStreamBinaryReader/200Names10Values-10 1.51ms ± 1% 0.40ms ± 5% -73.57% (p=0.016 n=4+5) NewStreamBinaryReader/200Names100Values-10 2.72ms ± 0% 1.48ms ± 6% -45.44% (p=0.016 n=4+5) NewStreamBinaryReader/200Names500Values-10 7.60ms ± 1% 5.93ms ± 0% -21.96% (p=0.016 n=5+4) NewStreamBinaryReader/200Names1000Values-10 13.7ms ± 1% 12.3ms ± 1% -9.80% (p=0.008 n=5+5) NewStreamBinaryReader/200Names5000Values-10 63.3ms ± 2% 57.7ms ± 0% -8.93% (p=0.016 n=5+4) name old alloc/op new alloc/op delta NewStreamBinaryReader/1Names1Values-10 3.18MB ± 0% 3.18MB ± 0% -0.00% (p=0.040 n=5+5) NewStreamBinaryReader/1Names10Values-10 3.18MB ± 0% 3.18MB ± 0% ~ (p=0.524 n=5+5) NewStreamBinaryReader/1Names100Values-10 3.18MB ± 0% 3.18MB ± 0% ~ (p=0.270 n=5+5) NewStreamBinaryReader/1Names500Values-10 3.18MB ± 0% 3.18MB ± 0% ~ (p=0.683 n=5+5) NewStreamBinaryReader/1Names1000Values-10 3.18MB ± 0% 3.18MB ± 0% ~ (p=0.651 n=5+5) NewStreamBinaryReader/1Names5000Values-10 3.20MB ± 0% 3.20MB ± 0% ~ (p=0.302 n=5+4) NewStreamBinaryReader/20Names1Values-10 3.18MB ± 0% 3.18MB ± 0% ~ (p=0.889 n=5+5) NewStreamBinaryReader/20Names10Values-10 3.18MB ± 0% 3.18MB ± 0% -0.00% (p=0.032 n=5+5) NewStreamBinaryReader/20Names100Values-10 3.19MB ± 0% 3.19MB ± 0% -0.00% (p=0.008 n=5+5) NewStreamBinaryReader/20Names500Values-10 3.22MB ± 0% 3.22MB ± 0% ~ (p=0.175 n=4+5) NewStreamBinaryReader/20Names1000Values-10 3.27MB ± 0% 3.27MB ± 0% ~ (p=0.200 n=4+4) NewStreamBinaryReader/20Names5000Values-10 3.56MB ± 0% 3.56MB ± 0% ~ (p=1.000 n=5+5) NewStreamBinaryReader/50Names1Values-10 3.19MB ± 0% 3.19MB ± 0% -0.00% (p=0.008 n=5+5) NewStreamBinaryReader/50Names10Values-10 3.19MB ± 0% 3.19MB ± 0% -0.00% (p=0.008 n=5+5) NewStreamBinaryReader/50Names100Values-10 3.21MB ± 0% 3.21MB ± 0% ~ (p=0.087 n=5+5) NewStreamBinaryReader/50Names500Values-10 3.29MB ± 0% 3.29MB ± 0% -0.00% (p=0.016 n=5+4) NewStreamBinaryReader/50Names1000Values-10 3.41MB ± 0% 3.41MB ± 0% ~ (p=0.460 n=5+5) NewStreamBinaryReader/50Names5000Values-10 4.13MB ± 0% 4.13MB ± 0% ~ (p=0.421 n=5+5) NewStreamBinaryReader/100Names1Values-10 3.20MB ± 0% 3.20MB ± 0% -0.00% (p=0.008 n=5+5) NewStreamBinaryReader/100Names10Values-10 3.20MB ± 0% 3.20MB ± 0% -0.00% (p=0.008 n=5+5) NewStreamBinaryReader/100Names100Values-10 3.25MB ± 0% 3.25MB ± 0% ~ (p=0.111 n=4+5) NewStreamBinaryReader/100Names500Values-10 3.41MB ± 0% 3.41MB ± 0% ~ (p=0.690 n=5+5) NewStreamBinaryReader/100Names1000Values-10 3.64MB ± 0% 3.64MB ± 0% ~ (p=0.968 n=5+4) NewStreamBinaryReader/100Names5000Values-10 5.08MB ± 0% 5.08MB ± 0% ~ (p=1.000 n=5+5) NewStreamBinaryReader/200Names1Values-10 3.22MB ± 0% 3.22MB ± 0% -0.00% (p=0.008 n=5+5) NewStreamBinaryReader/200Names10Values-10 3.23MB ± 0% 3.23MB ± 0% -0.00% (p=0.016 n=4+5) NewStreamBinaryReader/200Names100Values-10 3.32MB ± 0% 3.32MB ± 0% -0.00% (p=0.029 n=4+4) NewStreamBinaryReader/200Names500Values-10 3.64MB ± 0% 3.64MB ± 0% ~ (p=0.421 n=5+5) NewStreamBinaryReader/200Names1000Values-10 4.10MB ± 0% 4.10MB ± 0% ~ (p=0.421 n=5+5) NewStreamBinaryReader/200Names5000Values-10 6.98MB ± 0% 6.98MB ± 0% ~ (p=0.548 n=5+5) name old allocs/op new allocs/op delta NewStreamBinaryReader/1Names1Values-10 76.0 ± 0% 76.0 ± 0% ~ (all equal) NewStreamBinaryReader/1Names10Values-10 78.0 ± 0% 78.0 ± 0% ~ (all equal) NewStreamBinaryReader/1Names100Values-10 84.0 ± 0% 84.0 ± 0% ~ (all equal) NewStreamBinaryReader/1Names500Values-10 98.0 ± 0% 98.0 ± 0% ~ (all equal) NewStreamBinaryReader/1Names1000Values-10 115 ± 0% 115 ± 0% ~ (all equal) NewStreamBinaryReader/1Names5000Values-10 242 ± 0% 242 ± 0% ~ (all equal) NewStreamBinaryReader/20Names1Values-10 154 ± 0% 154 ± 0% ~ (all equal) NewStreamBinaryReader/20Names10Values-10 194 ± 0% 194 ± 0% ~ (all equal) NewStreamBinaryReader/20Names100Values-10 314 ± 0% 314 ± 0% ~ (all equal) NewStreamBinaryReader/20Names500Values-10 594 ± 0% 594 ± 0% ~ (all equal) NewStreamBinaryReader/20Names1000Values-10 934 ± 0% 934 ± 0% ~ (all equal) NewStreamBinaryReader/20Names5000Values-10 3.47k ± 0% 3.47k ± 0% ~ (p=1.000 n=5+5) NewStreamBinaryReader/50Names1Values-10 278 ± 0% 278 ± 0% ~ (all equal) NewStreamBinaryReader/50Names10Values-10 378 ± 0% 378 ± 0% ~ (all equal) NewStreamBinaryReader/50Names100Values-10 678 ± 0% 678 ± 0% ~ (all equal) NewStreamBinaryReader/50Names500Values-10 1.38k ± 0% 1.38k ± 0% ~ (all equal) NewStreamBinaryReader/50Names1000Values-10 2.23k ± 0% 2.23k ± 0% ~ (all equal) NewStreamBinaryReader/50Names5000Values-10 8.58k ± 0% 8.58k ± 0% ~ (p=1.000 n=4+5) NewStreamBinaryReader/100Names1Values-10 481 ± 0% 481 ± 0% ~ (all equal) NewStreamBinaryReader/100Names10Values-10 681 ± 0% 681 ± 0% ~ (p=0.444 n=5+5) NewStreamBinaryReader/100Names100Values-10 1.28k ± 0% 1.28k ± 0% ~ (p=0.167 n=5+5) NewStreamBinaryReader/100Names500Values-10 2.68k ± 0% 2.68k ± 0% ~ (all equal) NewStreamBinaryReader/100Names1000Values-10 4.38k ± 0% 4.38k ± 0% ~ (all equal) NewStreamBinaryReader/100Names5000Values-10 17.1k ± 0% 17.1k ± 0% ~ (p=1.000 n=5+5) NewStreamBinaryReader/200Names1Values-10 882 ± 0% 882 ± 0% ~ (all equal) NewStreamBinaryReader/200Names10Values-10 1.28k ± 0% 1.28k ± 0% ~ (all equal) NewStreamBinaryReader/200Names100Values-10 2.48k ± 0% 2.48k ± 0% ~ (all equal) NewStreamBinaryReader/200Names500Values-10 5.28k ± 0% 5.28k ± 0% ~ (all equal) NewStreamBinaryReader/200Names1000Values-10 8.68k ± 0% 8.68k ± 0% ~ (all equal) NewStreamBinaryReader/200Names5000Values-10 34.1k ± 0% 34.1k ± 0% ~ (p=0.333 n=5+4) Things that would need to be done to make this production-ready: * tests for bufferingFileReader (in addition the coverage provided by the existing tests that rely on it) * benchmarking on a machine with a spinning-rust disk * skipping the buffer for Read() calls where we'd have to fill the buffer anyway
@56quarters @pracucci interested in your thoughts on this before I take it much further - do you think this is something worth pursuing? |
This results in a further performance improvement of up to 11% on my laptop: name old time/op new time/op delta NewStreamBinaryReader/1Names1Values-10 120µs ± 3% 114µs ± 3% -4.52% (p=0.032 n=5+5) NewStreamBinaryReader/1Names10Values-10 119µs ± 1% 114µs ± 3% -4.80% (p=0.008 n=5+5) NewStreamBinaryReader/1Names100Values-10 129µs ± 1% 123µs ± 1% -4.40% (p=0.008 n=5+5) NewStreamBinaryReader/1Names500Values-10 155µs ± 2% 145µs ± 1% -6.60% (p=0.008 n=5+5) NewStreamBinaryReader/1Names1000Values-10 183µs ± 2% 173µs ± 1% -5.31% (p=0.008 n=5+5) NewStreamBinaryReader/1Names5000Values-10 471µs ± 0% 427µs ± 3% -9.46% (p=0.016 n=4+5) NewStreamBinaryReader/20Names1Values-10 134µs ± 2% 160µs ±20% ~ (p=0.095 n=5+5) NewStreamBinaryReader/20Names10Values-10 145µs ± 2% 146µs ± 1% ~ (p=0.190 n=5+4) NewStreamBinaryReader/20Names100Values-10 242µs ± 2% 227µs ± 3% -5.97% (p=0.008 n=5+5) NewStreamBinaryReader/20Names500Values-10 724µs ± 4% 655µs ± 0% -9.54% (p=0.016 n=5+4) NewStreamBinaryReader/20Names1000Values-10 1.43ms ± 4% 1.28ms ± 4% -10.08% (p=0.008 n=5+5) NewStreamBinaryReader/20Names5000Values-10 5.86ms ± 1% 5.28ms ± 2% -9.98% (p=0.008 n=5+5) NewStreamBinaryReader/50Names1Values-10 162µs ± 1% 159µs ± 3% ~ (p=0.151 n=5+5) NewStreamBinaryReader/50Names10Values-10 191µs ± 1% 185µs ± 1% -3.23% (p=0.008 n=5+5) NewStreamBinaryReader/50Names100Values-10 457µs ± 4% 429µs ± 2% -6.21% (p=0.008 n=5+5) NewStreamBinaryReader/50Names500Values-10 1.63ms ± 3% 1.49ms ± 0% -8.51% (p=0.016 n=5+4) NewStreamBinaryReader/50Names1000Values-10 3.30ms ± 2% 2.94ms ± 3% -10.69% (p=0.008 n=5+5) NewStreamBinaryReader/50Names5000Values-10 14.5ms ± 1% 13.0ms ± 1% -9.91% (p=0.008 n=5+5) NewStreamBinaryReader/100Names1Values-10 200µs ± 3% 198µs ± 2% ~ (p=0.421 n=5+5) NewStreamBinaryReader/100Names10Values-10 254µs ± 2% 252µs ± 3% ~ (p=0.548 n=5+5) NewStreamBinaryReader/100Names100Values-10 793µs ± 3% 744µs ± 3% -6.21% (p=0.008 n=5+5) NewStreamBinaryReader/100Names500Values-10 3.12ms ± 2% 2.83ms ± 2% -9.19% (p=0.008 n=5+5) NewStreamBinaryReader/100Names1000Values-10 6.31ms ± 2% 5.59ms ± 1% -11.40% (p=0.008 n=5+5) NewStreamBinaryReader/100Names5000Values-10 28.8ms ± 1% 26.1ms ± 2% -9.33% (p=0.008 n=5+5) NewStreamBinaryReader/200Names1Values-10 279µs ± 7% 268µs ± 1% ~ (p=0.548 n=5+5) NewStreamBinaryReader/200Names10Values-10 399µs ± 5% 387µs ± 2% ~ (p=0.095 n=5+5) NewStreamBinaryReader/200Names100Values-10 1.48ms ± 6% 1.40ms ± 6% ~ (p=0.095 n=5+5) NewStreamBinaryReader/200Names500Values-10 5.93ms ± 0% 5.47ms ± 1% -7.68% (p=0.016 n=4+5) NewStreamBinaryReader/200Names1000Values-10 12.3ms ± 1% 10.9ms ± 1% -11.36% (p=0.008 n=5+5) NewStreamBinaryReader/200Names5000Values-10 57.7ms ± 0% 52.1ms ± 1% -9.59% (p=0.016 n=4+5)
…cantly worse on spinning rust machines. This reverts commit aa5bfb7.
I'm not a huge fan of re-implementing something in the standard library because it creates a maintenance burden for us. I would like to run the |
Makes sense to me, let's wait and see how it performs in prod first. |
Maybe it's a non sense idea, but I'm wondering if we could take advantage of |
Interesting idea. It's definitely worth a try, and has the added benefit of removing the special case for the last value of the last name. One thing we'd need to keep in mind (and I'm writing this here primarily for my future self): in order for this to work, we'd need to copy the label value somewhere first (without allocating a string), as once we peek at the next label name, we'll invalidate the value that we've previously peeked at. It'd be interesting to benchmark and see if the additional CPU time spent copying is worth it for the reduction in seeking and discarding the buffer. |
By "copy the label value" you mean the "current label name" so that we can compare it with the one we'll |
The current label name is already in a safe location ( |
|
I've thrown together While it works, it makes the code much more complex and performance isn't consistently better than the current implementation (tested on my machine, which has a fast SSD):
As @56quarters suggested, let's wait and see if we need to invest more time into optimising this. If it turns out we need to optimise the performance of loading the index-header further, I think your earlier idea (#3761) of having the compactor generate a minimal index-header we can read in one pass is the best place to spend our effort. |
Thanks for trying! I will stop sharing ideas that don't work :( |
Please keep the ideas coming :) |
Based on use in clusters at Grafana Labs, we're happy with index-header read performance in store gateways, so there's no need for this. |
What this PR does
The change in #3742 dramatically improved the performance of loading an index-header file from disk, but has a drawback: because we need to seek backwards after reading a previously-unread label name, we must reset the buffered reader each time we do this. This has a significant performance impact.
(Why does this have such a large impact? Because
bufio.Reader
is designed as a buffered reader for all kinds of streams (not just files), when we seek backwards, it must discard the entire buffer and fill it from disk again. This is very wasteful, as we're often seeking backwards a small distance and so the buffer likely contains the data we're interested in.)This PR replaces our use of
bufio.Reader
with our own implementation optimised for this scenario, taking advantage of the fact that we only ever read an index-header from a file.The performance impact is significant, with improvements anywhere from 2% to 75% depending on the ratio of unique names to labels:
Benchmark results running on my laptop with a SSD
Benchmark results running on a VM with a spinning rust disk
I'm opening this as a draft for discussion because implementing something like this is tricky - we'd be replacing the battle-tested
bufio.Reader
with something new, with all of the inevitable hard-to-find bugs that entails.If we do want to pursue this, I'd want to:
bufferingFileReader
(in addition the coverage provided by the existing tests that rely on it indirectly) before mergingWhich issue(s) this PR fixes or relates to
Part of #3465, relates to #3742
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]