Optimize incrementing values in DirectFileStore adapter #280
Conversation
* Introduce `FileMappedDict#increment_value`
* Check for `internal_storage` only once (`Process.pid` is slow on Linux)
* Look up the file position only once

Signed-off-by: Peter Leitzen <peter@leitzen.de>
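The shape of the change can be sketched roughly as follows: instead of a read (one position lookup) followed by a write (a second lookup), a dedicated increment resolves the entry once. This is an illustrative stand-in only, not the actual `DirectFileStore` internals; the `MappedDict` class and its Hash-backed storage below are hypothetical simplifications.

```ruby
# Illustrative sketch only: a Hash stands in for the memory-mapped file,
# and the names are hypothetical simplifications of the real adapter.
class MappedDict
  def initialize
    @data = {}
  end

  def read_value(key)
    @data.fetch(key, 0.0)
  end

  def write_value(key, value)
    @data[key] = value
  end

  # Before the change, an increment was effectively:
  #   write_value(key, read_value(key) + by)
  # i.e. the entry was resolved twice. A dedicated increment resolves it once.
  def increment_value(key, by)
    @data[key] = @data.fetch(key, 0.0) + by
  end
end
```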
Nice
I'm not opposed to this change, but:
Keeping in mind my comment that this may or may not be a greatly realistic test, I'm getting this on my Macbook M2 Air:
So: 3% faster on histograms, 2% faster on counters, same on gauges. I'm not sure how much to trust this benchmark... :/ Doing:
(so, sidestepping the double call to `Process.pid`) I get more or less the same result (so, either "same", or within a 1 to 4% improvement in performance). I'm going to try to see if I remember how my "heavy" benchmarking code worked, and see what results I get, but again, keeping in mind that testing on a MacBook is probably not super representative. In the meantime, I'd recommend doing just the change that sidesteps the double call to `Process.pid`.
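The minimal change being suggested, calling `Process.pid` once and reusing the result, could look roughly like this. The class and method names are hypothetical, not the actual store code:

```ruby
# Hypothetical sketch: avoid calling Process.pid twice per access by
# keeping the result in a local variable.
class StoreWrapper
  attr_reader :reopen_count

  def initialize
    @opened_for_pid = nil
    @reopen_count = 0
  end

  def internal_store
    pid = Process.pid            # single call instead of two
    unless @opened_for_pid == pid
      @opened_for_pid = pid
      @reopen_count += 1         # stands in for reopening the backing file
    end
    :storage                     # stand-in for the real storage object
  end
end
```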
Ok, I've changed my mind :) I've run the datastores benchmarks on a somewhat random m6i.2xlarge on EC2 that I have at hand, and they seem to be consistently faster. About 6% indeed. Doing the caching of the variable as I was suggesting is also faster than baseline, but only like 2% (ish). These results are pretty repeatable, and they consistently come out this way. So I'm convinced: we should merge this. @Sinjo, any objections?
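For a quick local sanity check of the `Process.pid` overhead itself (this is not the project's datastore benchmark suite, just a stdlib `Benchmark` sketch; absolute numbers vary by machine and OS):

```ruby
require 'benchmark'

N = 1_000_000

# Compare two Process.pid calls per iteration against one. On Linux,
# where Process.pid is comparatively expensive, the gap is more visible.
double = Benchmark.realtime { N.times { Process.pid == Process.pid } }
single = Benchmark.realtime { N.times { pid = Process.pid; pid == pid } }

puts format('double call: %.3fs, single call: %.3fs', double, single)
```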
Also, sorry, what I should've started with: thank you for looking into this and for the PR @splattael! :D
@dmagliola Thanks a ton for reviewing and merging 🙇 You are right regarding the benchmarks - they were flaky for me locally too (on Linux), but after several runs they showed an overall speed improvement in the end. This improvement makes sense (and not only because of side-stepping the double call to `Process.pid`). BTW, I am planning to reduce the amount of `Process.pid` calls further. 👋 @SuperQ ❤️
I had this very refactoring ready in another branch but saw smaller speed improvements. 😅 I should have provided more context before - sorry about that 🙇
No worries, and thanks for the PR!
I may be misreading the docs here, but is this a method intended to be monkeypatched? 😅 That makes me a bit uncomfortable, but it seems like it's designed precisely for that... So I guess we could do it. Would that look like:
Am I understanding that correctly?
@dmagliola Yes, that is the way I understood it as well 👍 It seems that Rails, for example, already does exactly this for Ruby 3.1+. See https://github.com/rails/rails/blob/v7.0.2/activesupport/lib/active_support/fork_tracker.rb#L50
Ah, that's amazing! You found exactly the kind of example I was hoping to get! 🙌 NOTE: I would do "half" of what Rails is doing. I'd prepend if `Process._fork` is available, but I wouldn't also patch `fork` itself on older Rubies the way Rails does. The logic here being that on Rubies without `_fork` the patching options are riskier, so we'd rather skip the optimization there.
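A sketch of that "half of Rails" approach: prepend onto `Process`'s singleton class only when the Ruby 3.1+ `Process._fork` hook exists, and otherwise do nothing. The `PidCache` module name and its contents are hypothetical, not the library's actual code:

```ruby
# Hypothetical sketch. Process._fork is the Ruby 3.1+ core hook that
# fires for Process.fork, Kernel#fork, etc.
module PidCache
  def self.pid
    @pid ||= Process.pid
  end

  def self.reset!
    @pid = nil
  end

  module ForkHook
    def _fork
      child_pid = super
      PidCache.reset! if child_pid == 0 # like fork(2), 0 means "in the child"
      child_pid
    end
  end
end

# The "half of Rails" part: hook in only when _fork exists; on older
# Rubies, simply skip the optimization instead of patching fork itself.
Process.singleton_class.prepend(PidCache::ForkHook) if Process.respond_to?(:_fork)
```

This mirrors the Rails `ForkTracker` pattern linked above, minus its fallback monkeypatch of `fork` for pre-3.1 Rubies.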
Sorry for the slow reply on my side. This all looks great!
I agree with @dmagliola about the `_fork` extension point - if it's there we should use it, and if it's not then people should upgrade their Ruby version if they want a performance boost. I'd rather not be the reason somebody ends up with weird behaviour in their application.
Open to changing my mind on that if we can convince ourselves really thoroughly that the options for older Rubies are safe.
Thank you @splattael for the perf improvement, and looking forward to future PRs! 🙌
@dmagliola @Sinjo I agree! We should use `Process._fork` where it's available. I wonder, though, if we could provide a way for users to opt in to cached process ids on older Rubies (via 🐒 patching). So:
WDYT? |
Benchmarks

TL;DR: Speed up counter and histogram by ~6-7%.