-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PoC: precompute embeded string literals hash code #553
Conversation
compile.c
Outdated
@@ -10593,7 +10593,7 @@ iseq_compile_each0(rb_iseq_t *iseq, LINK_ANCHOR *const ret, const NODE *const no | |||
lit = rb_str_freeze(lit); | |||
} | |||
else { | |||
lit = rb_fstring(lit); | |||
lit = rb_fstring(rb_str_precompute_hash(rb_str_freeze(lit))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB: the compiler should always feed pre-frozen strings to rb_fstring
as to save a dup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even better, it could use rb_enc_interned_str
to save a bunch of allocs.
e205c02
to
f22b383
Compare
Note to self, we're precomputing in the compiler, we'd need to handle loading cached ISeq too. |
@XrXr: shared https://bugs.ruby-lang.org/issues/15331, but seems like that patch was lazily computing the hashcode, here we eagerly do it during compilation, so I think we benefit more from it. |
f22b383
to
1a17cf3
Compare
Unclear how much worth it, it would be, but with embeded strings we often have some space left in the slot, which we can use to store the string Hash code. It's probably only worth it for string literals, as they are the ones likely to be used as hash keys. We could even ony do this for string literal actually used as hash keys, as seen by the compiler. It is also unclear what the best layout would be. We currently store it right after the string terminator as to make it easy/fast to compute, but ideally it would be at a fixed position, however this means creating another `RString` union, which would complexify the code base significantly. It's also unclear to us whether trying to respect alignment would lead to a performance gain or not. ```ruby hash = 10.times.to_h do |i| [i, i] end dyn_sym = "dynamic_symbol".to_sym hash[:some_symbol] = 1 hash[dyn_sym] = 1 hash["small"] = 2 hash["frozen_string_literal"] = 2 Benchmark.ips do |x| x.report("symbol") { hash[:some_symbol] } x.report("dyn_symbol") { hash[:some_symbol] } x.report("small_lit") { hash["small"] } x.report("frozen_lit") { hash["frozen_string_literal"] } x.compare!(order: :baseline) end ``` Before: ``` ruby 3.3.0 (2023-12-25 revision 5124f9a) [arm64-darwin23] Warming up -------------------------------------- symbol 2.392M i/100ms dyn_symbol 2.440M i/100ms small_lit 2.155M i/100ms frozen_lit 2.010M i/100ms Calculating ------------------------------------- symbol 24.175M (± 1.7%) i/s - 122.002M in 5.048306s dyn_symbol 24.345M (± 1.6%) i/s - 122.019M in 5.013400s small_lit 21.252M (± 2.1%) i/s - 107.744M in 5.072042s frozen_lit 20.095M (± 1.3%) i/s - 100.489M in 5.001681s Comparison: symbol: 24174848.1 i/s dyn_symbol: 24345476.9 i/s - same-ish: difference falls within error small_lit: 21252403.2 i/s - 1.14x slower frozen_lit: 20094766.0 i/s - 1.20x slower ``` After: ``` ruby 3.4.0dev (2024-04-08T07:20:15Z interned-string-ha.. 76efed6) [arm64-darwin23] Warming up -------------------------------------- symbol 2.400M i/100ms dyn_symbol 2.405M i/100ms small_lit 2.308M i/100ms frozen_lit 2.268M i/100ms Calculating ------------------------------------- symbol 23.528M (± 6.9%) i/s - 117.584M in 5.033231s dyn_symbol 23.777M (± 4.7%) i/s - 120.231M in 5.071734s small_lit 23.066M (± 2.9%) i/s - 115.376M in 5.006947s frozen_lit 22.729M (± 1.1%) i/s - 115.693M in 5.090700s Comparison: symbol: 23527823.6 i/s dyn_symbol: 23776757.8 i/s - same-ish: difference falls within error small_lit: 23065535.3 i/s - same-ish: difference falls within error frozen_lit: 22729351.6 i/s - same-ish: difference falls within error ``` Co-Authored-By: Étienne Barrié <etienne.barrie@gmail.com>
1a17cf3
to
49c75af
Compare
Unclear how much worth it, it would be, but with embeded strings we often have some space left in the slot, which we can use to store the string Hash code.
It's probably only worth it for string literals, as they are the ones likely to be used as hash keys.
We could even ony do this for string literal actually used as hash keys, as seen by the compiler.
It is also unclear what the best layout would be. We currently store it right after the string terminator as to make it easy/fast to compute, but ideally it would be at a fixed position, however this means creating another
RString
union, which would complexify the code base significantly.It's also unclear to us whether trying to respect alignment would lead to a performance gain or not.
Before:
After:
cc @etiennebarrie