Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hash method to metadata to speed up comparing it in a hash key #2560

Merged
merged 1 commit into from
Aug 21, 2019

Conversation

ganmacs
Copy link
Member

@ganmacs ganmacs commented Aug 14, 2019

Which issue(s) this PR fixes:

no

What this PR does / why we need it:

Object#hash is called when using an object as the hash key or reference value from a hash object with the object.
Current Struct's implementation is comparing all inner variables.
https://github.com/ruby/ruby/blob/0623e2b7cc621b1733a760b72af246b06c30cf96/struct.c#L1200-L1203

This impl is a bit redundant. Especially in fluentd, it can be overhead.
e.g. Output#handle_stream_simple creates a hash object using an object as a key. this method is called per emit(one of the most called methods in fluentd).

benchmark

require 'benchmark/ips'

A = Struct.new(:l, :c, :r)

B = Struct.new(:l, :c, :r) do
  def hash
    l
  end
end

a = A.new(1, 2, 3)
b = B.new(1, 2, 3)

Benchmark.ips do |x|
  x.report("not hash defined") do
    { a => 1 }
  end

  x.report("hash defined") do
    { b => 1 }
  end
end

aa = { a => 1 }
bb = { b => 1 }

Benchmark.ips do |x|
  x.report("not hash defined ref") do
    aa[a]
  end

  x.report("hash defined ref") do
    bb[b]
  end
end

a2 = A.new(1, 2, 4)
b2 = B.new(1, 2, 4)

Benchmark.ips do |x|
  x.report("not hash defined ref not exist") do
    aa[a2]
  end

  x.report("hash defined ref not exist") do
    bb[b2]
  end
end
Warming up --------------------------------------
    not hash defined    95.271k i/100ms
        hash defined   213.133k i/100ms
Calculating -------------------------------------
    not hash defined      1.121M (± 6.8%) i/s -      5.621M in   5.040657s
        hash defined      3.320M (± 6.6%) i/s -     16.624M in   5.030596s
Warming up --------------------------------------
not hash defined ref    90.976k i/100ms
    hash defined ref   212.802k i/100ms
Calculating -------------------------------------
not hash defined ref      1.207M (± 6.0%) i/s -      6.095M in   5.071473s
    hash defined ref      4.270M (± 2.0%) i/s -     21.493M in   5.035820s
Warming up --------------------------------------
not hash defined ref not exist
                        96.979k i/100ms
hash defined ref not exist
                       166.564k i/100ms
Calculating -------------------------------------
not hash defined ref not exist
                          1.215M (± 2.2%) i/s -      6.110M in   5.029940s
hash defined ref not exist
                          2.463M (± 1.8%) i/s -     12.326M in   5.005410s

Docs Changes:

no need

Release Note:

same as title

@ganmacs ganmacs force-pushed the add-hash-to-metadata branch from be4f646 to 079803e Compare August 14, 2019 07:02
@ganmacs ganmacs added the enhancement Feature request or improve operations label Aug 14, 2019
# This means that this class is one of the most called object in Fluentd.
# See https://github.com/fluent/fluentd/pull/2560
def hash
timekey.object_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timekey is sometimes nil, right?
self.object_id is not fit?

Copy link
Member

@repeatedly repeatedly Aug 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, self.object_id is wong. It doesn't work well.

@ganmacs ganmacs self-assigned this Aug 16, 2019
@repeatedly
Copy link
Member

timekey.object_id is safe for existing use cases?
metadata is used in buffer for hash key.

@ganmacs
Copy link
Member Author

ganmacs commented Aug 19, 2019

I think so. the following code returns the same result.

A = Struct.new(:l, :c, :r)

B = Struct.new(:l, :c, :r) do
  def hash
    l.object_id
  end
end

a = A.new(nil, 2, 3)
a2 = A.new(nil, 2, 3)
a3 = A.new(nil, 2, 4)
a4 = A.new(1, 2, 4)
a5 = A.new(1, 2, 4)

b = B.new(nil, 2, 3)
b2 = B.new(nil, 2, 3)
b3 = B.new(nil, 2, 4)
b4 = B.new(1, 2, 4)
b5 = B.new(1, 2, 4)

aa = { a => 1, a2 => 2, a3 => 3 }
aa2 = { a => 1 }
bb = { b => 1, b2 => 2, b3 => 3 }
bb2 = { b => 1 }

p (a == a2)  == (b == b2)
p (a3 == a2) == (b3 == b2)
p aa[a]      == bb[b]
p aa[a2]     == bb[b2]
p aa[a3]     == bb[b3]
p aa2[a]     == bb2[b]
p aa2[a2]    == bb2[b2]
p aa2[a3]    == bb2[b3]
p (a4 == a5) == (b4 == b5)

@repeatedly repeatedly merged commit a9ba3f8 into fluent:master Aug 21, 2019
@repeatedly
Copy link
Member

If this has a problem with exsiting case, we will revert it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature request or improve operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants