formatter_csv: Improve the performance. 2x faster #2529

Merged: repeatedly merged 1 commit into master from faster-formatter_csv on Jul 30, 2019

repeatedly
Member

Signed-off-by: Masahiro Nakagawa <repeatedly@gmail.com>

Which issue(s) this PR fixes:
None

What this PR does / why we need it:
Improve performance by reusing a per-thread cached CSV object instead of creating a new one on every format call.
On Ruby 2.6.3, the new implementation is about 2x faster than the current one:

Warming up --------------------------------------
                 now     4.627k i/100ms
                 new     9.692k i/100ms
Calculating -------------------------------------
                 now     49.484k (± 2.1%) i/s -    249.858k in   5.051491s
                 new     98.117k (± 8.4%) i/s -    494.292k in   5.091679s
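
For reference, the "2x" figure can be checked directly from the iterations-per-second numbers above. This is only a minimal sketch; `speedup` is a hypothetical helper name and not part of the PR:

```ruby
# Hypothetical helper (not part of the PR): ratio of two benchmark-ips
# throughput figures, rounded to two decimals.
def speedup(before_ips, after_ips)
  (after_ips / before_ips).round(2)
end

# i/s figures copied from the benchmark output above (in thousands).
puts speedup(49.484, 98.117)
```

This prints a ratio of about 1.98. Given the ± 8.4% spread on the `new` run, "roughly 2x" is the most the numbers support.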

Benchmark script:

require 'benchmark/ips'
require 'csv'

class CF
  def initialize
    @add_newline = false  # falsy, so rows are formatted without a trailing newline
    @fields = ["key1","key2","ke,y3","ke y4","key5","key6","key7","key8"]
    @generate_opts1 = {col_sep: ',', force_quotes: true}
    @generate_opts2 = {col_sep: ',', force_quotes: true, headers: @fields,
                       row_sep: @add_newline ? :auto : "".force_encoding(Encoding::ASCII_8BIT)}

    @cache = {}
  end

  def format1(tag, time, record)
    row = @fields.map do |key|
      record[key]
    end
    line = CSV.generate_line(row, **@generate_opts1)  # ** so the options are also accepted on Ruby 3.x
    line.chomp! unless @add_newline
    line
  end

  def format2(tag, time, record)
    # One CSV writer per thread, created lazily; ** so the options are also accepted on Ruby 3.x
    csv = (@cache[Thread.current] ||= CSV.new("".force_encoding(Encoding::UTF_8), **@generate_opts2))
    line = (csv << record).string.dup
    csv.rewind
    csv.truncate(0)
    line
  end
end

keys = ["key1","key2","ke,y3","ke y4","key5","key6","key7","key8"]
record = {}
keys.each do |key|
  record[key] = "valueeeeee1"
end

cf = CF.new

Benchmark.ips do |x|
  x.report('now') do
    cf.format1(nil, nil, record)
  end

  x.report('new') do
    cf.format2(nil, nil, record)
  end
end
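
The benchmark only measures speed, so it may be worth confirming that both approaches emit the same line. The following is a minimal sketch under assumptions: `FIELDS`, `OPTS`, `per_call_line` and `ReusedCsvFormatter` are hypothetical names chosen for illustration, and only the `CSV` calls mirror the script above.

```ruby
require 'csv'

# Hypothetical names for illustration; only the CSV calls mirror the benchmark script.
FIELDS = ["key1", "ke,y2", "ke y3"].freeze
OPTS = { col_sep: ',', force_quotes: true }.freeze

# Per-call approach: CSV.generate_line builds a fresh CSV object every time.
def per_call_line(record)
  CSV.generate_line(FIELDS.map { |key| record[key] }, **OPTS).chomp
end

# Reused approach: one CSV writer per thread, reset after every row.
class ReusedCsvFormatter
  def initialize
    @cache = {}
  end

  def format(record)
    csv = (@cache[Thread.current] ||=
      CSV.new(String.new(encoding: Encoding::UTF_8), headers: FIELDS, row_sep: "", **OPTS))
    line = (csv << record).string.dup  # copy the row before the buffer is cleared
    csv.rewind
    csv.truncate(0)
    line
  end
end

record = FIELDS.to_h { |key| [key, "va,lue"] }
formatter = ReusedCsvFormatter.new

# Repeat the call to confirm that resetting the buffer leaves no residue between rows.
3.times do
  raise "output differs" unless formatter.format(record) == per_call_line(record)
end
puts formatter.format(record)
```

Copying the string with `dup` before `rewind` and `truncate(0)` matters here, because the writer's underlying buffer is reused for the next row.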

Docs Changes:
No need

Release Note:
Same as title.

Avoid creating CSV object per format call.

Signed-off-by: Masahiro Nakagawa <repeatedly@gmail.com>
repeatedly added the enhancement label (Feature request or improve operations) Jul 29, 2019
repeatedly requested a review from ganmacs July 29, 2019 05:17
repeatedly self-assigned this Jul 29, 2019
repeatedly merged commit 7640531 into master Jul 30, 2019
repeatedly deleted the faster-formatter_csv branch July 30, 2019 11:50
284km added a commit to 284km/csv that referenced this pull request Sep 18, 2019
I'm still thinking...

I used the benchmark script from:
fluent/fluentd#2529

Warming up --------------------------------------
                 now     5.553k i/100ms
                 new    10.626k i/100ms
            instance     9.009k i/100ms
Calculating -------------------------------------
                 now     57.255k (± 4.1%) i/s -    288.756k in   5.051981s
                 new    114.090k (± 7.1%) i/s -    573.804k in   5.062333s
            instance     95.062k (± 4.1%) i/s -    477.477k in   5.031413s