-
Notifications
You must be signed in to change notification settings - Fork 66
Replace CSV Ragel parser by a hand-written parser copied from encoding/csv #275
Conversation
This allows csv.Reader to be limited by the worker pool
1ff6963
to
31ff030
Compare
/run-all-tests |
9d6f8e2
to
82c3067
Compare
82c3067
to
fa277ed
Compare
do we have a benchmark test for lightning, e.g. 10GB data, 3 TiKV, running |
Now. we did not have benchmark for lightning. will scheduler for it |
PTAL again @XuHuaiyu
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rest LGTM
no CI to make sure the static checker runs now
/run-all-tests tidb=release-3.1 pd=release-3.1 tikv=release-3.1 |
fail to start tidb? 🤔
|
/run-all-tests |
this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM except #275 (comment)
@kennytm pls resolve conflicts |
/run-all-test |
/run-all-tests |
PTAL @3pointer |
What problem does this PR solve?
The Ragel-based CSV parser is much slower than the standard
encoding/csv
parser.As explained in #111, the parser was using Ragel to reuse existing framework for simplicity. However as more and more customer start to use Lightning with CSV input, this is the time we need to optimize the things.
What is changed and how it works?
The new parser was inspired by
encoding/csv
but almost nothing remained except therecordBuffer
/fieldIndexes
members to reduce the amount of allocations. We cannot reuseencoding/csv
because:encoding/csv
does not allow us to track the current read positionencoding/csv
does not recognize backslash-escaped fields (required for MySQL-generated CSV)encoding/csv
does not support disabling quotingImplementing these make the new parser still worse than
encoding/csv
, but much better than the original one.encoding/csv
So the new parser is 35% of the original parser and
encoding/csv
is 80% of the new parser. We can investigate how to squeeze out the remaining 20% later.Check List
Tests
Side effects
Related changes