perf: fix record copying performance bug
If there happens to be an abnormally long record in a CSV file---where the rest are short---that abnormally long record ends up causing a performance loss while parsing subsequent records. This kind of thing is usually caused by a buffer being expanded, with the expanded buffer then imposing extra cost that shouldn't be paid when parsing smaller records. Indeed, this case is no exception.

Here, the standard record iterators use an internal record for copying CSV data into, and then clone this record as appropriate in the iterator's `next` method. In this way, the record's memory can be reused, which is a bit better than allocating a fresh buffer every time, since the length of each CSV row is usually pretty similar to the length of prior rows. However, when we come across an exceptionally long record, the internal record is expanded to handle it. When that internal record is cloned to give back to the caller, the record *and* its excess capacity are cloned. After an abnormally long record, this ends up copying that excess capacity for all subsequent rows, which easily explains the performance bug.

To fix it, we introduce a new private method that lets us copy a record *without* excess capacity. (We could implement `Clone` more intelligently, but I'm not sure whether it's appropriate to drop excess capacity in a `Clone` impl. That might be unexpected.) We then use this new method in the iterators instead of the standard `clone`. In the case where there are no abnormally long records, this shouldn't have any impact.

Fixes #227
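A minimal sketch of the idea, with hypothetical names (`Record`, `clone_truncated`) and a simplified buffer layout that is not the crate's actual internals:

```rust
// Sketch: copying a record without its excess capacity.
#[derive(Clone)]
struct Record {
    // Backing storage; a derived `clone` duplicates all of it, including
    // capacity left over from an earlier, unusually long row.
    buf: Box<[u8]>,
    // Number of bytes of `buf` actually in use for the current row.
    len: usize,
}

impl Record {
    // Copy only the bytes in use into a buffer sized exactly to fit,
    // so callers don't inherit the excess capacity.
    fn clone_truncated(&self) -> Record {
        Record {
            buf: self.buf[..self.len].to_vec().into_boxed_slice(),
            len: self.len,
        }
    }
}

fn main() {
    // Simulate an internal record that was grown to hold a long row but
    // now only holds a short one.
    let record = Record { buf: vec![0u8; 1 << 20].into_boxed_slice(), len: 16 };

    let full = record.clone();              // copies the whole 1 MiB buffer
    let trimmed = record.clone_truncated(); // copies only the 16 bytes in use

    assert_eq!(full.buf.len(), 1 << 20);
    assert_eq!(trimmed.buf.len(), 16);
}
```

In the iterators, the fix then amounts to handing the caller this kind of truncated copy rather than the result of a plain `clone()`.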