`\r\n` line separator isn't processed correctly if it falls on buffer end
#150
Comments
This might be related to issue #146. We don't currently know the fix, and no one has time set aside to investigate right now. So if @alexeits or @joeldouglass have time to make a fix, that'd be awesome. |
@dustinsmith1024: I think a fix might require configuring the newline explicitly, which means an API change like adding another |
Hmm, could we just throw a row away if it matches '\n'? Just noting where the delimiters are: |
I suppose we could. Although it might break existing applications that use just |
Yea, I am fine with bumping the major version if we cannot think of another fix. @doug-martin or @aheuermann might have some ideas. |
FWIW making the optional row delimiter configuration explicit will preserve backward compatibility. |
For this case, what would you configure as the new delimiter? Isn't |
Correct. At the same time if |
Ahh got it. I guess what I don't like is anyone using |
I like the idea of providing a manual row delimiter override, or trying to detect the row delimiter if it is not specified. node-csv-parse does something like this, though I'm not familiar with the fast-csv codebase, so I'm not sure if it could be adapted. |
This may also be not fully backward-compatible. For instance, if there are messy CSV files out there which use different delimiters in different rows fast-csv would currently process them just fine. Auto-detecting and "locking" the delimiter may break processing of such files. Probably not a very important use case but it would still warrant a bump in major. |
- Add a test for issue C2FO#150 to specify the expected behavior
- Mark it `skip` pending implementation

- Add a test for CRLF split between two buffers (a.k.a issue C2FO#150) to specify the expected behavior
- Mark it `skip` pending implementation

- Modify existing tests for `\r` row delimiter to specify the behavior in case of CR vs. CRLF ambiguity issue C2FO#150
- Mark them `skip` pending implementation
* Test issue 150
  - Add a test for issue #150 to specify the expected behavior
  - Mark it `skip` pending implementation
* Test split CRLF
  - Add a test for CRLF split between two buffers (a.k.a issue #150) to specify the expected behavior
  - Mark it `skip` pending implementation
* Test ambiguous CR
  - Modify existing tests for `\r` row delimiter to specify the behavior in case of CR vs. CRLF ambiguity issue #150
  - Mark them `skip` pending implementation
* Keep the line if a new line is ambiguous
  Modify the parser to
  - parse CRLF as a single token
  - keep the current line unparsed if it ends in CR and there's more data
  This solves the issues #146 and #150 by ensuring that CRLF split by a buffer boundary doesn't get treated as two row delimiters CR+LF
* v2.2.0
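The approach described in the last commit (treat CRLF as one token, and hold back a line that ends in CR while more data may follow) can be sketched roughly as below. This is only an illustration under stated assumptions: `createRowSplitter`, `push`, and the `hasMoreData` flag are hypothetical names, and the code is not taken from the fast-csv parser.

```javascript
// Hypothetical chunk-aware row splitter; NOT fast-csv's actual implementation.
function createRowSplitter() {
  let pending = ''; // unparsed tail carried over to the next chunk

  // Returns the complete rows found so far.
  // `hasMoreData` (an assumed flag) tells whether further chunks may follow.
  function push(chunk, hasMoreData) {
    const data = pending + chunk;
    const rows = [];
    let start = 0;
    let i = 0;
    while (i < data.length) {
      const ch = data[i];
      if (ch === '\r') {
        // A trailing CR is ambiguous: the next chunk may begin with LF,
        // so leave the current line unparsed until more data arrives.
        if (i === data.length - 1 && hasMoreData) break;
        rows.push(data.slice(start, i));
        // Consume CRLF as a single token, a lone CR as one character.
        i += data[i + 1] === '\n' ? 2 : 1;
        start = i;
      } else if (ch === '\n') {
        rows.push(data.slice(start, i));
        i += 1;
        start = i;
      } else {
        i += 1;
      }
    }
    pending = data.slice(start);
    // At end of input, flush whatever is left as the final row.
    if (!hasMoreData && pending.length > 0) {
      rows.push(pending);
      pending = '';
    }
    return rows;
  }

  return { push };
}

// CRLF split across two chunks: no empty row is produced.
const splitter = createRowSplitter();
console.log(splitter.push('a,b\r', true));       // []
console.log(splitter.push('\nc,d\r\n', false));  // [ 'a,b', 'c,d' ]
```

Holding the line back costs one extra scan of the pending text on the next chunk, but it avoids guessing whether a trailing CR is a complete delimiter.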
Given a CSV file with lines separated by `\r\n`, when a stream buffer end falls between `\r` and `\n`, the separated `\n` gets processed as an empty row.

Example:
Output
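As a rough illustration of the failure described above, here is a minimal sketch of a naive splitter that accepts `\r\n`, `\r`, or `\n` as a row delimiter and processes each chunk as it arrives. It is a hypothetical stand-in (`createNaiveSplitter` is an assumed name), not fast-csv's parser and not the original example from this issue.

```javascript
// Hypothetical naive splitter; NOT fast-csv's actual implementation.
function createNaiveSplitter() {
  let pending = ''; // incomplete row carried over to the next chunk

  return function push(chunk) {
    // Any of CRLF, CR, or LF ends a row.
    const parts = (pending + chunk).split(/\r\n|\r|\n/);
    pending = parts.pop(); // the last piece may be an incomplete row
    return parts;
  };
}

// Whole input in one chunk: two rows, as expected.
const whole = createNaiveSplitter();
console.log(whole('a,b\r\nc,d\r\n')); // [ 'a,b', 'c,d' ]

// Same input, but the buffer boundary falls between \r and \n.
const split = createNaiveSplitter();
console.log(split('a,b\r'));       // [ 'a,b' ]     (CR alone ends the row)
console.log(split('\nc,d\r\n'));   // [ '', 'c,d' ] (the orphaned LF yields an empty row)
```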