-
Notifications
You must be signed in to change notification settings - Fork 17.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
encoding/csv: incorrect line:column for error #22352
Comments
I'm planning on just deprecating the |
Change https://golang.org/cl/72050 mentions this issue: |
I'm kicking this over to |
I think it would be unfortunate to deprecate this field. Being told a " is wrong in a line with 23 "'s might be difficult to find. |
As commented on the CL: Maybe it would make sense to replace (deprecate in favor of) ParseError.Column with a new field that contains the current record number (basically the current len(r.fieldIndexes)+1)? Unlike the column, the field number should always be meaningful, even for ErrFieldCount (where it would be the actual len of the record). |
Knowing the field number is nice, but it still requires quite a bit of parsing for the human reading the error. For CSV files with few fields and simple values this is not an issue, but for CSV files with a large number of fields or longish fields this probably doesn't help a lot. |
When debugging there are several pieces of information that is helpful:
Note that all 3 pieces of information are distinct. In go1.9 and before, it used approach 3. However, with badly quoted strings, this is untenable since what the parser considers the problem is sometimes very far from what a human would consider the problem (which is the start of the record or field). As a result of #19019, the error was changed to be 1, in which case the column number is already senseless. A better error message would combine information in 1 and 3. Imagine the following:
Thoughts on this? While information in 2 is useful, it is also not performance friendly to keep that information around. |
I agree that ParseError.Column is nice and somewhat I would expect from an error when reading a CSV file, but I also agree with @dsnet that the error he got makes no sense and it's not the only case where Column doesn't make sense (ErrFieldCount comes to my mind, at least it't not really what most people would expect from my experience) Even if we keep ParseError.Column I'd like to see a field with the current field index for the simpler cases where you don't have that many columns and/or long fields, ParseError.Column isn't that helpful or you are just trying to find the error in a tool like Excel (which many people do) where you can more easily jump to a specific column (although this wouldn't work for all errors). |
Didn't see @dsnet's comment pop up before submitting mine, so here I go again: I like that idea. CL 52830 isn't even released yet so maybe we could just change it back to report the error line and add two new fields StartLine and StartColumn. This way existing uses of ParseError Line and Column wouldn't change at all from Go 1.9 to Go 1.10 (unless I'm missing something) and we would still get the information and nice error message. |
Change https://golang.org/cl/72150 mentions this issue: |
The Reader implementation is slow because it operates on a rune-by-rune basis via bufio.Reader.ReadRune. We speed this up by operating on entire lines that we read from bufio.Reader.ReadSlice. In order to ensure that we read the full line, we augment ReadSlice in our Reader.readLine method to automatically expand the slice if bufio.ErrBufferFull is every hit. This change happens to fix #19410 because it no longer relies on rune-by-rune parsing and only searches for the relevant delimiter rune. In order to keep column accounting simple and consistent, this change reverts parts of CL 52830. This CL is an alternative to CL 36270 and builds on some of the ideas from that change by Diogo Pinela. name old time/op new time/op delta Read-8 3.12µs ± 1% 2.54µs ± 2% -18.76% (p=0.000 n=10+9) ReadWithFieldsPerRecord-8 3.12µs ± 1% 2.53µs ± 1% -18.91% (p=0.000 n=9+9) ReadWithoutFieldsPerRecord-8 3.13µs ± 0% 2.57µs ± 3% -18.07% (p=0.000 n=10+10) ReadLargeFields-8 52.3µs ± 1% 5.3µs ± 2% -89.93% (p=0.000 n=10+9) ReadReuseRecord-8 2.05µs ± 1% 1.40µs ± 1% -31.48% (p=0.000 n=10+9) ReadReuseRecordWithFieldsPerRecord-8 2.05µs ± 1% 1.41µs ± 0% -31.03% (p=0.000 n=10+9) ReadReuseRecordWithoutFieldsPerRecord-8 2.06µs ± 1% 1.40µs ± 1% -31.70% (p=0.000 n=9+10) ReadReuseRecordLargeFields-8 50.9µs ± 0% 4.1µs ± 3% -92.01% (p=0.000 n=10+10) name old alloc/op new alloc/op Read-8 664B ± 0% 664B ± 0% ReadWithFieldsPerRecord-8 664B ± 0% 664B ± 0% ReadWithoutFieldsPerRecord-8 664B ± 0% 664B ± 0% ReadLargeFields-8 3.94kB ± 0% 3.94kB ± 0% ReadReuseRecord-8 24.0B ± 0% 24.0B ± 0% ReadReuseRecordWithFieldsPerRecord-8 24.0B ± 0% 24.0B ± 0% ReadReuseRecordWithoutFieldsPerRecord-8 24.0B ± 0% 24.0B ± 0% ReadReuseRecordLargeFields-8 2.98kB ± 0% 2.98kB ± 0% name old allocs/op new allocs/op Read-8 18.0 ± 0% 18.0 ± 0% ReadWithFieldsPerRecord-8 18.0 ± 0% 18.0 ± 0% ReadWithoutFieldsPerRecord-8 18.0 ± 0% 18.0 ± 0% ReadLargeFields-8 24.0 ± 0% 24.0 ± 0% ReadReuseRecord-8 8.00 ± 0% 8.00 ± 0% ReadReuseRecordWithFieldsPerRecord-8 8.00 ± 0% 8.00 ± 0% ReadReuseRecordWithoutFieldsPerRecord-8 8.00 ± 0% 8.00 ± 0% ReadReuseRecordLargeFields-8 12.0 ± 0% 12.0 ± 0% Updates #22352 Updates #19019 Fixes #16791 Fixes #19410 Change-Id: I31c27cfcc56880e6abac262f36c947179b550bbf Reviewed-on: https://go-review.googlesource.com/72150 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Joe Tsai <thebrokentoaster@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
Change https://golang.org/cl/72310 mentions this issue: |
Consider the following snippet:
On tip, this prints
line 2, column 189: extraneous " in field
. Notice that the column number is absolutely bogus? There is no column 189 on line 2. This is a result of #19019 where reporting the start of the problematic line is more useful than far away when we fail to find the matching quote.The solution is to either simply report 0 as the column to signify the start of the line, or to report the column of where the bad string actually started.
\cc @nussjustin
The text was updated successfully, but these errors were encountered: