Incorrect parsing of CSV file. Locale is ignored for some columns #607
Comments
Ok, this is interesting. Here's the culprit:
This code throws an exception because the Estonian locale uses a different minus sign.
Note for the future: let's check how this is handled in R's dplyr and maybe use it as a reference. I think they might be good at interpreting all kinds of data.
Actually, it makes sense that the locale parameter also covers minus signs and the like. Just have a look at Java's DecimalFormatSymbols: it includes the zero digit, grouping separator, decimal separator, per mille sign, percent character, NaN, minus sign, monetary decimal separator, exponent separator, and more. The decimal character is just the tip of the iceberg, really.
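To make the point about DecimalFormatSymbols concrete, here is a minimal Java sketch. It prints a few symbols that the JDK reports for the Estonian locale and parses a negative number whether it is written with the ASCII hyphen or with U+2212. The class `MinusSignDemo` and the helper `parseWithSymbols` are hypothetical names made up for this illustration, not part of the library discussed here, and the actual symbols depend on the JDK's locale data, so no output is shown.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.util.Locale;

final class MinusSignDemo {
    // Hypothetical helper: maps both ASCII '-' and U+2212 to the minus sign
    // expected by the given symbols, then requires the whole string to parse.
    // Assumption: the input has no exponent part (the pattern below has none).
    static double parseWithSymbols(String text, DecimalFormatSymbols symbols) {
        char minus = symbols.getMinusSign();
        String normalized = text.trim().replace('-', minus).replace('\u2212', minus);
        DecimalFormat format = new DecimalFormat("#,##0.###", symbols);
        ParsePosition position = new ParsePosition(0);
        Number value = format.parse(normalized, position);
        if (value == null || position.getIndex() != normalized.length()) {
            throw new NumberFormatException("Cannot parse '" + text + "'");
        }
        return value.doubleValue();
    }

    public static void main(String[] args) {
        DecimalFormatSymbols estonian =
                DecimalFormatSymbols.getInstance(Locale.forLanguageTag("et-EE"));
        // Print code points, because some of these symbols look alike on screen.
        System.out.printf("decimal=U+%04X grouping=U+%04X minus=U+%04X%n",
                (int) estonian.getDecimalSeparator(),
                (int) estonian.getGroupingSeparator(),
                (int) estonian.getMinusSign());
        System.out.println(parseWithSymbols("-1,5", estonian));
        System.out.println(parseWithSymbols("\u22121,5", estonian));
    }
}
```

Normalizing the sign before parsing is only one possible design. Trying several candidate formats in turn would be another, and nothing in this thread says which approach the actual fix took.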
fixed with
I have encountered a funny bug when parsing a CSV file. The floating point numbers in that file are written with a comma `,` instead of a dot `.`, which should be handled by specifying the correct locale. However, it seems that the locale is respected for one column and ignored for another. For instance, the comma is ignored for the number `68,83`, and the result is `6883`.
CSV file: https://github.com/antonarhipov/kotlin-sandbox/blob/master/energy-consumption.csv
Notebook: https://github.com/antonarhipov/kotlin-sandbox/blob/master/demo.ipynb
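For illustration, the same symptom can be reproduced with plain `java.text.NumberFormat`. When the text `68,83` is parsed with a locale whose grouping separator is the comma, the comma is silently dropped. This is only a sketch of one way such a result can arise, under the assumption that a fallback to a different locale is involved. The class name `LocaleParseDemo` is hypothetical, and the code does not show what the library does internally.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class LocaleParseDemo {
    // Parses the whole string with the default number format of the given locale.
    static double parse(String text, Locale locale) {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number value = format.parse(text, position);
        if (value == null || position.getIndex() != text.length()) {
            throw new NumberFormatException(
                    "Cannot parse '" + text + "' for locale " + locale);
        }
        return value.doubleValue();
    }

    public static void main(String[] args) {
        // Comma is the decimal separator in the German locale.
        System.out.println(parse("68,83", Locale.GERMANY)); // prints 68.83
        // Comma is the grouping separator in the US locale, so it is skipped.
        System.out.println(parse("68,83", Locale.US));      // prints 6883.0
    }
}
```

If one column ends up being parsed with a different locale than the one requested, a mix of `68.83` and `6883` results like the one described above is what you would expect to see.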