ADIF import fails to read non-ASCII UTF-8 characters correctly #813
Comments
Question: Do system locale settings determine what charset Cloudlog uses to parse uploaded files? I'm running Cloudlog in a Docker container and haven't set any locale settings, so it's defaulting to the POSIX locale. I also see that there's a charset setting (default value UTF-8) in the Cloudlog config, which I'd assume overrides any system locale settings...
I've tried changing the locale to a UTF-8 one -- that didn't fix the issue. Any ideas?
@magicbug I may have found the issue in ADIF parsing (at least) on this line: https://github.com/magicbug/Cloudlog/blob/master/application/libraries/Adif_parser.php#L163 UTF-8 strings can NOT be split into substrings reliably by the kind of index access used there. If I've understood PHP strings correctly (not being a PHP developer), all indexed access to the data should go through a multibyte-aware function; see https://stackoverflow.com/questions/6315750/wrong-output-when-using-array-indexing-on-utf-8-string As multiple fields on a line in an ADIF log may contain UTF-8 chars, it's probably necessary to use a multibyte-compatible substr in all places where index access is currently used. The length of each field indicated by the ADIF format is the number of characters, not the number of bytes.
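To illustrate the point in the comment above, here is a minimal sketch in Python (not Cloudlog's PHP code). The helpers `parse_fields_by_chars` and `parse_fields_by_bytes` are hypothetical, handle only simple `<NAME:LEN>value` fields, and assume the comment's reading that the declared length counts characters. The byte-indexed variant cuts each value short by one byte per multi-byte character, which can leave a dangling partial character at the end of a field.

```python
import re

# Hypothetical, simplified field patterns: <NAME:LEN> only, no type indicator.
FIELD_STR = re.compile(r"<([A-Za-z_]+):(\d+)>")
FIELD_BYTES = re.compile(rb"<([A-Za-z_]+):(\d+)>")


def parse_fields_by_chars(record: str) -> dict:
    """Treat the declared length as a count of characters (code points)."""
    fields = {}
    pos = 0
    while (m := FIELD_STR.search(record, pos)) is not None:
        length = int(m.group(2))
        start = m.end()
        fields[m.group(1).lower()] = record[start:start + length]
        pos = start + length
    return fields


def parse_fields_by_bytes(raw: bytes) -> dict:
    """Treat the declared length as a count of bytes, as plain byte indexing does."""
    fields = {}
    pos = 0
    while (m := FIELD_BYTES.search(raw, pos)) is not None:
        length = int(m.group(2))
        start = m.end()
        value = raw[start:start + length]
        # errors="replace" makes the damage visible instead of raising.
        fields[m.group(1).decode("ascii").lower()] = value.decode("utf-8", errors="replace")
        pos = start + length
    return fields


record = "<name:5>Päivi<comment:4>Hyvä<eor>"
by_chars = parse_fields_by_chars(record)
by_bytes = parse_fields_by_bytes(record.encode("utf-8"))
print(by_chars)
print(by_bytes)
```

In this sketch the character-based parser returns the full values, while the byte-based parser drops the last character of `name` and ends `comment` on half of the two-byte 'ä', which decodes to a replacement character.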
Thanks for testing! However, I think all of the index references to the ADIF line content need to be changed to support multi-byte charsets. I'll test this and try to push out a PR.
@magicbug @AndreasK79 There's now an initial attempt to fix this in PR #830 -- feel free to clean up the code if you wish to :)
@mikaelnousiainen great work. I will test it out later.
Merged into the code
Describe the bug
ADIF import fails to read non-ASCII UTF-8 characters correctly. The issue I'm experiencing seems exactly like #321.
Additionally, if the non-ASCII char ('ä' in this example) is the last character in any field (e.g. comment), invalid UTF-8 is sent to the DB and the import fails with an error.
To Reproduce
Steps to reproduce the behaviour:
Expected behaviour
All ADIF fields with UTF-8 characters will be imported correctly.
Desktop (please complete the following information):
Additional context
I've checked that the ADIF contains valid UTF-8 chars and that the MariaDB tables use the default charset of utf8mb4. Something else goes wrong in parsing the ADIF file.
I'm running the latest master, downloaded on Jan 11th.
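As a rough illustration of the two observations above (the file itself is valid UTF-8, yet invalid UTF-8 reaches the DB), the following Python sketch checks byte strings for UTF-8 validity. The helper `is_valid_utf8` and the sample value are hypothetical; the sketch only shows that cutting a value at a byte offset inside a multi-byte character turns valid input into invalid output.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


comment = "Hyvä".encode("utf-8")  # 4 characters, 5 bytes ('ä' takes two bytes)
truncated = comment[:4]           # byte-based cut using the character count

print(len(comment), is_valid_utf8(comment))
print(len(truncated), is_valid_utf8(truncated))
```

The same kind of check could be run over a whole ADIF file read in binary mode to confirm that the input is well-formed before looking for the fault in the parser.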