libpostal logs "invalid UTF-8" warning for a string having "\0" or "\u0000" and stays in waiting state #36

Open
myasirkhan opened this issue Jan 6, 2020 · 1 comment


Using jpostal, if I call parseAddress like this:

AddressParser.getInstance().parseAddress("Rue du Médecin-Colonel Calbairac Toulouse France\u0000")

I see this warning logged repeatedly:

WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory
WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory
WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory
WARN  invalid UTF-8
   at transliterate (transliterate.c:791) errno: No such file or directory

The thread then remains in a waiting state. This happens only when the address contains a \u0000 (or plain \0) character. The simplest solution seems to be not to send the \0 character, or to replace it before calling parseAddress.
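
The workaround described above could be sketched roughly as follows. This is only an illustration of stripping the NUL character on the caller's side. The class name AddressSanitizer and the method stripNul are hypothetical and not part of jpostal or libpostal, and the sketch does not address why the call hangs inside the library.

```java
// Hypothetical helper (not part of jpostal): removes NUL characters
// from an address string before it is handed to the parser.
final class AddressSanitizer {
    private AddressSanitizer() {}

    /** Returns the input with every NUL (U+0000) character removed; null stays null. */
    static String stripNul(String address) {
        if (address == null) {
            return null;
        }
        // Fast path: most inputs contain no NUL, so return them unchanged.
        if (address.indexOf('\u0000') < 0) {
            return address;
        }
        return address.replace("\u0000", "");
    }
}
```

With this in place, the call from the report would become AddressParser.getInstance().parseAddress(AddressSanitizer.stripNul(input)). Removing the character rather than replacing it with a space is a design choice made here for simplicity; if the NUL can appear between two words, replacing it with a space may be the better option.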


wboult commented Jul 26, 2020

@myasirkhan I hit this too. I've stolen the example above for some of the tests in a PR I just created.
