`is_person()` in `name_utils.py` returns `False` whenever a name string consists entirely of CJK characters. I wrote it this way at the time because name checkers like ProbablePeople can't handle CJK. However, the result is obviously wrong when the string really is a human name.
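For illustration, here is a minimal sketch of how an "all-CJK" check like the one described above could be written. This is not the code from `name_utils.py`: the helper name `is_all_cjk` and the choice of Unicode ranges (Hiragana, Katakana, CJK Unified Ideographs plus Extension A, and Hangul syllables) are assumptions made for this example.

```python
import re

# Assumed character ranges for this sketch (not taken from name_utils.py):
#   U+3040-U+309F  Hiragana
#   U+30A0-U+30FF  Katakana
#   U+3400-U+4DBF  CJK Unified Ideographs Extension A
#   U+4E00-U+9FFF  CJK Unified Ideographs
#   U+AC00-U+D7AF  Hangul Syllables
_CJK_ONLY = re.compile(
    r'[\u3040-\u309f\u30a0-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]+'
)


def is_all_cjk(name):
    """Return True if every non-whitespace character in `name` is CJK.

    Hypothetical helper for illustration only.
    """
    # Remove all whitespace so that "山田 太郎" is treated like "山田太郎".
    compact = ''.join(name.split())
    # fullmatch requires the whole string to match; an empty string yields None.
    return _CJK_ONLY.fullmatch(compact) is not None
```

With a check like this, a function that returns `False` as soon as `is_all_cjk(name)` is true would reject `"王小明"` or `"山田太郎"` outright, which is the behavior this issue is about.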
A partial fix is now in the dev branch and will be in the upcoming 1.3.0 release. The new implementation of is_person() is not very accurate when it comes to names in CJK scripts, but it is still better than the current situation (which is that it always returns False for CJK names).
Solving this problem properly turns out to be very difficult. I wish I could do something better than the current weak, home-grown heuristics. Unfortunately, this appears to be a research-grade problem that no one has solved. Even the best AI systems today can't reliably tell you if, say, a given 1-3 character sequence in Chinese is the name of a person.
The current solution may be as good as we can get for now. I'm going to close this issue because it is unlikely that I can devote more time to this matter.
This now attempts to make is_person() handle names written in Chinese,
Japanese and Korean scripts. It uses multiple heuristics to do this,
and it is not very accurate right now. However, it's still better than
it was before, because before, is_person() would simply return False
for names written in CJK scripts.