invalid-name does not recognize identifiers with non-ASCII characters #2725
Comments
Thanks for the report. If I understand right, using the
Hi, and thank you for the quick reply.
To be honest I’m not really interested in using my own regular expressions. What would be great is for pylint’s default verification of snake case (or any other supported style) to accept all valid Python 3 identifiers in snake case. At the moment it fails to do that.
Well, that’s exactly what I’m arguing for: that the current, default regular expressions in checkers/base.py are insufficient, insofar as they don’t cover the totality of Python 3 allowed syntax. I can see a case for not changing them. But this bug report is not about having regex available (that’d be a separate topic), but about having pylint comprehensively support all valid identifiers by default. Many thanks for considering.
(Just to be clear, I’ll perfectly understand if you prefer to close this issue with “wontfix”, I just wanted for it to exist and for its scope to be clear. I just do not believe users of Unicode identifiers should be left with writing up their own regular expressions as their only option.)
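To make the scope concrete, here is a minimal sketch of the mismatch. The pattern `ASCII_SNAKE_CASE` and the helper `is_ascii_snake_case` are hypothetical, written only for illustration; they are not copied from checkers/base.py. The sketch compares an ASCII-only snake_case check against Python's own `str.isidentifier()`.

```python
import re

# Hypothetical ASCII-only snake_case pattern, for illustration only;
# this is NOT the actual expression used in pylint's checkers/base.py.
ASCII_SNAKE_CASE = re.compile(r"[a-z_][a-z0-9_]*")


def is_ascii_snake_case(name):
    """Return True if the whole name matches the ASCII-only pattern."""
    return ASCII_SNAKE_CASE.fullmatch(name) is not None


name = "validar_contrase\u00f1a"  # "validar_contraseña"

# Python 3 itself accepts the name as an identifier ...
print(name.isidentifier())
# ... but an ASCII-only naming check rejects it.
print(is_ascii_snake_case(name))
```

The point is only that a name can be a valid Python 3 identifier in snake case and still fail a check built on `[a-z]`.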
@dato This makes sense, I think we should do it if it's possible via the
Maybe I'm out of context here but by default
>>> import re
>>> re.search("\w*_\w*", "validar_contraseña")
<re.Match object; span=(0, 18), match='validar_contraseña'>
>>> re.search("\w*_\w*", "validar_contraseña", re.ASCII)
<re.Match object; span=(0, 16), match='validar_contrase'>
IMHO, users should be discouraged from using non-ASCII names. Some characters look very similar, which may cause issues later.
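Building on the session above, a Unicode-aware snake_case check could be sketched with the standard `re` module alone. This is an assumption-laden illustration, not what pylint does: the names `WORD_CHARS` and `is_unicode_snake_case` are made up here, and "snake case" is taken to mean "a valid identifier with no uppercase characters".

```python
import re

# For str patterns, \w is Unicode-aware unless re.ASCII is passed.
# [^\W\d] means "a word character that is not a digit", i.e. a letter or "_".
WORD_CHARS = re.compile(r"[^\W\d]\w*")


def is_unicode_snake_case(name):
    """Hypothetical check: valid identifier made of word characters, no uppercase."""
    return (
        WORD_CHARS.fullmatch(name) is not None
        and name.isidentifier()
        and not any(ch.isupper() for ch in name)
    )


print(is_unicode_snake_case("validar_contrase\u00f1a"))  # non-ASCII, lowercase
print(is_unicode_snake_case("ValidarContrase\u00f1a"))   # contains uppercase
```

The uppercase test uses `str.isupper()` rather than `[A-Z]`, so non-ASCII capitals such as `Ñ` are caught as well.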
That's a very good point, thank you! The current checkers don't use
I agree with @gyermolenko that we should discourage users from non-ASCII identifiers. At the same time, I don't believe that should happen with
I am also a teacher and use some accented identifiers in my code with my students. The proposed two-fold solution by @PCManticore makes perfect sense to me. Hope that it will be implemented soon. Thanks.
Thanks for the comment @frederickjeanguerin. We'll get to this issue eventually, but if you or any of your students have any free time to work on a patch, that would speed things up considerably (there are almost 500 issues on this tracker, so it might take us a while to fix it).
This has been implemented by #3409 which adds support for Unicode names and adds a new check
How do we enable this new non-ascii-name? |
@arisliang It will be enabled by default once we launch 2.5 to the public (which should happen this week).
In Python 3, identifiers can include non-ASCII characters. See Identifiers and keywords in the Python Reference.
Roughly speaking, the definition of letters is extended from [A-Za-z] to all Unicode letter categories. The standard re module, however, does not support Unicode character properties like Lu, Ll, etc.; and I’m not sure its standard Unicode support will be enough.
Alternatives include the drop-in, compatible regex module (which supports Unicode properties with \p{}), or manually compiling a list of character ranges to use (which gets very ugly very quickly).
Steps to reproduce
Create a UTF-8 file with, for example, the following contents:
(N.B.: I don’t write code with non-ASCII characters, but my students do.)
Run pylint3 on it.
Current behavior
(I encounter this when I run pylint on my students’ code.)
Expected behavior
(This is what I want their code to look like. 😄)
pylint --version output
Tested with:
Many thanks in advance.
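A side note on the Unicode character properties mentioned in the description: although `re` cannot match on categories such as Lu or Ll, the standard library can report them per character through `unicodedata.category`. The helper name `char_categories` below is made up for illustration.

```python
import unicodedata


def char_categories(name):
    """Pair each character of a name with its Unicode general category."""
    return [(ch, unicodedata.category(ch)) for ch in name]


# "año_Ñ": lowercase letters are Ll, the underscore is Pc, the capital is Lu.
for ch, category in char_categories("a\u00f1o_\u00d1"):
    print(ch, category)
```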