Can not analyze symbols with Chinese #53

Open
baby0o01999 opened this issue Aug 3, 2017 · 9 comments
Labels
enhancement Request for functionality covering an entirely new use case

@baby0o01999

The parser cannot analyze Chinese function names and variable names. Please add support for them.
LuaJIT accepts Chinese function names and variable names encoded in either GBK or UTF-8.

Example:

function 中文函数名(参数1, 参数2)
    local 中文变量 = "Chinese variable name"
end

@fstirlitz
Owner

PUC Lua doesn't accept this code:

$ lua5.3
Lua 5.3.3  Copyright (C) 1994-2016 Lua.org, PUC-Rio
> function 中文函数名(参数1,参数2) end
stdin:1: <name> expected near '<\228>'
$ lua5.2 
Lua 5.2.4  Copyright (C) 1994-2015 Lua.org, PUC-Rio
> function 中文函数名(参数1,参数2) end
stdin:1: <name> expected near char(228)
$ lua5.1 
Lua 5.1.5  Copyright (C) 1994-2012 Lua.org, PUC-Rio
> function 中文函数名(参数1,参数2) end
stdin:1: '<name>' expected near '�'

Apparently, LuaJIT allows any octet ≥ 128 inside identifiers. (The documentation only says that 'UTF-8' characters are supported, but nothing actually checks the encoding for validity, and a comment in the source code suggests that other encodings are meant to be supported as well.) That at least makes it simple to implement; I was worried we might have to import the Unicode character database to check character properties or something.
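
The rule described above can be sketched roughly as follows. This is an illustration in Python, not LuaJIT's lexer or this parser's code: the function names are hypothetical, and reserved words are deliberately ignored. It assumes only what the comment states, namely that every octet ≥ 128 counts as an identifier character.

```python
def is_identifier_byte(b: int, first: bool) -> bool:
    """Return True if byte b may appear in an identifier (hypothetical helper)."""
    if b >= 0x80:
        # Assumption taken from the thread: any octet >= 128 is accepted,
        # with no check that the bytes form valid UTF-8.
        return True
    ch = chr(b)  # b < 0x80 here, so ch is plain ASCII
    if ch == "_" or ch.isalpha():
        return True
    # Digits are allowed anywhere except the first position.
    return (not first) and ch.isdigit()


def is_identifier(name: bytes) -> bool:
    """Check a whole byte string; keywords are not considered in this sketch."""
    if not name:
        return False
    return all(is_identifier_byte(b, i == 0) for i, b in enumerate(name))


# The same name passes whether it is encoded as UTF-8 or as GBK,
# because every byte of both encodings is >= 128 for these characters.
print(is_identifier("中文函数名".encode("utf-8")))
print(is_identifier("中文函数名".encode("gbk")))
print(is_identifier(b"1abc"))
```

Note that this byte-level rule is not a complete treatment of GBK: some GBK characters have a trailing byte in the ASCII range, and this sketch would accept or reject such a byte purely on its ASCII meaning.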

Well, sort of simple, because we parse Lua source code at the level of code points, not bytes, which brings back the conundrum I've had with interpreting string literals...
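
To make the bytes-versus-code-points mismatch concrete, here is a small Python sketch (an illustration only, with a hypothetical helper name). It shows that the same identifier looks different depending on how the source bytes were turned into code points, and that a byte-preserving decoding such as Latin-1 is one way to carry the "octet ≥ 128" rule over unchanged.

```python
name = "中文变量"
utf8_bytes = name.encode("utf-8")  # 3 bytes per character here
gbk_bytes = name.encode("gbk")     # 2 bytes per character here

# Decoding as UTF-8 yields one code point per character.
as_code_points = utf8_bytes.decode("utf-8")

# Decoding as Latin-1 maps every byte to the code point with the same value,
# so a rule phrased in terms of octets still applies to each code point.
as_octets = utf8_bytes.decode("latin-1")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(as_code_points), len(as_octets))
print(decodes_as_utf8(utf8_bytes), decodes_as_utf8(gbk_bytes))
```

A parser that insists on valid UTF-8 input would therefore reject the GBK-encoded source outright, even though the byte-level rule accepts it.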

Given that it's an extension to PUC Lua, I'll probably implement this, but the user will have to enable it explicitly, as with the luaVersion option. Please note that this is not full LuaJIT support. For example, LuaJIT will accept code like this, unless LUAJIT_ENABLE_LUA52COMPAT is defined when compiling:

goto skip
local goto = print
goto "hello"
::skip::

Supporting code like this might be too tricky to be worth it, so I explicitly don't promise that.
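
A rough sketch of why the goto example is awkward for a parser: the same word has to be read as a jump in one statement and as an ordinary name in another. The Python below is a toy, not LuaJIT's or this parser's logic. It assumes, as a simplification not stated in this thread, that a leading goto is a jump only when the next token is a name; the tokenizer and function names are hypothetical, and reserved words are not handled.

```python
import re

# Toy tokenizer: names, double-quoted strings, or any other single character.
TOKEN = re.compile(r'\s*(?:(?P<name>[A-Za-z_]\w*)|(?P<string>"[^"]*")|(?P<other>\S))')


def tokens(line):
    """Split one line into (kind, text) pairs (hypothetical helper)."""
    out = []
    pos = 0
    line = line.rstrip()
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            break
        kind = m.lastgroup
        out.append((kind, m.group(kind)))
        pos = m.end()
    return out


def goto_role(line):
    """Classify a leading 'goto': 'jump', 'identifier', or None if absent."""
    toks = tokens(line)
    if not toks or toks[0] != ("name", "goto"):
        return None
    # Assumed rule: one token of lookahead decides the meaning.
    if len(toks) > 1 and toks[1][0] == "name":
        return "jump"
    return "identifier"


for line in ['goto skip', 'local goto = print', 'goto "hello"', '::skip::']:
    print(repr(line), goto_role(line))
```

Even in this toy form, the decision needs lookahead, which is the kind of complication the comment above declines to promise.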

@fstirlitz
Owner

Implemented in 7172940. Consider it unstable, however. I might still revisit the encoding issue.


Repository owner locked as too heated and limited conversation to collaborators May 20, 2018
Repository owner unlocked this conversation May 21, 2018
fstirlitz added the enhancement (Request for functionality covering an entirely new use case) label Aug 2, 2019
fstirlitz added this to the 0.3 milestone Oct 5, 2019
fstirlitz mentioned this issue Oct 31, 2019
@fstirlitz
Owner

I filed the encoding issue as #68. Another question that remains is whether to integrate this feature into the feature flags framework (i.e. the features object) and expose the latter directly in the API. I'm leaning towards 'yes'.

@fstirlitz
Owner

Apparently Lua 5.4 will add this feature behind a compile-time flag: lua/lua@e0ab13c. It seems, however, that unlike LuaJIT, only (modern) UTF-8 names will be supported.
