Refactoring lexer to treat all characters as UTF-8 #2309

Closed
4 tasks done
tamaroning opened this issue Jun 19, 2023 · 1 comment


tamaroning commented Jun 19, 2023

Related to #2287

The lexer currently handles ASCII and non-ASCII characters differently, even though Rust source code is UTF-8.
The main problem is that our lexer mixes skip_codepoint_input() (skips one code point, 1 to 4 bytes) with skip_input(int n) (skips byte by byte), and likewise peek_codepoint_input() with peek_input(int n). This makes it difficult to support Unicode in identifiers, whitespace, and so on in a simple way.
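As a rough illustration of the direction described above, here is a minimal sketch in which the input is decoded from UTF-8 once, so that peeking and skipping always count code points and never bytes. The names `decode_utf8`, `CodepointSource`, and `kEndOfInput` are hypothetical and are not taken from the gccrs source; the decoder is also simplified (it replaces malformed or truncated sequences with U+FFFD but does not reject overlong encodings or surrogates).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sentinel returned when peeking past the end of the input.
constexpr char32_t kEndOfInput = 0xFFFFFFFF;

// Hypothetical helper, not the gccrs API: decode a UTF-8 byte string into
// code points. Malformed or truncated sequences become U+FFFD.
// Simplified: overlong encodings and surrogates are not rejected.
inline std::vector<char32_t> decode_utf8(const std::string &bytes)
{
  std::vector<char32_t> out;
  std::size_t i = 0;
  while (i < bytes.size())
    {
      unsigned char b0 = static_cast<unsigned char>(bytes[i]);
      std::size_t len = 0;
      char32_t cp = 0;
      // The lead byte determines the sequence length (1 to 4 bytes).
      if (b0 < 0x80) { len = 1; cp = b0; }
      else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
      else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
      else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
      else { out.push_back(0xFFFD); ++i; continue; }

      bool ok = i + len <= bytes.size();
      for (std::size_t k = 1; ok && k < len; ++k)
        {
          unsigned char b = static_cast<unsigned char>(bytes[i + k]);
          if ((b & 0xC0) != 0x80)
            ok = false; // not a continuation byte
          else
            cp = (cp << 6) | (b & 0x3F);
        }
      if (!ok) { out.push_back(0xFFFD); ++i; continue; }
      out.push_back(cp);
      i += len;
    }
  return out;
}

// Hypothetical input source: a single peek/skip pair that works in code
// points, so ASCII and non-ASCII characters take the same path.
class CodepointSource
{
public:
  explicit CodepointSource(const std::string &bytes)
    : cps(decode_utf8(bytes)), pos(0) {}

  // Look at the code point n positions ahead without consuming it.
  char32_t peek(std::size_t n = 0) const
  {
    if (pos + n < cps.size())
      return cps[pos + n];
    return kEndOfInput;
  }

  // Consume n code points, stopping at the end of the input.
  void skip(std::size_t n = 1)
  {
    pos = (pos + n < cps.size()) ? pos + n : cps.size();
  }

private:
  std::vector<char32_t> cps;
  std::size_t pos;
};
```

With a source like this, a check such as "is the next character whitespace" or "does an identifier continue here" can be written once against `peek()`, regardless of how many bytes the character occupies in the file.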

To deal with this problem, we need

tamaroning (author) commented

All tasks are completed.
