Refactoring lexer to treat all characters as UTF-8 #2309

tamaroning · 2023-06-19T01:11:58Z

Related to #2287

The lexer currently handles ASCII and non-ASCII characters differently, though Rust accepts UTF-8 source.
The main problem in our lexer is the mixed use of skip_codepoint_input() (which skips 1 to 4 bytes) and skip_input(int n) (which skips one byte at a time), and likewise of peek_codepoint_input() and peek_input(int n). This makes it difficult to support Unicode in identifiers, whitespace, and so on in a simple way.
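To illustrate the mismatch between the two skipping styles, here is a minimal sketch in standalone C++. `utf8_sequence_length` and `count_characters` are hypothetical helpers written for this example and are not taken from the gccrs sources. The sketch only classifies the lead byte; it does not validate continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from gccrs): number of bytes in the UTF-8
// sequence that starts with `lead`, judged from the lead byte pattern only.
// Returns 0 if `lead` cannot start a sequence.
inline std::size_t utf8_sequence_length(unsigned char lead) {
  if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 0;                             // continuation byte or invalid lead
}

// Counts the characters in `src` when every skip advances by a whole
// UTF-8 sequence instead of by a single byte.
inline std::size_t count_characters(const std::string &src) {
  std::size_t pos = 0;
  std::size_t count = 0;
  while (pos < src.size()) {
    std::size_t len =
        utf8_sequence_length(static_cast<unsigned char>(src[pos]));
    if (len == 0) len = 1;  // assumption: step over a malformed byte
    pos += len;
    ++count;
  }
  return count;
}
```

For a source such as `aéb`, which is four bytes long, a byte-wise skip after `a` lands in the middle of `é`, whereas the sequence-wise walk above sees three characters.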

To deal with this problem, we need:

- to modify peek_input(int n) and skip_input(int n) to return and skip a UTF-8 character (Refactor lexer to treat all input characters as UTF-8 #2307),
- to replace all uses of peek_codepoint_input() and skip_codepoint_input() with peek_input and skip_input respectively (Remove unnecessary methods/fields of Rust::Lexer #2347),
- to remove get_codepoint_input_length() and the current_char32 field in Lexer (Remove unnecessary methods/fields of Rust::Lexer #2347),
- to check that the input source is valid UTF-8 (Refactor lexer to treat all input characters as UTF-8 #2307).
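The overall direction of these tasks can be sketched as follows. This is a standalone illustration under stated assumptions, not the code merged in #2307 or #2347: `decode_utf8`, `CodepointInput`, and `END_OF_INPUT` are hypothetical names. Decoding the whole source up front is only one possible design. Here `skip_input(n)` is assumed to skip exactly `n` characters, which may differ from the real method's convention.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sentinel returned when peeking past the end of the input.
constexpr char32_t END_OF_INPUT = 0xFFFFFFFF;

// Hypothetical helper: decodes `src` into code points. Returns false if
// `src` is not valid UTF-8; `out` may then hold a partial result.
inline bool decode_utf8(const std::string &src, std::vector<char32_t> &out) {
  // Smallest code point that legitimately needs a sequence of each length.
  static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
  out.clear();
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char b0 = static_cast<unsigned char>(src[i]);
    std::size_t len = 0;
    char32_t cp = 0;
    if (b0 < 0x80) { len = 1; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return false;                         // stray continuation byte
    if (i + len > src.size()) return false;    // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      unsigned char b = static_cast<unsigned char>(src[i + k]);
      if ((b & 0xC0) != 0x80) return false;    // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings, surrogates, and out-of-range values.
    if (cp < min_for_len[len] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF))
      return false;
    out.push_back(cp);
    i += len;
  }
  return true;
}

// Hypothetical input wrapper: peek and skip always work in whole characters,
// so no separate codepoint variants or length bookkeeping are needed.
class CodepointInput {
public:
  explicit CodepointInput(std::vector<char32_t> chars)
      : chars_(std::move(chars)), pos_(0) {}

  // Returns the character `n` positions ahead without consuming it.
  char32_t peek_input(std::size_t n = 0) const {
    if (pos_ + n < chars_.size()) return chars_[pos_ + n];
    return END_OF_INPUT;
  }

  // Consumes `n` characters (assumption: the default consumes one).
  void skip_input(std::size_t n = 1) {
    pos_ = std::min(pos_ + n, chars_.size());
  }

private:
  std::vector<char32_t> chars_;
  std::size_t pos_;
};
```

With this shape, validation happens once while decoding, and the rest of the lexer can compare `peek_input()` results against Unicode code points directly, whether the character is ASCII or not.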


tamaroning · 2023-07-04T17:13:09Z

All tasks are completed.

tamaroning mentioned this issue Jun 19, 2023

Unicode support #2287

Open

15 tasks

CohenArthur added the enhancement label Jun 22, 2023

This was referenced Jun 26, 2023

Refactor lexer to treat all input characters as UTF-8 #2307

Merged

Remove unnecessary methods/fields of Rust::Lexer #2347

Merged

tamaroning closed this as completed Jul 4, 2023
