[Parser] Simplify the lexer interface #6319

tlively · 2024-02-17T01:55:20Z

The lexer was previously an iterator over tokens, but that expressivity is not
actually used in the parser. Instead, we have input.h that adapts the token
iterator interface into an interface that is actually useful.

As a first step toward simplifying the lexer implementation to no longer be an
iterator over tokens, update its interface by moving the adaptation from input.h
to the lexer itself. This requires extensive changes to the lexer unit tests,
which will not have to change further when we actually simplify the lexer
implementation.
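
As a rough illustration of the idea, here is a minimal sketch of a lexer that exposes both a token-iterator interface and a "take" style method directly. `ToyLexer`, `takeKeyword`, and the whitespace-only tokenization are hypothetical and chosen for brevity; this is not the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical toy lexer, not Binaryen's: tokens are whitespace-separated words.
class ToyLexer {
  std::string_view rest; // input that has not been lexed yet
  std::string_view curr; // current token; empty once the input is exhausted

  // Lex the next token from `rest` into `curr`.
  void lexToken() {
    constexpr std::string_view spaces = " \t\n";
    std::size_t start = rest.find_first_not_of(spaces);
    if (start == std::string_view::npos) {
      rest = {};
      curr = {};
      return;
    }
    std::size_t end = rest.find_first_of(spaces, start);
    if (end == std::string_view::npos) {
      end = rest.size();
    }
    curr = rest.substr(start, end - start);
    rest.remove_prefix(end);
  }

public:
  explicit ToyLexer(std::string_view input) : rest(input) { lexToken(); }

  // Token-iterator style: the caller inspects and advances explicitly.
  bool empty() const { return curr.empty(); }
  std::string_view operator*() const { return curr; }
  ToyLexer& operator++() {
    lexToken();
    return *this;
  }

  // "take" style: consume the current token only if it matches.
  bool takeKeyword(std::string_view expected) {
    if (empty() || curr != expected) {
      return false;
    }
    lexToken();
    return true;
  }
};

// Usage: the caller never touches tokens directly.
inline void demoToyLexer() {
  ToyLexer lexer("module func");
  assert(lexer.takeKeyword("module"));
  assert(lexer.takeKeyword("func"));
  assert(lexer.empty());
}
```

With the "take" method living on the lexer, a separate adapter layer is not needed, and the iterator members could later become private or disappear without affecting callers.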

tlively · 2024-02-17T01:55:35Z

Current dependencies on/for this PR:

[Parser] Simplify the lexer interface #6319 👈
main

This stack of pull requests is managed by Graphite.

kripken · 2024-02-20T18:35:17Z

test/gtest/wat-lexer.cpp

+  EXPECT_FALSE(Lexer("18446744073709551616"sv).takeI64());
+
+  EXPECT_FALSE(Lexer("+9223372036854775807"sv).takeU64());
+  EXPECT_EQ(Lexer("+9223372036854775807"sv).takeI64(), INT64_MAX);


I am having trouble mapping these two lines to the old tests. Based on the constant, I looked at

  {
    Lexer lexer("+9223372036854775807"sv);
    ASSERT_FALSE(lexer.empty());
    Token expected{"+9223372036854775807"sv, IntTok{INT64_MAX, Pos}};
    EXPECT_EQ(*lexer, expected);
  }

But the first part looks different? Before we checked the lexer was not empty, and now we check that takeU64 is false (why is it false?)

tlively · 2024-02-20

`takeU*` can never succeed when the token starts with `+` or `-` because unsigned numbers should not have signs. `takeI*`, on the other hand, falls back to parsing the number as signed and then reinterpreting it as unsigned, so that succeeds.
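
A minimal sketch of that behavior, assuming decimal-only input and a try-unsigned-first order. `parseU64`, `parseS64`, and `parseI64` are hypothetical stand-ins for `takeU64` and `takeI64`, not the lexer's actual implementation (which also has to handle hex literals and other token details).

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for takeU64: any leading sign is rejected.
std::optional<uint64_t> parseU64(std::string_view s) {
  if (s.empty() || s[0] == '+' || s[0] == '-') {
    return std::nullopt;
  }
  uint64_t value = 0;
  const char* last = s.data() + s.size();
  auto [ptr, ec] = std::from_chars(s.data(), last, value);
  if (ec != std::errc() || ptr != last) {
    return std::nullopt; // not a number, trailing junk, or out of range
  }
  return value;
}

// Hypothetical signed parse: accepts an optional '+' or '-' sign.
std::optional<int64_t> parseS64(std::string_view s) {
  if (s.empty()) {
    return std::nullopt;
  }
  if (s[0] == '+') {
    // std::from_chars does not accept '+', so strip it by hand.
    s.remove_prefix(1);
    if (s.empty() || s[0] == '-') {
      return std::nullopt; // reject "+" and "+-1"
    }
  }
  int64_t value = 0;
  const char* last = s.data() + s.size();
  auto [ptr, ec] = std::from_chars(s.data(), last, value);
  if (ec != std::errc() || ptr != last) {
    return std::nullopt;
  }
  return value;
}

// Hypothetical stand-in for takeI64: try unsigned, then fall back to signed
// and reinterpret the bits as unsigned.
std::optional<uint64_t> parseI64(std::string_view s) {
  if (auto u = parseU64(s)) {
    return u;
  }
  if (auto i = parseS64(s)) {
    return static_cast<uint64_t>(*i);
  }
  return std::nullopt;
}

// Mirrors the three expectations from the diff above.
inline void demoIntegers() {
  assert(!parseI64("18446744073709551616"));
  assert(!parseU64("+9223372036854775807"));
  assert(parseI64("+9223372036854775807") == static_cast<uint64_t>(INT64_MAX));
}
```

Under these assumptions, `"+9223372036854775807"` fails the unsigned parse purely because of the sign, while the signed fallback accepts it and yields `INT64_MAX`.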

kripken · 2024-02-20

Thanks, and is that a new test compared to before? Or am I not reading the old code right?

tlively · 2024-02-20

The old test was testing what happened before `takeU*` or `takeI*` would have been called. The behavior of `takeU*` has not changed, but that's not what was being tested before. The successful tokenization part that was previously tested is still happening, but now it's an internal implementation detail.

kripken · 2024-02-20

I see, thanks!

tlively · 2024-02-20T21:08:27Z

Going to go ahead and land this despite the unrelated emscripten failure.

tlively requested a review from kripken February 17, 2024 01:55

kripken reviewed Feb 20, 2024

kripken approved these changes Feb 20, 2024

tlively merged commit c0cdd26 into main Feb 20, 2024
14 of 15 checks passed

tlively deleted the parser-simplify-lexer-interface branch February 20, 2024 21:08

gkdn mentioned this pull request Aug 31, 2024

stringconsts gkdn/binaryen#1

Closed
