[Parser] Simplify the lexer interface #6319
The lexer was previously an iterator over tokens, but that expressivity is not actually used in the parser. Instead, we have `input.h`, which adapts the token iterator interface into an interface that is actually useful. As a first step toward simplifying the lexer implementation so that it is no longer an iterator over tokens, update its interface by moving the adaptation from `input.h` to the lexer itself. This requires extensive changes to the lexer unit tests, which will not have to change further when we actually simplify the lexer implementation.
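To make the shape of the new interface concrete, here is a minimal sketch of a "take"-style lexer, where each method consumes the next token only if it matches and otherwise leaves the input untouched. `MiniLexer`, its methods, and `runExample` are hypothetical names made up for illustration; this is not the lexer from this PR, only an assumption about what such an interface looks like in general.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical miniature lexer for illustration only.
class MiniLexer {
  std::string_view rest; // unconsumed input

  void skipSpace() {
    while (!rest.empty() &&
           std::isspace(static_cast<unsigned char>(rest.front()))) {
      rest.remove_prefix(1);
    }
  }

  // Consume a single punctuation character if it is next.
  bool takeChar(char c) {
    if (rest.empty() || rest.front() != c) {
      return false;
    }
    rest.remove_prefix(1);
    skipSpace();
    return true;
  }

public:
  explicit MiniLexer(std::string_view input) : rest(input) { skipSpace(); }

  bool empty() const { return rest.empty(); }

  bool takeLParen() { return takeChar('('); }
  bool takeRParen() { return takeChar(')'); }

  // Consume the next token only if it is exactly the expected keyword.
  bool takeKeyword(std::string_view expected) {
    std::size_t len = 0;
    while (len < rest.size() &&
           std::isalnum(static_cast<unsigned char>(rest[len]))) {
      ++len;
    }
    if (len == 0 || rest.substr(0, len) != expected) {
      return false; // nothing consumed on failure
    }
    rest.remove_prefix(len);
    skipSpace();
    return true;
  }
};

// The caller never sees a token object; it only asks for what it expects next.
void runExample() {
  MiniLexer lexer("(module )");
  assert(lexer.takeLParen());
  assert(!lexer.takeKeyword("func")); // mismatch leaves the input in place
  assert(lexer.takeKeyword("module"));
  assert(lexer.takeRParen());
  assert(lexer.empty());
}
```

With this shape, unit tests assert on the results of the `take*` calls rather than on the tokens themselves, which is why they can stay unchanged if the tokenization underneath is later rewritten.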
EXPECT_FALSE(Lexer("18446744073709551616"sv).takeI64());

EXPECT_FALSE(Lexer("+9223372036854775807"sv).takeU64());
EXPECT_EQ(Lexer("+9223372036854775807"sv).takeI64(), INT64_MAX);
I am having trouble mapping these two lines to the old tests. Based on the constant, I looked at
{
  Lexer lexer("+9223372036854775807"sv);
  ASSERT_FALSE(lexer.empty());
  Token expected{"+9223372036854775807"sv, IntTok{INT64_MAX, Pos}};
  EXPECT_EQ(*lexer, expected);
}
But the first part looks different? Before we checked the lexer was not empty, and now we check that `takeU64` is false (why is it false?)
`takeU*` can never succeed when the token starts with `+` or `-` because unsigned numbers should not have signs. `takeI*`, on the other hand, falls back to parsing the number as signed then reinterpreting it to be unsigned, so that succeeds.
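The rule described above can be sketched in a few lines. The free functions below are hypothetical stand-ins written for this illustration, not the lexer's actual code: they handle decimal digits only, and the order of attempts in `takeI64` (unsigned first, then signed) is an assumption based on the "falls back" wording.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Parse a run of decimal digits, failing on non-digits or 64-bit overflow.
std::optional<uint64_t> parseMagnitude(std::string_view digits) {
  if (digits.empty()) {
    return std::nullopt;
  }
  uint64_t value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') {
      return std::nullopt;
    }
    uint64_t d = static_cast<uint64_t>(c - '0');
    if (value > (UINT64_MAX - d) / 10) {
      return std::nullopt; // value * 10 + d would not fit in 64 bits
    }
    value = value * 10 + d;
  }
  return value;
}

// Unsigned: any leading sign is rejected outright.
std::optional<uint64_t> takeU64(std::string_view text) {
  if (text.empty() || text[0] == '+' || text[0] == '-') {
    return std::nullopt;
  }
  return parseMagnitude(text);
}

// Assumed behavior: try the unsigned form, then fall back to a signed parse
// whose value is reinterpreted as unsigned bits.
std::optional<uint64_t> takeI64(std::string_view text) {
  if (auto u = takeU64(text)) {
    return u;
  }
  if (text.empty() || (text[0] != '+' && text[0] != '-')) {
    return std::nullopt;
  }
  bool negative = text[0] == '-';
  auto mag = parseMagnitude(text.substr(1));
  if (!mag) {
    return std::nullopt;
  }
  const uint64_t maxPositive = static_cast<uint64_t>(INT64_MAX);
  if (negative) {
    if (*mag > maxPositive + 1) {
      return std::nullopt; // below INT64_MIN
    }
    return uint64_t(0) - *mag; // two's complement bit pattern
  }
  if (*mag > maxPositive) {
    return std::nullopt; // above INT64_MAX for an explicitly signed literal
  }
  return *mag;
}

// Mirrors the three assertions quoted from the diff.
void runExamples() {
  assert(!takeI64("18446744073709551616"));
  assert(!takeU64("+9223372036854775807"));
  assert(takeI64("+9223372036854775807") == uint64_t(INT64_MAX));
}
```

Under these assumptions, `"18446744073709551616"` (2^64) fails both paths, while the `+`-prefixed `INT64_MAX` literal fails only the unsigned path.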
Thanks, and is that a new test compared to before? Or am I not reading the old code right?
The old test was testing what happened before `takeU*` or `takeI*` would have been called. The behavior of `takeU*` has not changed, but that's not what was being tested before. The successful tokenization part that was previously tested is still happening, but now it's an internal implementation detail.
I see, thanks!
Going to go ahead and land this despite the unrelated emscripten failure.