Fix gen lexer word boundary, case insensitive, and literal matching cases (plus conformance tests) #274
Conversation
Also add conformance tests for these cases
Upon further inspection, the previous iteration of the word boundary conformance test was failing because case-insensitive matching wasn't implemented. Updated to fix case-insensitive matching and added new tests for this case, then changed the word boundary test to not rely on case-insensitive matching.
Also add conformance test for this case
If \b is used at start of pattern, it could match right before the current position, rather than right after. Check both cases.
Identified and fixed another edge case where literal matches were not working if they were at exactly the end of the string. Also identified and fixed the issue with matching \b.
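For context, the word-boundary check described in these commits has to consider both sides of the current offset: a \b boundary holds at offset p when exactly one of the neighbouring bytes is a word character, which is why a pattern starting with \b can match "right before" the current position. A minimal standalone sketch of that reasoning (hypothetical helper names, not the PR's generated code):

package main

import "fmt"

// isWordChar reports whether c is an ASCII word character, which is the
// character class Go's regexp \b operates on.
func isWordChar(c byte) bool {
	return c == '_' ||
		('0' <= c && c <= '9') ||
		('a' <= c && c <= 'z') ||
		('A' <= c && c <= 'Z')
}

// wordBoundaryAt reports whether a \b boundary exists at offset p in s:
// the "word-ness" of the byte before p and the byte at p must differ,
// with the start and end of the input treated as non-word.
func wordBoundaryAt(s string, p int) bool {
	before := p > 0 && isWordChar(s[p-1])
	after := p < len(s) && isWordChar(s[p])
	return before != after
}

func main() {
	fmt.Println(wordBoundaryAt("SELECT x", 0)) // true: start of input before a word char
	fmt.Println(wordBoundaryAt("SELECT x", 6)) // true: between 'T' and ' '
	fmt.Println(wordBoundaryAt("SELECT x", 3)) // false: inside a word
}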
	if n == 1 {
		fmt.Fprintf(w, "if p < len(s) && s[p] == %q {\n", re.Rune[0])
	if re.Flags&syntax.FoldCase != 0 {
		fmt.Fprintf(w, "if p+%d <= len(s) && strings.EqualFold(s[p:p+%d], %q) {\n", n, n, string(re.Rune))
Nice!
Overall LGTM, just a couple of suggestions.
@@ -18,6 +18,9 @@ var conformanceLexer = lexer.MustStateful(lexer.Rules{
	"Root": {
		{"String", `"`, lexer.Push("String")},
		// {"Heredoc", `<<(\w+)`, lexer.Push("Heredoc")},
		{"LiteralTest", `LITTEST:`, lexer.Push("LiteralTest")},
Yeah nice, I like this pattern.
Would you mind converting the existing tests to use this pattern too?
{"LITKeyword", "SELECT"}, | ||
}}, | ||
{"LiteralMixed", `LITTEST:hello ONE test LIKE world`, []token{ | ||
{"LiteralTest", "LITTEST:"}, |
Maybe we could make the test harness strip these automatically? i.e., the first token iff it ends with TEST:
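One way that suggestion could look in the harness (a hypothetical helper; the token struct here is a stand-in for the pair type used in the conformance test tables):

package conformance

import "strings"

// token is a stand-in for the (type, value) pair used in the test tables.
type token struct {
	Type  string
	Value string
}

// stripTestPrefix drops the leading state-push token (e.g. "LITTEST:" or
// "WBTEST:") so individual test cases don't have to repeat it.
func stripTestPrefix(expected []token) []token {
	if len(expected) > 0 && strings.HasSuffix(expected[0].Value, "TEST:") {
		return expected[1:]
	}
	return expected
}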
@@ -117,7 +200,7 @@ func TestLexerConformanceGenerated(t *testing.T) {
	args := []string{"test", "-run", "TestLexerConformanceGeneratedInternal", "-tags", "generated"}
	// Propagate test flags.
	flag.CommandLine.VisitAll(func(f *flag.Flag) {
-		if f.Value.String() != f.DefValue {
+		if f.Value.String() != f.DefValue && f.Name != "test.run" {
Nice catch 🤦‍♂️
@alecthomas - I did some cleanup, see what you think. Thanks!
Awesome, thanks :)
@alecthomas - I wanted to get your feedback early on the approach of adding specialized tests to the conformance test suite. It uses a special token (e.g., WBTEST:) to push into a new lexer state that can then have different rules from the rest of the grammar. This allows testing different areas of the lexer without adding more and more rules to the Root state (which might start interacting with each other over time as the number of conformance test cases grows and the rules get more complex).

Opening as a draft to get your feedback on the approach. This demonstrates how the generated lexer currently doesn't support the word boundary \b regex properly. (Will next see what might be needed to fix this in the lexer generation.)

This also tweaks TestLexerConformanceGenerated so that running the test function from the VSCode IDE won't just hang forever (before this change, the command that eventually got run was go test -run TestLexerConformanceGeneratedInternal -tags generated '-test.run=^TestLexerConformanceGenerated$', which confuses go and causes it to hang until Ctrl+C is pressed).
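A minimal sketch of the state-push approach described above, using participle's stateful lexer (rule names and patterns are illustrative, not the PR's actual conformance rules):

package main

import (
	"fmt"
	"strings"

	"github.com/alecthomas/participle/v2/lexer"
)

// A special "WBTEST:" token pushes into a dedicated state whose rules can
// exercise \b without adding more patterns to the Root state.
var sketchLexer = lexer.MustStateful(lexer.Rules{
	"Root": {
		{"WordBoundaryTest", `WBTEST:`, lexer.Push("WordBoundaryTest")},
		{"Ident", `\w+`, nil},
		{"Whitespace", `\s+`, nil},
	},
	"WordBoundaryTest": {
		{"Keyword", `\bSELECT\b`, nil},
		{"Word", `\w+`, nil},
		{"Whitespace", `\s+`, nil},
	},
})

func main() {
	lex, err := sketchLexer.Lex("example", strings.NewReader("WBTEST:SELECT foo"))
	if err != nil {
		panic(err)
	}
	tokens, err := lexer.ConsumeAll(lex)
	if err != nil {
		panic(err)
	}
	for _, tok := range tokens {
		fmt.Printf("%q\n", tok.Value)
	}
}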