-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix expected-indentation errors with end-of-line comments #4418
Conversation
df4b78d
to
834a6ce
Compare
) { | ||
continue; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we may be able to get away with checking if this is the first token in the line? In theory that should also work. It's arguably more reliant on details of the lexer, but it's also making the expectation more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we fix this in the lexer instead? It feels odd that the newline is conditional on the comment placement. I would expect that a comment is always followed by a newline.
Nit. You can match directly on the Tok. Using TokenKind has some overhead that is only worth it when there are many reads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly not sure. \cc @youknowone - do you know, if it's intentional that we don't emit a newline when a comment is at start-of-line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not actually be a bug, I'd have to compare to CPython.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like CPython does emit a non-logical newline, even for entirely empty lines, unlike RustPython.
RustPython also has this test:
#[test]
fn test_logical_newline_line_comment() {
let source = "#Hello\n#World";
let tokens = lex_source(source);
assert_eq!(
tokens,
vec![
Tok::Comment("#Hello".to_owned()),
// tokenize.py does put an NL here...
Tok::Comment("#World".to_owned()),
// ... and here, but doesn't seem very useful.
]
);
}
@youknowone - Do you know if this is intentional, or should I fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put up RustPython/Parser#27. (We'll still need code changes here to fix this bug, once that merges.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Makes sense. Is the change to remove the explicit Comment
handling and only rely on the presence of Newline
and NonLogicalNewline
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I don't think RustPython intended to have different behavior to CPython.
I will look in the patch.
0674681
to
752f16c
Compare
834a6ce
to
3685a92
Compare
PR Check ResultsBenchmarkLinux
Windows
|
I'm guessing that might be a real performance regression? Although is it comparing against |
) { | ||
continue; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Makes sense. Is the change to remove the explicit Comment
handling and only rely on the presence of Newline
and NonLogicalNewline
?
I think it compares against |
3685a92
to
715c92f
Compare
752f16c
to
0827723
Compare
Ugh this got confused. |
Because I tried to swap the upstreams between this and |
Re-opened as #4438. |
Summary
Given code like:
The token stream will contain a comment, followed by a newline. At present, we end up creating a separate logical line for just the newline, which throws off our logic around expected-indentation rules, which needs to look back at the previous newline to determine the appropriate indentation.
I've looked at the lexer to better understand the logic around when to expect a newline token (following a comment). My understanding is that there should always be a newline unless the comment appears at the start of a (logical?) line, since the only codepath that consumes but doesn't generate a newline is via
eat_indentation
.Closes #4417.