Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify the workings of StringReader in the lexer #36470

Closed

Conversation

nnethercote
Copy link
Contributor

In a lexer you want it to be extremely clear what the current character is, as well as those before and after it. The current code and comments in StringReader don't achieve this: curr refers to the current character, but so does last_pos; pos refers to the next character; and col refers to the current character but its comment says it refers to the next character.

This PR renames some of these fields and does some related clean-ups. This makes the lexer code easier to understand and marginally faster.

The meaning of the "last" character is non-obvious: it could be (a) the
current character or (b) the character prior to that. `last_pos` uses
meaning (a), but `curr_pos` makes that clearer. Even better, `curr_pos`
matches `curr`.

This commit also avoids a few unnecessary local variables.
This makes it clearer that it refers to the next character, and not to
the current character.
This makes it clearer that it refers to the current character. (Prior to
this commit, even the comment describing `col` got this wrong.)
First, assert! is redundant w.r.t. the unwrap() immediately afterwards.

Second, `byte_offset_diff` is effectively computed as
`current_byte_offset + ch.len_utf8() - current_byte_offset` (with `next`
as an intermediate) which is silly and can be simplified.
This commit renames the variables to make it clearer which char each one
refers to. It also slightly reorders and rearranges some statements.
@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

The two branches of this `if` compute the same value. This commit gets
rid of the first branch, which makes this calculation identical to the
one in scan_block_comment().
@sanxiyn
Copy link
Member

sanxiyn commented Sep 19, 2016

This changes public API of libsyntax, so it needs special handling. cc #31645.

@nnethercote
Copy link
Contributor Author

@sfackler: 7 day review ping!

@nnethercote
Copy link
Contributor Author

So the patches above do these renamings:

pos      -> next_pos
last_pos -> curr_pos
col      -> curr_col

Another possibility is this:

pos      -> next_pos
last_pos -> pos
col unchanged
curr     -> ch

The idea being that we avoid curr_ prefixes, instead making them implicit. The reason I suggest this is that it would match Parser, which has token and span which refer to the current token and span. (Parser also has last_span and last_token_kind which should be renamed as prev_span and prev_token_kind respectively.)

@bors
Copy link
Contributor

bors commented Sep 23, 2016

☔ The latest upstream changes (presumably #36154) made this pull request unmergeable. Please resolve the merge conflicts.

@nnethercote nnethercote mentioned this pull request Oct 3, 2016
@alexcrichton
Copy link
Member

r? @eddyb, perhaps you can help take a look here?

@rust-highfive rust-highfive assigned eddyb and unassigned sfackler Oct 3, 2016
Copy link
Member

@eddyb eddyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnethercote
Copy link
Contributor Author

Oh... I pulled out two of these commits in #36921, and I want to redo the renaming in the way I mentioned above, and I want to do some similar renamings in the parser.

So: thank you for the review, @eddyb, but I'd like to hold off from landing these commits in their current form. (Well, they need a rebase, but I'm going to change them in a bigger way.) I will file a new PR for that to make things clearer. Apologies for the lack of clarity.

@nnethercote nnethercote closed this Oct 3, 2016
@nnethercote nnethercote deleted the rename-StringReader-fields branch October 3, 2016 20:09
Manishearth added a commit to Manishearth/rust that referenced this pull request Oct 4, 2016
Two lexer tweaks

19 days later, I haven't received a review of my commits in rust-lang#36470. In an attempt to make some progress, I'm going to split up the changes. Here are the ones that don't relate to renaming things.
@nnethercote nnethercote restored the rename-StringReader-fields branch October 4, 2016 22:32
@nnethercote
Copy link
Contributor Author

I'm going to reuse this PR for my updated commits.

@nnethercote nnethercote reopened this Oct 4, 2016
@nnethercote
Copy link
Contributor Author

Nope, that didn't work because I did the new patches in a different local clone. So I opened #36969.

@nnethercote nnethercote closed this Oct 4, 2016
@nnethercote nnethercote deleted the rename-StringReader-fields branch October 4, 2016 22:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants