Clarify the workings of StringReader in the lexer #36470

nnethercote · 2016-09-14T05:41:19Z

In a lexer you want it to be extremely clear what the current character is, as well as those before and after it. The current code and comments in `StringReader` don't achieve this: `curr` refers to the current character, but so does `last_pos`; `pos` refers to the next character; and `col` refers to the current character, even though its comment says it refers to the next character.

This PR renames some of these fields and does some related clean-ups. This makes the lexer code easier to understand and marginally faster.

The meaning of the "last" character is non-obvious: it could be (a) the current character or (b) the character prior to that. `last_pos` uses meaning (a); renaming it to `curr_pos` makes that clearer. Even better, `curr_pos` matches `curr`. This commit also avoids a few unnecessary local variables.

First, `assert!` is redundant with respect to the `unwrap()` immediately afterwards. Second, `byte_offset_diff` is effectively computed as `current_byte_offset + ch.len_utf8() - current_byte_offset` (with `next` as an intermediate), which is silly and can be simplified to `ch.len_utf8()`.
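As a rough illustration of the two simplifications described in that commit message, here is a standalone sketch. The function names (`char_at_with_assert`, `byte_offset_diff_roundabout`, `char_at`, `byte_offset_diff`) are hypothetical and this is not the libsyntax code; it only shows why the `assert!` adds nothing and why the subtraction collapses to `ch.len_utf8()`.

```rust
// Hypothetical "before" shapes; names are illustrative, not libsyntax's.

/// Redundant: `assert!` repeats the check that `unwrap()` already performs.
fn char_at_with_assert(src: &str, pos: usize) -> char {
    let next = src[pos..].chars().next();
    assert!(next.is_some());
    next.unwrap()
}

/// Roundabout: builds `next` only to subtract `current_byte_offset` again.
fn byte_offset_diff_roundabout(current_byte_offset: usize, ch: char) -> usize {
    let next = current_byte_offset + ch.len_utf8();
    next - current_byte_offset
}

// Simplified equivalents.

/// `unwrap()` alone panics on `None`, so the `assert!` adds nothing.
fn char_at(src: &str, pos: usize) -> char {
    src[pos..].chars().next().unwrap()
}

/// The offset difference is just the UTF-8 width of `ch`.
fn byte_offset_diff(ch: char) -> usize {
    ch.len_utf8()
}
```

Both pairs behave identically for every input, including multi-byte characters such as `'€'` (3 bytes) or `'😀'` (4 bytes).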

rust-highfive · 2016-09-14T05:41:35Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

sanxiyn · 2016-09-19T15:34:52Z

This changes the public API of libsyntax, so it needs special handling. cc #31645.

nnethercote · 2016-09-21T03:00:24Z

@sfackler: 7 day review ping!

nnethercote · 2016-09-21T03:24:48Z

So the patches above do these renamings:

- `pos` -> `next_pos`
- `last_pos` -> `curr_pos`
- `col` -> `curr_col`

Another possibility is this:

- `pos` -> `next_pos`
- `last_pos` -> `pos`
- `col` unchanged
- `curr` -> `ch`

The idea is to avoid `curr_` prefixes and make "current" implicit instead. I suggest this because it would match `Parser`, whose `token` and `span` fields refer to the current token and span. (`Parser` also has `last_span` and `last_token_kind`, which should be renamed `prev_span` and `prev_token_kind` respectively.)
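A minimal sketch of what the second naming scheme might look like, assuming a hypothetical `Cursor` type over a `&str`. This is not libsyntax's `StringReader`; it only illustrates the convention. Here `ch` is the current character, `pos` its byte offset, `next_pos` the byte offset of the character after it, and `col` the column of the current character; `bump()` makes the next character current.

```rust
/// Hypothetical, stripped-down cursor (NOT libsyntax's `StringReader`).
/// "Current" is implicit: only the look-ahead offset carries a prefix.
struct Cursor<'a> {
    src: &'a str,
    /// Byte offset of the current character `ch`.
    pos: usize,
    /// Byte offset of the character after `ch`.
    next_pos: usize,
    /// Zero-based column (in chars) of the current character.
    col: usize,
    /// The current character, or `None` at end of input.
    ch: Option<char>,
}

impl<'a> Cursor<'a> {
    fn new(src: &'a str) -> Self {
        let ch = src.chars().next();
        let next_pos = ch.map_or(0, |c| c.len_utf8());
        Cursor { src, pos: 0, next_pos, col: 0, ch }
    }

    /// Advance so that the next character becomes the current one.
    fn bump(&mut self) {
        let curr = match self.ch {
            Some(c) => c,
            None => return, // already at end of input
        };
        // Column of the character that is about to become current.
        self.col = if curr == '\n' { 0 } else { self.col + 1 };
        // The old "next" position becomes the current position.
        self.pos = self.next_pos;
        self.ch = self.src[self.pos..].chars().next();
        self.next_pos = self.pos + self.ch.map_or(0, |c| c.len_utf8());
    }
}
```

With this layout `next_pos - pos` is always the UTF-8 width of `ch`, so no field ever has to be read as "the previous one" or "the one after".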

bors · 2016-09-23T03:00:34Z

☔ The latest upstream changes (presumably #36154) made this pull request unmergeable. Please resolve the merge conflicts.

alexcrichton · 2016-10-03T16:05:48Z

r? @eddyb, perhaps you can help take a look here?

eddyb

@Manishearth r=me

nnethercote · 2016-10-03T20:08:54Z

Oh... I pulled out two of these commits in #36921, and I want to redo the renaming in the way I mentioned above, and I want to do some similar renamings in the parser.

So: thank you for the review, @eddyb, but I'd like to hold off from landing these commits in their current form. (Well, they need a rebase, but I'm going to change them in a bigger way.) I will file a new PR for that to make things clearer. Apologies for the lack of clarity.

Two lexer tweaks: "19 days later, I haven't received a review of my commits in rust-lang#36470. In an attempt to make some progress, I'm going to split up the changes. Here are the ones that don't relate to renaming things."

nnethercote · 2016-10-04T22:32:24Z

I'm going to reuse this PR for my updated commits.

nnethercote · 2016-10-04T22:41:53Z

Nope, that didn't work because I did the new patches in a different local clone. So I opened #36969.

nnethercote added 5 commits September 14, 2016 11:48

Rename StringReader::pos as next_pos.

1ea0c89

This makes it clearer that it refers to the next character, and not to the current character.

Rename StringReader::col as curr_col.

ee7bc2d

This makes it clearer that it refers to the current character. (Prior to this commit, even the comment describing `col` got this wrong.)

Clarify StringReader::bump.

6535111

This commit renames the variables to make it clearer which char each one refers to. It also slightly reorders and rearranges some statements.

rust-highfive assigned sfackler Sep 14, 2016

Simplify start_bpos calculation in scan_comment().

a1984ae

The two branches of this `if` compute the same value. This commit gets rid of the first branch, which makes this calculation identical to the one in scan_block_comment().
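To see how two branches of such an `if` can compute the same value, here is a hypothetical sketch; the names and offsets are assumptions, not the actual `scan_comment()` code. It assumes the reader is positioned just after the opening `//`, with `pos` the byte offset of the current character and `next_pos` that of the following one. When the current character is single-byte ASCII (such as `/` or `!` starting a doc comment), `next_pos == pos + 1`, so `next_pos - 3` and `pos - 2` name the same offset.

```rust
/// Hypothetical "before": the doc-comment branch goes via `next_pos`.
/// Assumes `is_doc_comment` is only true when the current char is the
/// single-byte ASCII `/` or `!`, so `next_pos == pos + 1` in that branch.
fn start_bpos_two_branches(pos: usize, next_pos: usize, is_doc_comment: bool) -> usize {
    if is_doc_comment {
        next_pos - 3 // == (pos + 1) - 3 == pos - 2
    } else {
        pos - 2
    }
}

/// Simplified: back up over the two bytes of the opening `//`.
fn start_bpos(pos: usize) -> usize {
    pos - 2
}
```

Under that assumption the `if` can be dropped entirely, leaving a single expression that has the same shape as a block-comment start computation (backing up over `/*`).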

nnethercote mentioned this pull request Oct 3, 2016

Two lexer tweaks #36921

Merged

rust-highfive assigned eddyb and unassigned sfackler Oct 3, 2016

eddyb approved these changes Oct 3, 2016

nnethercote closed this Oct 3, 2016

nnethercote deleted the rename-StringReader-fields branch October 3, 2016 20:09

nnethercote restored the rename-StringReader-fields branch October 4, 2016 22:32

nnethercote reopened this Oct 4, 2016

nnethercote closed this Oct 4, 2016

nnethercote deleted the rename-StringReader-fields branch October 4, 2016 22:41

nnethercote mentioned this pull request Oct 4, 2016

Clarify the positions of the lexer and parser #36969

Merged

mcarton mentioned this pull request Oct 27, 2016

Fix bad error message with `::<` in types #36206

Merged
