Make final newline mandatory #1779

hryx · 2018-11-24T12:03:47Z

While working on doc gen (#21) I discovered that there is special code in the tokenizer state machine to "wrap up" the current state when EOF is encountered (stage1 and std/zig/tokenizer).

The logic here to terminate certain tokens or raise errors is duplicated from the main loop.
This final, extra state handler is a place where discrepancies or accidentally omitted cases could be introduced when updating tokenizer rules.
When I see a diff with no trailing newline, I cry. Just a drop, but all those tears add up.

🚫 ↩️ 😢

Inspired by #663 and feeling brassy, I thought I'd propose one further source file encoding requirement: The final character must be LF (0x0A).

Benefits:

Simpler tokenizing logic: upon EOF, no need to handle unterminated tokens.
We get to remove code: stage1 and std/zig/tokenizer
Increases source code uniformity and reduces minor accidental diff noise.
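
To illustrate the first benefit, here is a minimal Python sketch. It is not the stage1 or std/zig/tokenizer code: the toy grammar (identifiers and decimal integers separated by whitespace) and all function names are made up for illustration, and it assumes a grammar in which a newline terminates every token.

```python
def _run_state_machine(src):
    """Toy tokenizer loop (hypothetical grammar: identifiers and integers).

    Returns the tokens found, plus the state and token start offset at EOF.
    """
    tokens = []
    state, start = "start", 0
    i = 0
    while i < len(src):
        ch = src[i]
        if state == "start":
            if ch.isalpha() or ch == "_":
                state, start = "ident", i
            elif ch.isdigit():
                state, start = "number", i
            elif not ch.isspace():
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
            i += 1
        elif state == "ident" and (ch.isalnum() or ch == "_"):
            i += 1
        elif state == "number" and ch.isdigit():
            i += 1
        else:
            # Any other character terminates the current token. Do not advance,
            # so the character is re-examined in the "start" state.
            tokens.append((state, src[start:i]))
            state = "start"
    return tokens, state, start


def tokenize_with_eof_wrapup(src):
    """Accepts any input, so it needs an extra handler after the loop."""
    tokens, state, start = _run_state_machine(src)
    # EOF wrap-up: repeats the "terminate the current token" logic of the loop.
    if state != "start":
        tokens.append((state, src[start:]))
    return tokens


def tokenize_requiring_final_newline(src):
    """Rejects non-empty input without a final LF, so no wrap-up is needed."""
    if src and not src.endswith("\n"):
        raise ValueError("source must be empty or end with LF")
    tokens, state, _ = _run_state_machine(src)
    # The final LF has already terminated any open token inside the loop.
    assert state == "start"
    return tokens
```

In the second variant the only place a token can end is inside the main loop, so there is no second copy of the termination rules to keep in sync.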

Downsides:

Adds one more rule to source file validation (I'm not sure of the status in stage1; according to "Zig source encoding" #663, the self-hosted compiler is compliant).
Adds a new restriction by which the programmer/editor must abide.
Questions from newcomers and those who have not configured their editors.
Unconventional.

Neutral:

This restriction on the programmer/editor would have the same level of severity as The Hard Tabs Issue #544, and would be just as easy for them to acquiesce to, so we have a precedent.

andrewrk · 2018-11-24T17:25:45Z

Note that zig fmt enforces this, if the user chooses to run it.

thejoshwolfe · 2018-11-24T18:03:10Z

An empty source file doesn't end with a newline, and that should be ok.
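
With that exemption, the rule would amount to "the file is empty, or its last byte is 0x0A". A minimal Python sketch of such a check follows; the function name is hypothetical and this is not taken from either Zig tokenizer.

```python
def has_valid_ending(source: bytes) -> bool:
    """Return True if the source is empty or its final byte is LF (0x0A)."""
    if len(source) == 0:
        # An empty file has no final character, so it is allowed.
        return True
    return source[-1] == 0x0A
```

Note that a file ending in CRLF passes this particular check, since its last byte is still LF.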

The final newline restriction is also in place for C/C++ (until C++11), so we're not doing anything too unprecedented.

I think the best argument in favor of this restriction is that it makes tokenizers easier to implement. I'm in favor of this.

andrewrk · 2019-10-17T05:25:16Z

I'm making the call here: this is going to work the same as hard tabs and CRLFs. That is, a missing final newline is accepted by the stage2 parser. However, zig fmt fixes all whitespace issues, including this one.
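
The "accept it in the parser, fix it in the formatter" approach could be sketched roughly as follows. This is an illustrative Python function, not zig fmt's actual implementation; the name is hypothetical, and hard tabs are deliberately left out because handling them correctly would need token awareness.

```python
def normalize_whitespace(source: str) -> str:
    """Formatter-style cleanup: CRLF becomes LF, and a final LF is ensured."""
    text = source.replace("\r\n", "\n")
    # An empty file stays empty; any other file must end with exactly what
    # it had, plus a final LF if one was missing.
    if text and not text.endswith("\n"):
        text += "\n"
    return text
```

Running the function on its own output changes nothing, which is the property a formatter pass would want here.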

Simpler tokenizing logic: upon EOF, no need to handle unterminated tokens.

We get to remove code: stage1 and std/zig/tokenizer

We could remove code from stage1, but since zig fmt has to be able to fix this, the code would have to stay in the self-hosted tokenizer.

Increases source code uniformity and reduces minor accidental diff noise.

Here we have the separate issue of how much to enforce zig fmt. That's something that is reasonable to discuss, but I'm confident that "handle final newline the same as hard tabs & carriage returns" is the right approach here.

andrewrk added the "proposal" label Nov 24, 2018

andrewrk added this to the 0.5.0 milestone Nov 24, 2018

andrewrk modified the milestones: 0.5.0, 0.6.0 Aug 16, 2019

andrewrk closed this as completed Oct 17, 2019
