From 20527cd576233ce800ea362114d5a108eca71684 Mon Sep 17 00:00:00 2001
From: Tbkhi
Date: Sun, 10 Mar 2024 17:45:12 -0300
Subject: [PATCH 1/3] Update the-parser.md

---
 src/the-parser.md | 84 +++++++++++++++++++++++++----------------------
 1 file changed, 45 insertions(+), 39 deletions(-)

diff --git a/src/the-parser.md b/src/the-parser.md
index 0c68a82c4..12229f4bc 100644
--- a/src/the-parser.md
+++ b/src/the-parser.md
@@ -1,68 +1,74 @@
 # Lexing and Parsing
 
-The very first thing the compiler does is take the program (in Unicode
-characters) and turn it into something the compiler can work with more
-conveniently than strings. This happens in two stages: Lexing and Parsing.
+The very first thing the compiler does is take the program (in Unicode) and
+transmute it into a data format the compiler can work with more conveniently
+than strings. This happens in two stages: Lexing and Parsing.
 
-Lexing takes strings and turns them into streams of [tokens]. For example,
-`foo.bar + buz` would be turned into the tokens `foo`, `.`,
-`bar`, `+`, and `buz`. The lexer lives in [`rustc_lexer`][lexer].
+  1. _Lexing_ takes strings and turns them into streams of [tokens]. For
+  example, `foo.bar + buz` would be turned into the tokens `foo`, `.`, `bar`,
+  `+`, and `buz`.
 
 [tokens]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/token/index.html
 [lexer]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
 
-Parsing then takes streams of tokens and turns them into a structured
-form which is easier for the compiler to work with, usually called an [*Abstract
-Syntax Tree*][ast] (AST). An AST mirrors the structure of a Rust program in memory,
-using a `Span` to link a particular AST node back to its source text.
+  2. _Parsing_ takes streams of tokens and turns them into a structured form
+  which is easier for the compiler to work with, usually called an [*Abstract
+  Syntax Tree* (`AST`)][ast] .
+
+
+An `AST` mirrors the structure of a Rust program in memory, using a `Span` to
+link a particular `AST` node back to its source text. The `AST` is defined in
+[`rustc_ast`][rustc_ast], along with some definitions for tokens and token
+streams, data structures/`trait`s for mutating `AST`s, and shared definitions for
+other `AST`-related parts of the compiler (like the lexer and
+`macro`-expansion).
 
-The AST is defined in [`rustc_ast`][rustc_ast], along with some definitions for
-tokens and token streams, data structures/traits for mutating ASTs, and shared
-definitions for other AST-related parts of the compiler (like the lexer and
-macro-expansion).
+The lexer is developed in [`rustc_lexer`][lexer].
 
 The parser is defined in [`rustc_parse`][rustc_parse], along with a
 high-level interface to the lexer and some validation routines that run after
-macro expansion. In particular, the [`rustc_parse::parser`][parser] contains
+`macro` expansion. In particular, the [`rustc_parse::parser`][parser] contains
 the parser implementation.
 
-The main entrypoint to the parser is via the various `parse_*` functions and others in the
-[parser crate][parser_lib]. They let you do things like turn a [`SourceFile`][sourcefile]
+The main entrypoint to the parser is via the various `parse_*` functions and others in
+[rustc_parse][rustc_parse]. They let you do things like turn a [`SourceFile`][sourcefile]
 (e.g. the source in a single file) into a token stream, create a parser from
-the token stream, and then execute the parser to get a `Crate` (the root AST
+the token stream, and then execute the parser to get a [`Crate`] (the root `AST`
 node).
 
-To minimize the amount of copying that is done,
-both [`StringReader`] and [`Parser`] have lifetimes which bind them to the parent `ParseSess`.
-This contains all the information needed while parsing,
-as well as the [`SourceMap`] itself.
+To minimize the amount of copying that is done, both [`StringReader`] and
+[`Parser`] have lifetimes which bind them to the parent [`ParseSess`]. This
+contains all the information needed while parsing, as well as the [`SourceMap`]
+itself.
 
-Note that while parsing, we may encounter macro definitions or invocations. We
-set these aside to be expanded (see [this chapter](./macro-expansion.md)).
-Expansion may itself require parsing the output of the macro, which may reveal
-more macros to be expanded, and so on.
+Note that while parsing, we may encounter `macro` definitions or invocations. We
+set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
+Expansion itself may require parsing the output of a `macro`, which may reveal
+more `macro`s to be expanded, and so on.
 
 ## More on Lexical Analysis
 
 Code for lexical analysis is split between two crates:
 
-- `rustc_lexer` crate is responsible for breaking a `&str` into chunks
+- [`rustc_lexer`] crate is responsible for breaking a `&str` into chunks
   constituting tokens. Although it is popular to implement lexers as generated
-  finite state machines, the lexer in `rustc_lexer` is hand-written.
+  finite state machines, the lexer in [`rustc_lexer`] is hand-written.
 
-- [`StringReader`] integrates `rustc_lexer` with data structures specific to `rustc`.
-  Specifically,
-  it adds `Span` information to tokens returned by `rustc_lexer` and interns identifiers.
+- [`StringReader`] integrates [`rustc_lexer`] with data structures specific to
+  `rustc`. Specifically, it adds `Span` information to tokens returned by
+  [`rustc_lexer`] and interns identifiers.
 
-[rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
-[rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
-[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
+[`Crate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/struct.Crate.html
+[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
+[`ParseSess`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/parse/struct.ParseSess.html
+[`rustc_lexer`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lexer/index.html
 [`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
+[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
 [ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
-[rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
-[parser_lib]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
+[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
 [parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
-[`Parser`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html
-[`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
-[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
+[rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
+[rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
+[rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
 [sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html
+[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
\ No newline at end of file

From 3b94bf40cd00384ac6bb53b5fcc81ff0840f2504 Mon Sep 17 00:00:00 2001
From: Tbkhi
Date: Tue, 12 Mar 2024 15:49:55 -0300
Subject: [PATCH 2/3] Update the-parser.md

---
 src/the-parser.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/the-parser.md b/src/the-parser.md
index 12229f4bc..ad72d4075 100644
--- a/src/the-parser.md
+++ b/src/the-parser.md
@@ -65,10 +65,10 @@ Code for lexical analysis is split between two crates:
 [`SourceMap`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/source_map/struct.SourceMap.html
 [`StringReader`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.StringReader.html
 [ast module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/index.html
-[ast]: https://en.wikipedia.org/wiki/Abstract_syntax_tree
+[ast]: ./ast-validation.md
 [parser]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/index.html
 [rustc_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/index.html
 [rustc_errors]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/index.html
 [rustc_parse]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html
 [sourcefile]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_span/struct.SourceFile.html
-[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html
\ No newline at end of file
+[visit module]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/visit/index.html

From e9db7804447b8238b172156a7adacce8ced231ec Mon Sep 17 00:00:00 2001
From: Noratrieb <48135649+Noratrieb@users.noreply.github.com>
Date: Tue, 24 Sep 2024 20:19:28 +0200
Subject: [PATCH 3/3] minor edits

---
 src/the-parser.md | 36 +++++++++++++++++-------------------
 1 file changed, 17 insertions(+), 19 deletions(-)

diff --git a/src/the-parser.md b/src/the-parser.md
index ad72d4075..ad66bdbab 100644
--- a/src/the-parser.md
+++ b/src/the-parser.md
@@ -1,8 +1,8 @@
 # Lexing and Parsing
 
-The very first thing the compiler does is take the program (in Unicode) and
-transmute it into a data format the compiler can work with more conveniently
-than strings. This happens in two stages: Lexing and Parsing.
+The very first thing the compiler does is take the program (in UTF-8 Unicode text)
+and turn it into a data format the compiler can work with more conveniently than strings.
+This happens in two stages: Lexing and Parsing.
 
   1. _Lexing_ takes strings and turns them into streams of [tokens]. For
   example, `foo.bar + buz` would be turned into the tokens `foo`, `.`, `bar`,
@@ -13,38 +13,36 @@ than strings. This happens in two stages: Lexing and Parsing.
 
   2. _Parsing_ takes streams of tokens and turns them into a structured form
   which is easier for the compiler to work with, usually called an [*Abstract
-  Syntax Tree* (`AST`)][ast] .
+  Syntax Tree* (AST)][ast] .
 
 
-An `AST` mirrors the structure of a Rust program in memory, using a `Span` to
-link a particular `AST` node back to its source text. The `AST` is defined in
+An AST mirrors the structure of a Rust program in memory, using a `Span` to
+link a particular AST node back to its source text. The AST is defined in
 [`rustc_ast`][rustc_ast], along with some definitions for tokens and token
-streams, data structures/`trait`s for mutating `AST`s, and shared definitions for
-other `AST`-related parts of the compiler (like the lexer and
-`macro`-expansion).
+streams, data structures/traits for mutating ASTs, and shared definitions for
+other AST-related parts of the compiler (like the lexer and
+macro-expansion).
 
 The lexer is developed in [`rustc_lexer`][lexer].
 
 The parser is defined in [`rustc_parse`][rustc_parse], along with a
 high-level interface to the lexer and some validation routines that run after
-`macro` expansion. In particular, the [`rustc_parse::parser`][parser] contains
+macro expansion. In particular, the [`rustc_parse::parser`][parser] contains
 the parser implementation.
 
 The main entrypoint to the parser is via the various `parse_*` functions and others in
 [rustc_parse][rustc_parse]. They let you do things like turn a [`SourceFile`][sourcefile]
 (e.g. the source in a single file) into a token stream, create a parser from
-the token stream, and then execute the parser to get a [`Crate`] (the root `AST`
+the token stream, and then execute the parser to get a [`Crate`] (the root AST
 node).
 
-To minimize the amount of copying that is done, both [`StringReader`] and
-[`Parser`] have lifetimes which bind them to the parent [`ParseSess`]. This
-contains all the information needed while parsing, as well as the [`SourceMap`]
-itself.
+To minimize the amount of copying that is done,
+both [`StringReader`] and [`Parser`] have lifetimes which bind them to the parent [`ParseSess`].
+This contains all the information needed while parsing, as well as the [`SourceMap`] itself.
 
-Note that while parsing, we may encounter `macro` definitions or invocations. We
-set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
-Expansion itself may require parsing the output of a `macro`, which may reveal
-more `macro`s to be expanded, and so on.
+Note that while parsing, we may encounter macro definitions or invocations.
+We set these aside to be expanded (see [Macro Expansion](./macro-expansion.md)).
+Expansion itself may require parsing the output of a macro, which may reveal more macros to be expanded, and so on.
 
 ## More on Lexical Analysis