feat: lalrpop lexer prototype #4656

Merged · 48 commits · Apr 15, 2024
Changes from 47 commits
3c657c0
wip implementing lalrpop lexer, got lalrpop module compiling/loading,…
michaeljklein Mar 25, 2024
b80bf9e
add test for field element that's too large (current lexer silently t…
michaeljklein Mar 25, 2024
266a27d
add symbol lexing
michaeljklein Mar 26, 2024
fda3d4a
wip string lexing, added int type lexing
michaeljklein Mar 27, 2024
3ccf922
cleanup previous version for using existing lexer
michaeljklein Mar 27, 2024
a897d2b
remove unused test for field element maximum
michaeljklein Mar 28, 2024
a43842e
wip connecting lalrpop to lexer, added conversion to/from lalrpop-fri…
michaeljklein Mar 28, 2024
fa6739e
wip whitespace handling, add back symbol lexing to lalrpop shim
michaeljklein Mar 28, 2024
c0c1e65
wip: add variant of Token with lifetime parameter with conversion fro…
michaeljklein Mar 29, 2024
36c9b90
getting lalrpop to accept Token with lifetime, adding back error reco…
michaeljklein Mar 29, 2024
19609af
draft use statement parsing and ident parsing with spans, cleanup
michaeljklein Apr 2, 2024
f273cbf
add tests for use statements, handle use statement prefix/suffix, upd…
michaeljklein Apr 2, 2024
4a6fc0c
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 2, 2024
5190ead
replace is_whitespace with is_code_whitespace (prevent vertical tabs,…
michaeljklein Apr 2, 2024
2c66d53
cargo fmt/clippy, temporarily disable use statement tests, wip handli…
michaeljklein Apr 3, 2024
47691ec
revert lexer iterator to spanned tokens, make from_spanned_token_resu…
michaeljklein Apr 3, 2024
a26eaed
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 3, 2024
028def4
add back build.rs for lalrpop, note wrong unused-import warning, test…
michaeljklein Apr 3, 2024
35754b8
very wip: debugging differences between existing and lalrpop parser, …
michaeljklein Apr 4, 2024
5904318
include fuzzer output dir, but ignore contents
michaeljklein Apr 4, 2024
0389d23
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 4, 2024
0c0f05a
remove fuzzer and associated tests
michaeljklein Apr 5, 2024
7b20ac2
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 5, 2024
c194c4c
cleanup parser test and debugging code, cargo fmt/clippy
michaeljklein Apr 5, 2024
46c70d7
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 5, 2024
2d957b6
move undetected-use import to an inline mod::_, restore parse_block t…
michaeljklein Apr 5, 2024
937729e
allow CC0-1.0 (public domain license)
michaeljklein Apr 5, 2024
a959d3a
cleanup grammar
michaeljklein Apr 5, 2024
9ae21d0
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 5, 2024
a489d45
move from_spanned_token_result to struct definition site, cleanup lex…
michaeljklein Apr 9, 2024
903362d
cleanup duplicate lalrpop parser file and comments
michaeljklein Apr 9, 2024
b52a522
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 9, 2024
0281479
add patch for clippy warning in lalrpop generated file
michaeljklein Apr 9, 2024
d6453b6
revert fuzzer .gitignore changes
michaeljklein Apr 9, 2024
3f57f5e
fix typo
michaeljklein Apr 9, 2024
82fafd8
remove duplicated test
michaeljklein Apr 9, 2024
c13587b
Update compiler/noirc_frontend/src/lexer/token.rs
michaeljklein Apr 10, 2024
5cac533
Update compiler/noirc_frontend/src/lexer/lexer.rs
michaeljklein Apr 10, 2024
b941a6d
Update compiler/noirc_frontend/src/parser/parser.rs
michaeljklein Apr 10, 2024
9acd8f0
Update compiler/noirc_frontend/src/parser/parser.rs
michaeljklein Apr 10, 2024
3f72580
explain unreachable lalrpop parser test, add issue to continue use st…
michaeljklein Apr 11, 2024
2194ee6
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 11, 2024
eef774e
fix into_iter()...chain(...iter_mut()..) error
michaeljklein Apr 11, 2024
6b079ae
rename Tok -> BorrowedToken
michaeljklein Apr 11, 2024
d3b7fd6
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 11, 2024
efaadce
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 12, 2024
e76b02f
Merge branch 'master' into michaeljklein/lalrpop-lexer
michaeljklein Apr 12, 2024
b9d67a8
chore: cargo deny
TomAFrench Apr 15, 2024
122 changes: 114 additions & 8 deletions Cargo.lock


7 changes: 7 additions & 0 deletions compiler/noirc_frontend/Cargo.toml
@@ -24,9 +24,16 @@ small-ord-set = "0.1.3"
regex = "1.9.1"
tracing.workspace = true
petgraph = "0.6"
lalrpop-util = { version = "0.20.2", features = ["lexer"] }

[dev-dependencies]
base64.workspace = true
strum = "0.24"
strum_macros = "0.24"
tempfile.workspace = true

[build-dependencies]
lalrpop = "0.20.2"

[features]
experimental_parser = []
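The new `[build-dependencies]` entry on `lalrpop` implies that a build script runs the parser generator at compile time; with `use_cargo_dir_conventions()` (see build.rs below), it picks up a `.lalrpop` grammar file under `src/` and emits `noir_parser.rs` into `OUT_DIR`. The grammar itself is not rendered on this page; as a purely hypothetical illustration of the file format lalrpop consumes, a minimal grammar looks like:

```lalrpop
// Hypothetical minimal .lalrpop grammar — NOT the PR's actual noir_parser
// grammar, which is a generated file hidden from this diff view.
grammar;

pub Term: i32 = {
    <n:r"[0-9]+"> => n.parse::<i32>().unwrap(),
    "(" <t:Term> ")" => t,
};
```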
28 changes: 28 additions & 0 deletions compiler/noirc_frontend/build.rs
@@ -0,0 +1,28 @@
use std::fs::{read_to_string, File};
use std::io::Write;

fn main() {
lalrpop::Configuration::new()
.emit_rerun_directives(true)
.use_cargo_dir_conventions()
.process()
.unwrap();

// here, we get a lint error from "extern crate core" so patching that until lalrpop does
// (adding cfg directives appears to be unsupported by lalrpop)
let out_dir = std::env::var("OUT_DIR").unwrap();
let parser_path = std::path::Path::new(&out_dir).join("noir_parser.rs");
let content_str = read_to_string(parser_path.clone()).unwrap();
let mut parser_file = File::create(parser_path).unwrap();
for line in content_str.lines() {
if line.contains("extern crate core") {
parser_file
.write_all(
format!("{}\n", line.replace("extern crate core", "use core")).as_bytes(),
)
.unwrap();
} else {
parser_file.write_all(format!("{}\n", line).as_bytes()).unwrap();
}
}
}
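The post-processing loop above rewrites lalrpop's generated `extern crate core` line (which trips an unused-extern-crate lint) into `use core`. The same transformation can be sketched as a pure function over an in-memory string, instead of reading from and writing to `OUT_DIR`:

```rust
// Sketch of the build.rs patch: replace the `extern crate core` line in the
// generated parser with `use core`, leaving every other line untouched.
// Operates on a string here so it is runnable without a lalrpop build.
fn patch_generated(source: &str) -> String {
    source
        .lines()
        .map(|line| {
            if line.contains("extern crate core") {
                line.replace("extern crate core", "use core")
            } else {
                line.to_string()
            }
        })
        .map(|line| format!("{line}\n"))
        .collect()
}

fn main() {
    let patched = patch_generated("extern crate core;\nfn f() {}\n");
    assert_eq!(patched, "use core;\nfn f() {}\n");
    println!("ok");
}
```

A `#[cfg_attr]` suppression on the generated module would be cleaner, but as the comment in the diff notes, lalrpop does not currently support injecting cfg directives, hence the textual patch.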
30 changes: 27 additions & 3 deletions compiler/noirc_frontend/src/lexer/lexer.rs
@@ -2,7 +2,9 @@ use crate::token::{Attribute, DocStyle};

use super::{
errors::LexerErrorKind,
token::{IntType, Keyword, SpannedToken, Token, Tokens},
token::{
token_to_borrowed_token, BorrowedToken, IntType, Keyword, SpannedToken, Token, Tokens,
},
};
use acvm::FieldElement;
use noirc_errors::{Position, Span};
@@ -21,6 +23,21 @@ pub struct Lexer<'a> {

pub type SpannedTokenResult = Result<SpannedToken, LexerErrorKind>;

pub(crate) fn from_spanned_token_result(
token_result: &SpannedTokenResult,
) -> Result<(usize, BorrowedToken<'_>, usize), LexerErrorKind> {
token_result
.as_ref()
.map(|spanned_token| {
(
spanned_token.to_span().start() as usize,
token_to_borrowed_token(spanned_token.into()),
spanned_token.to_span().end() as usize,
)
})
.map_err(Clone::clone)
}
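The new `from_spanned_token_result` adapter exists because lalrpop's external-lexer interface consumes an iterator of `Result<(Loc, Tok, Loc), Error>` triples — start offset, token, end offset — rather than the lexer's own `SpannedToken` shape. The conversion pattern can be sketched self-contained with stand-in types (`Tok`, `Spanned`, and `to_triple` are hypothetical; the PR's real types are `BorrowedToken`, `SpannedToken`, and `LexerErrorKind`):

```rust
// Minimal sketch of adapting a spanned token into the (start, token, end)
// triple that lalrpop's external-lexer interface expects.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
}

struct Spanned {
    start: u32,
    end: u32,
    tok: Tok,
}

fn to_triple(s: &Spanned) -> Result<(usize, Tok, usize), String> {
    // Widen the span endpoints to usize and pair them around the token,
    // mirroring the map over SpannedTokenResult above.
    Ok((s.start as usize, s.tok.clone(), s.end as usize))
}

fn main() {
    let s = Spanned { start: 0, end: 3, tok: Tok::Ident("foo".into()) };
    let (lo, tok, hi) = to_triple(&s).unwrap();
    assert_eq!((lo, hi), (0, 3));
    assert_eq!(tok, Tok::Ident("foo".into()));
    println!("ok");
}
```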

impl<'a> Lexer<'a> {
/// Given a source file of noir code, return all the tokens in the file
/// in order, along with any lexing errors that occurred.
@@ -94,7 +111,7 @@ impl<'a> Lexer<'a> {

fn next_token(&mut self) -> SpannedTokenResult {
match self.next_char() {
Some(x) if x.is_whitespace() => {
Some(x) if Self::is_code_whitespace(x) => {
let spanned = self.eat_whitespace(x);
if self.skip_whitespaces {
self.next_token()
@@ -560,16 +577,21 @@ }
}
}

fn is_code_whitespace(c: char) -> bool {
c == '\t' || c == '\n' || c == '\r' || c == ' '
}

/// Skips white space. They are not significant in the source language
fn eat_whitespace(&mut self, initial_char: char) -> SpannedToken {
let start = self.position;
let whitespace = self.eat_while(initial_char.into(), |ch| ch.is_whitespace());
let whitespace = self.eat_while(initial_char.into(), Self::is_code_whitespace);
SpannedToken::new(Token::Whitespace(whitespace), Span::inclusive(start, self.position))
}
}
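The switch from `char::is_whitespace` to `is_code_whitespace` (per the commit message, to prevent vertical tabs and other exotic whitespace) narrows acceptance to exactly tab, newline, carriage return, and space. The difference is easy to demonstrate standalone, since `char::is_whitespace` accepts the full Unicode `White_Space` set:

```rust
// The same predicate the PR adds to the lexer: only the four ASCII
// whitespace characters count as code whitespace.
fn is_code_whitespace(c: char) -> bool {
    c == '\t' || c == '\n' || c == '\r' || c == ' '
}

fn main() {
    assert!(is_code_whitespace(' '));
    assert!(is_code_whitespace('\n'));
    // Vertical tab is rejected by the new predicate...
    assert!(!is_code_whitespace('\u{0b}'));
    // ...even though std's Unicode-aware check accepts it.
    assert!('\u{0b}'.is_whitespace());
    println!("ok");
}
```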

impl<'a> Iterator for Lexer<'a> {
type Item = SpannedTokenResult;

fn next(&mut self) -> Option<Self::Item> {
if self.done {
None
@@ -578,10 +600,12 @@ impl<'a> Iterator for Lexer<'a> {
}
}
}

#[cfg(test)]
mod tests {
use super::*;
use crate::token::{FunctionAttribute, SecondaryAttribute, TestScope};

#[test]
fn test_single_double_char() {
let input = "! != + ( ) { } [ ] | , ; : :: < <= > >= & - -> . .. % / * = == << >>";