chore: correct handling of unicode in lexer #3335

Merged
merged 1 commit into from Oct 30, 2023
28 changes: 9 additions & 19 deletions compiler/noirc_frontend/src/lexer/lexer.rs
@@ -6,17 +6,13 @@ use super::{
 };
 use acvm::FieldElement;
 use noirc_errors::{Position, Span};
-use std::str::Chars;
-use std::{
-    iter::{Peekable, Zip},
-    ops::RangeFrom,
-};
+use std::str::CharIndices;

 /// The job of the lexer is to transform an iterator of characters (`char_iter`)
 /// into an iterator of `SpannedToken`. Each `Token` corresponds roughly to 1 word or operator.
 /// Tokens are tagged with their location in the source file (a `Span`) for use in error reporting.
 pub struct Lexer<'a> {
-    char_iter: Peekable<Zip<Chars<'a>, RangeFrom<u32>>>,
+    chars: CharIndices<'a>,
     position: Position,
     done: bool,
     skip_comments: bool,
@@ -41,13 +37,7 @@ impl<'a> Lexer<'a> {
     }

     pub fn new(source: &'a str) -> Self {
-        Lexer {
-            // We zip with the character index here to ensure the first char has index 0
-            char_iter: source.chars().zip(0..).peekable(),
-            position: 0,
-            done: false,
-            skip_comments: true,
-        }
+        Lexer { chars: source.char_indices(), position: 0, done: false, skip_comments: true }
     }

     pub fn skip_comments(mut self, flag: bool) -> Self {
@@ -57,21 +47,21 @@ impl<'a> Lexer<'a> {

     /// Iterates the cursor and returns the char at the new cursor position
     fn next_char(&mut self) -> Option<char> {
-        let (c, index) = self.char_iter.next()?;
-        self.position = index;
-        Some(c)
+        let (position, ch) = self.chars.next()?;
+        self.position = position as u32;
+        Some(ch)
     }

     /// Peeks at the next char. Does not iterate the cursor
     fn peek_char(&mut self) -> Option<char> {
-        self.char_iter.peek().map(|(c, _)| *c)
+        self.chars.clone().next().map(|(_, ch)| ch)
     }

     /// Peeks at the character two positions ahead. Does not iterate the cursor
     fn peek2_char(&mut self) -> Option<char> {
-        let mut chars = self.char_iter.clone();
+        let mut chars = self.chars.clone();
         chars.next();
-        chars.next().map(|(c, _)| c)
+        chars.next().map(|(_, ch)| ch)
     }

     /// Peeks at the next char and returns true if it is equal to the char argument
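For readers skimming the diff, here is a minimal standalone sketch of why swapping `source.chars().zip(0..)` for `source.char_indices()` matters once the source contains non-ASCII text. The old iterator numbers characters (0, 1, 2, ...), while `char_indices` yields UTF-8 byte offsets, which are what `&str` slicing expects. The helper names `char_ordinals`, `byte_offsets`, and `peek` are hypothetical and exist only for this illustration; they are not part of the lexer, and the assumption that `Position` values are later used as byte offsets into the source is mine, not something this diff shows.

```rust
use std::str::CharIndices;

/// Hypothetical helper mirroring the old approach: number each char 0, 1, 2, ...
fn char_ordinals(source: &str) -> Vec<(u32, char)> {
    source.chars().zip(0u32..).map(|(c, i)| (i, c)).collect()
}

/// Hypothetical helper mirroring the new approach: pair each char with its byte offset.
fn byte_offsets(source: &str) -> Vec<(u32, char)> {
    source.char_indices().map(|(i, c)| (i as u32, c)).collect()
}

/// Peek without `Peekable`: clone the iterator and advance the clone.
/// `CharIndices` is cheap to clone, so the original cursor is untouched.
fn peek(chars: &CharIndices<'_>) -> Option<char> {
    chars.clone().next().map(|(_, ch)| ch)
}

fn main() {
    // U+00E9 (e with acute accent) takes two bytes in UTF-8.
    let source = "\u{e9}=1";

    // Old scheme: '=' is the second character, so it gets index 1.
    assert_eq!(char_ordinals(source)[1], (1, '='));
    // New scheme: '=' starts at byte 2, after the two-byte first character.
    assert_eq!(byte_offsets(source)[1], (2, '='));

    // Byte offsets slice the source correctly...
    assert_eq!(&source[2..3], "=");
    // ...whereas the char ordinal 1 is not even a char boundary here.
    assert_eq!(source.get(1..2), None);

    // Peeking via clone does not move the real cursor.
    let mut chars = source.char_indices();
    assert_eq!(peek(&chars), Some('\u{e9}'));
    assert_eq!(chars.next(), Some((0, '\u{e9}')));
    assert_eq!(peek(&chars), Some('='));
}
```

For pure-ASCII input the two schemes agree, which is probably why the mismatch only shows up with unicode sources.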