
Lexx

Lexx is a fast, extensible, greedy, single-pass text tokenizer.

Sample output for the string "This is  \n1.0 thing."

use lexx::token::Token;
Token{ token_type: 4, value: "This".to_string(), line: 1, column: 1, len: 4, precedence: 0};
Token{ token_type: 3, value: " ".to_string(), line: 1, column: 5, len: 1, precedence: 0};
Token{ token_type: 4, value: "is".to_string(), line: 1, column: 6, len: 2, precedence: 0};
Token{ token_type: 3, value: "  \n".to_string(), line: 1, column: 8, len: 3, precedence: 0};
Token{ token_type: 2, value: "1.0".to_string(), line: 2, column: 1, len: 3, precedence: 0};
Token{ token_type: 3, value: " ".to_string(), line: 2, column: 4, len: 1, precedence: 0};
Token{ token_type: 4, value: "thing".to_string(), line: 2, column: 5, len: 5, precedence: 0};
Token{ token_type: 5, value: ".".to_string(), line: 2, column: 10, len: 1, precedence: 0};

Structure

Lexx consists of four components:

  • LexxInput provides a stream of chars
  • Matchers identify parts of the input, such as integers or symbols
  • Token is the result of a successful match
  • Lexx itself

Functionality

Lexx uses a LexxInput to provide chars that are fed to Matcher instances until the longest match, if any, is found. The match is returned as a Token instance, which includes a type and the matched string as well as the line and column where the match was made. A custom LexxInput can be passed to Lexx, and the library comes with implementations for String and Reader types.

Lexx implements Iterator, so it can be consumed with a for loop, as in the sketch below.
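
A minimal end-to-end sketch of that flow, tokenizing the sample string from the top of this page. The module paths, constructor shape, and Default initialization used here are assumptions made for illustration, not confirmed API; check the crate documentation for the exact signatures.

use lexx::Lexx;
use lexx::input::InputString;            // assumed module path for the String input
use lexx::matcher_float::FloatMatcher;   // assumed module paths for the
use lexx::matcher_symbol::SymbolMatcher; // bundled matchers
use lexx::matcher_whitespace::WhitespaceMatcher;
use lexx::matcher_word::WordMatcher;

fn main() {
    // CAP = 512 is the maximum token length (see the Panics section below).
    // The constructor shape is assumed: an input plus a vector of boxed matchers.
    let lexx = Lexx::<512>::new(
        Box::new(InputString::new(String::from("This is  \n1.0 thing."))),
        vec![
            Box::new(WhitespaceMatcher::default()), // Default initialization is assumed
            Box::new(WordMatcher::default()),
            Box::new(FloatMatcher::default()),
            Box::new(SymbolMatcher::default()),
        ],
    );

    // Lexx implements Iterator, so tokens stream out of a plain for loop.
    // Each item prints via Debug whether it is a bare Token or a Result
    // (see the error-handling sketch further down).
    for token in lexx {
        println!("{:?}", token);
    }
}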

Custom Matchers can also be written, though Lexx comes with:

  • WordMatcher matches alphabetic characters such as ABCdef and word
  • IntegerMatcher matches integers such as 3 or 14537
  • FloatMatcher matches floats such as 434.312 or 0.001
  • ExactMatcher, given a vector of strings, matches exactly those strings. It can be initialized with a token type to return, so multiple instances can serve different purposes: one ExactMatcher might find operators such as == and + while another finds block delimiters such as ( and ) (a configuration sketch follows this list).
  • SymbolMatcher matches non-alphanumeric characters such as *&)_#@ or ., making it a good catch-all matcher.
  • KeywordMatcher matches specific passed-in words such as new or specific. It differs from the ExactMatcher in that it will not match substrings, such as the new in renewable or newfangled.
  • WhitespaceMatcher matches whitespace such as spaces, \t, \r and \n.
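
A configuration sketch for the ExactMatcher and KeywordMatcher entries above. The constructor shapes, module paths, and token-type constants here are assumptions for illustration, not the crate's confirmed API:

use lexx::matcher_exact::ExactMatcher;     // assumed module path
use lexx::matcher_keyword::KeywordMatcher; // assumed module path

// Hypothetical token-type constants; the crate defines its own numeric types.
const TYPE_OPERATOR: u16 = 10;
const TYPE_BLOCK: u16 = 11;
const TYPE_KEYWORD: u16 = 12;

fn main() {
    // Two ExactMatchers tagged with different token types, as described above:
    // one finds operators, the other block delimiters. The assumed constructor
    // shape is (strings to match, token type to return, precedence).
    let _operators = ExactMatcher::new(vec!["==", "+"], TYPE_OPERATOR, 0);
    let _blocks = ExactMatcher::new(vec!["(", ")"], TYPE_BLOCK, 0);

    // A KeywordMatcher that matches new on its own but not the new
    // inside renewable or newfangled.
    let _keywords = KeywordMatcher::new(vec!["new", "specific"], TYPE_KEYWORD, 0);
}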

Matchers can be given a precedence that makes a matcher return its result even when another matcher has a longer match. For example, when both the WordMatcher and KeywordMatcher are used at the same time, every keyword is also a valid word, so the KeywordMatcher is typically given the higher precedence to ensure keywords come back as keywords.

Note that matchers cannot find matches that start inside the valid matches of other matchers. When matching renewable, the WordMatcher will make the match even if the ExactMatcher is looking for new with a higher precedence, because the WordMatcher consumes all of renewable without giving other matchers the chance to look inside it.

Also, while the ExactMatcher could find the new at the start of newfangled, the WordMatcher would match newfangled instead since it is longer, unless the ExactMatcher is given a higher precedence, in which case it would return new and the next match would start at fangled.
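
That selection rule can be modeled in a few lines of self-contained Rust (this sketches the documented behavior, not the crate's internals): among candidate matches starting at the same position, higher precedence wins first, and match length breaks ties.

struct Candidate {
    text: &'static str, // what a matcher matched at the current position
    precedence: u8,     // the matcher's precedence
}

// Pick the winner among matches that all start at the same position:
// highest precedence first, longest match as the tie-breaker.
fn pick(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().max_by(|a, b| {
        a.precedence
            .cmp(&b.precedence)
            .then(a.text.len().cmp(&b.text.len()))
    })
}

fn main() {
    // Equal precedence: WordMatcher's newfangled beats ExactMatcher's new.
    let even = [
        Candidate { text: "newfangled", precedence: 0 },
        Candidate { text: "new", precedence: 0 },
    ];
    assert_eq!(pick(&even).unwrap().text, "newfangled");

    // ExactMatcher given higher precedence: new wins, and the next match
    // would start at fangled.
    let ranked = [
        Candidate { text: "newfangled", precedence: 0 },
        Candidate { text: "new", precedence: 1 },
    ];
    assert_eq!(pick(&ranked).unwrap().text, "new");
}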

To successfully parse an entire stream, Lexx must have a matcher that can tokenize every encountered run of characters. If a match fails, Lexx returns an Err of TokenNotFound with the text that could not be matched.
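
A hedged sketch of handling that failure while iterating; the iterator's item type and the error's exact name are assumptions here (the text above only confirms an Err of TokenNotFound carrying the unmatched text):

// Assuming iteration yields Result<Token, Error>, where the error's
// TokenNotFound variant carries the unmatched text (names assumed),
// with lexx built as in the earlier sketch.
for result in lexx {
    match result {
        Ok(token) => println!("{:?}", token),
        Err(token_not_found) => {
            // Report the text no matcher could handle, then stop.
            eprintln!("tokenizing failed: {:?}", token_not_found);
            break;
        }
    }
}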

Panics

For speed, Lexx does not dynamically allocate buffer space. In Lexx<CAP>, CAP is the maximum possible token size; if that size is exceeded, a panic is thrown.
