-
Notifications
You must be signed in to change notification settings - Fork 8
Home
#Roku Linter Wiki
##How is a programming language composed? A Programming language is made up Static Aspects and Dynamic Aspects, however, in the scope of this project we will take care only of Static Aspects.
Static aspects basically are:
- Syntax: the structure of a language, defines the valid constructs of a given language.
- Scope: scope of the variables, which differ for different languages.
- Type Checking: checking the types in the assignments and similar.
A Language is made of
- terminal symbols: elementary symbols of the language.
- non-terminal symbols: symbols that can be replaced (such as variables).
- production rules: the rules that define the language grammar
##What will we build
We'll start building a lexer that will enables us to check that some code is lexically correct, matching a given stream of characters against a set of grammar rules. The characters will be divided into tokens using regular expressions.
Once we've got a list of valid tokens we will parse them, checking for syntactical errors.
##Parsing techniques taken into consideration
There are various types of parsers:
- LR - Left Right
- GLR - Generalized Left Right
- LALR - Look Ahead Left Right
And we can approach the tokens list in different ways, such as:
TOP-DOWN: starts constructing the parse tree from the start symbol and then tries to transform the start symbol to the input
-
Recursive descent parsing : It uses recursive procedures to process the input, this leads to backtracking problems. If one derivation of a production fails, the syntax analyzer restarts the process using different rules. This technique may process the input string more than once.
-
Predictive Parser : To avoid backtracking problems we use a stack and a parsing table to parse the input and generate a parse tree.
BOTTOM-UP: we construct the parse tree starting from the end, up to the start symbol. It's usually more efficient but harder to implement
##Code errors management
A program can have various kinds of errors at various stages:
- Lexical : name of some identifier typed incorrectly
- Syntactical : missing semicolon or unbalanced parenthesis
- Semantical : incompatible value assignment
- Logical : code not reachable, infinite loop
We'll have to be able to recover from this sort of errors and continue with parsing the code. The main error-recovery strategies we've taken into considerations are
Techniques | Description | Advantages | Disadvantages |
---|---|---|---|
Panic Mode | If we encounter a statement with an error, we skip the statement | Cool Name, easier to implement | |
Statement Mode | If we encounter a statement with an error, we try to guess an appropriate correction | We mustn't skip any statements and all the code gets checked every time | If we guess the wrong thing we can generate infinite loops |