Home

#Roku Linter Wiki

##How is a programming language composed? A Programming language is made up Static Aspects and Dynamic Aspects, in the scope of this project we will take care only of Static Aspects.

Static aspects basically are:

Syntax: the structure of a language, defines the valid constructs of a given language.
Scope: scope of the variables, which differ for different languages.
Type Checking: checking the types in the assignments and similar.

A Language is made of

terminal symbols: elementary symbols of the language.
non-terminal symbols: symbols that can be replaced (such as variables).
production rules: the rules that define the language grammar

##What will we build

We'll start building a lexer that will enables us to check that a code is lexically correct, matching a given stream of characters against a set of grammar rules. The characters will be divided into tokens using regular expressions.

Once we've got a list of valid tokens we will parse them, checking for syntactical errors.

##Parsing techniques taken into consideration

There are various types of parsers:

LR - Left Right
GLR - Generalized Left Right
LALR - Look Ahead Left Right

And we can approach the tokens list in different ways, such as:

TOP-DOWN: starts constructing the parse tree from the start symbol and then tries to transform the start symbol to the input

Recursive descent parsing : It uses recursive procedures to process the input, this leads to backtracking problems. If one derivation of a production fails, the syntax analyzer restarts the process using different rules. This technique may process the input string more than once.
Predictive Parser : To avoid backtracking problems we use a stack and a parsing table to parse the input and generate a parse tree.

BOTTOM-UP: parsing starts with the input symbols and tries to construct the parse tree up to the start symbol. It's usually more efficient but harder to implement

##Code errors management

A program can have various kinds of errors at various stages:

Lexical : name of some identifier typed incorrectly
Syntactical : missing semicolon or unbalanced parenthesis
Semantical : incompatible value assignment
Logical : code not reachable, infinite loop

We'll have to be able to recover from this sort of errors and continue with parsing the code. The main error-recovery strategies we've taken into considerations are

Techniques	Description	Advantages	Disadvantages
Panic Mode	If we encounter a statement with an error, we skip the statement	Cool Name, easier to implement
Statement Mode	If we encounter a statement with an error we try to guess an appropriate correction	We mustn't skip any statements and all the code gets checked every time	If we guess the wrong think we can generate infinite loops

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Home

Clone this wiki locally