Skip to content
This repository has been archived by the owner on Jun 4, 2024. It is now read-only.
Daniele Sassoli edited this page Aug 12, 2016 · 15 revisions

#Roku Linter Wiki

##How is a programming language composed? A Programming language is made up Static Aspects and Dynamic Aspects, in the scope of this project we will take care only of Static Aspects.

Static aspects basically are:

  • Syntax: the structure of a language, defines the valid constructs of a given language.
  • Scope: scope of the variables, which differ for different languages.
  • Type Checking: checking the types in the assignments and similar.

A Language is made of

  • terminal symbols: elementary symbols of the language.
  • non-terminal symbols: symbols that can be replaced (such as variables).
  • production rules: the rules that define the language grammar

##What will we build

We'll start building a lexer that will enables us to check that a code is lexically correct, matching a given stream of characters against a set of grammar rules. The characters will be divided into tokens using regular expressions.

Once we've got a list of valid tokens we will parse them, checking for syntactical errors.

##Parsing techniques taken into consideration

There are various types of parsers:

  • LR - Left Right
  • GLR - Generalized Left Right
  • LALR - Look Ahead Left Right

And we can approach the tokens list in different ways, such as:

TOP-DOWN: starts constructing the parse tree from the start symbol and then tries to transform the start symbol to the input

  • Recursive descent parsing : It uses recursive procedures to process the input, this leads to backtracking problems. If one derivation of a production fails, the syntax analyzer restarts the process using different rules. This technique may process the input string more than once.

  • Predictive Parser : To avoid backtracking problems we use a stack and a parsing table to parse the input and generate a parse tree.

BOTTOM-UP: parsing starts with the input symbols and tries to construct the parse tree up to the start symbol. It's usually more efficient but harder to implement

##Code errors management

A program can have various kinds of errors at various stages:

  • Lexical : name of some identifier typed incorrectly
  • Syntactical : missing semicolon or unbalanced parenthesis
  • Semantical : incompatible value assignment
  • Logical : code not reachable, infinite loop

We'll have to be able to recover from this sort of errors and continue with parsing the code. The main error-recovery strategies we've taken into considerations are

Techniques Description Advantages Disadvantages
Panic Mode If we encounter a statement with an error, we skip the statement Cool Name, easier to implement
Statement Mode If we encounter a statement with an error we try to guess an appropriate correction We mustn't skip any statements and all the code gets checked every time If we guess the wrong think we can generate infinite loops
Clone this wiki locally