syntax_old.tex

\chapter{Context Free Syntax}
This chapter details the first part of the static semantics. In order for any context sensitive checks or type inference can be done, the test programs first need to be parsed into an abstract syntax tree. The context sensitive checks and type inference can then be performed upon the tree.
Some difficulties with implementing a grammar into K is that the grammar originally is written in sort descending order in a document. The goal was to build a grammar that can parse actual programs and ensure there was no bugs. To do this, I started with small example programs, wrote out the example abstract syntax tree, and included the sorts necessary to parse them. Then if they didn't parse correctly I could then debug. I then wrote bigger and bigger example programs and included more and more sorts until all the grammar was included.

The Haskell 2010 report is the current official specification of the Haskell language. 

\url{https://www.haskell.org/onlinereport/haskell2010/}

The grammar specified in section 10.5 of the Haskell 2010 report is a specification of the expanded syntax of Haskell.

\url{https://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-210002.7}

As specified in section 2.7, the expanded syntax of Haskell specifies Haskell programs when written using semicolons and braces. However, these can be omitted in a real Haskell program. The compiler will then utilize layout rules for certain grammar structures instead. These are specified in section 10.3.

\url{https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3}

The parser for this project does not implement these layout rules and instead only can parse the expanded, layout insensitive syntax of Haskell. It would require another script to convert a program written using the layout sensitive syntax into the expanded syntax in order to parse the program.

\url{https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-18000010.5}

Section 10.1 specifies the notation used in the grammar.

\url{https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17600010.1}

The notation of 10.1 are always in bold in the grammar.
So an example production in the document looks like

\texttt{qvarid -> \textbf{[} modid . \textbf{]} varid}

This means that 
\begin{lstlisting}
modid .
\end{lstlisting}
is optional, and the brackets are not terminals, but the period is a terminal. Any symbol that is not in bold needs to be written in the program in order to parse correctly.

\section{Syntax Explanation}
There are a lot of parts of the grammar that proposed many challenges to implement in K. For instance, a sort definition that includes an option could be just written using a pipe
in the K syntax.

So the example production

\texttt{qvarid -> \textbf{[} modid . \textbf{]} varid}

Is written in the K syntax as split into two options.
\begin{lstlisting}
syntax QVarId ::= VarId | ModId "." VarId [klabel('qVarIdCon)]
\end{lstlisting}
However, an issue arises when you have something written on the right hand side of the production like

\texttt{data \textbf{[} context => \textbf{]} simpletype \textbf{[} = constrs \textbf{]} \textbf{[} deriving \textbf{]}}

for the topdecl sort. If each optional sort were split into two options, then a production that includes $n$ optional sorts would require $2^n$ options. This would create an unnecessarily large syntax in K.

Instead, for each optional sort in the original grammar, I replaced the optional sort with a new sort. For instance, I replaced

\texttt{ \textbf{[} context => \textbf{]} }

in the original grammar with a new sort called OptContext. Then I just specified
\begin{lstlisting}
syntax OptContext ::= Context "=>" | "" [onlyLabel, klabel('emptyContext)]
\end{lstlisting}
and so the right hand side of the production would now look like
\begin{lstlisting}
"data" OptContext SimpleType OptConstrs OptDeriving  [klabel('data)]
\end{lstlisting}
This is acceptable because Haskell is not an order sorted algebra, so introducing new sorts that are not originally in the grammar is okay.

In K, semantic rules are called K rules. The tag
\begin{lstlisting}
[klabel('exampleLabel)]
\end{lstlisting}
means that in the abstract syntax tree created by K, a term can be referred to using that klabel in a K rule.

\section{Implementation of Section 10.2}
The following introduces the syntax for the keywords, constants, special symbols, and variables that comprise the terminals for the remaining context free grammar as presented in section 10.2 of the 2010 Haskell report.

\begin{lstlisting}
// Syntax from haskell 2010 Report
// https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17500010

module HASKELL-SYNTAX

    syntax Integer ::= Token{([0-9]+)
                | (([0][o]|[0][O])[0-7]+) 
                | (([0][x] | [0][X])[0-9a-fA-F]+)}   [onlyLabel]

    syntax CusFloat ::= Token{([0-9]+[\.][0-9]+([e E][\+\-]?[0-9]+)?)
                                 |([0-9]+[e E][\+\-]?[0-9]+)} [onlyLabel]
    syntax CusChar ::= Token{[\'](~[\'\\\&])[\']} [onlyLabel]
    syntax CusString ::= Token{[\"](~[\"\\]*)[\"]} [onlyLabel]
\end{lstlisting}
I ran into issue where a program with a variable called ‘size’ did not parse. I found out that this is because ‘size’ is a K keyword. So I just specified that a variable could be a variable token, or “size”. This reveals an issue in the version of K I was using, where K keywords that appear in example parsed programs won't become tokens.
\begin{lstlisting}
    syntax VarId ::= Token{[a-z\_][a-z A-Z\_0-9\']*} [onlyLabel] | "size" [onlyLabel]
\end{lstlisting}
\begin{lstlisting}
    syntax ConId ::= Token{[A-Z][a-zA-Z \_0-9\']*} [onlyLabel]
    syntax VarSym ::= Token{
   ([\! \# \$ \% \& \* \+ \/ \> \? \^][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)
  |[\-] | [\.]
  |([\.][\! \# \$ \% \& \* \+ \/ \< \= \> \? \@ \\ \^ \| \- \~ \:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)
  | ([\-][\! \# \$ \% \& \* \+ \. \/ \< \= \? \@ \\ \^ \| \~ \:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \~ \:]*)
  | ([\@][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]+)
  | ([\~][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]+)
  | ([\\][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]+)
  | ([\|][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]+)
  | ([\:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)
  | ([\<][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \~ \:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)
  | ([\=][\! \# \$ \% \& \* \+ \. \/ \< \= \? \@ \\ \^ \| \~ \:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)} [onlyLabel]
    syntax ConSym ::= Token{[\:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*}   [onlyLabel]
\end{lstlisting}
I ran into an issue where floats and integers did not parse correctly. They caused parsing errors due to ambiguity of parsing. For example the number 123.45 had ambiguity where the parser did not know if 1, 12, or 123 where integers, and if 5 was an integer, or if the entire thing was one float. Normally in K, different tokens are separated with whitespaces. However, for some reason the parser had difficulty here. Initially, I added a workaround by requiring parentheses around each integer and floating point.
This fixed the issue.
\begin{lstlisting}

    syntax IntFloat ::= "(" Integer ")"  [bracket] //NOT OFFICIAL SYNTAX
                      | "(" CusFloat ")" [bracket]
\end{lstlisting}
\begin{lstlisting}
    syntax Literal ::= IntFloat | CusChar | CusString
    syntax TyCon ::= ConId
    syntax ModId ::= ConId | ConId "." ModId  [klabel('conModId)]
    syntax QTyCon ::= TyCon | ModId "." TyCon [klabel('conTyCon)]
    syntax QVarId ::= VarId | ModId "." VarId [klabel('qVarIdCon)]
    syntax QVarSym ::= VarSym | ModId "." VarSym  [klabel('qVarSymCon)]
    syntax QConSym ::= ConSym | ModId "." ConSym  [klabel('qConSymCon)]
    syntax TyVars ::= List{TyVar, ""} [klabel('typeVars)] //used in SimpleType syntax
    syntax TyVar ::= VarId
    syntax TyVarTuple ::= TyVar "," TyVar      [klabel('twoTypeVarTuple)]
                        | TyVar "," TyVarTuple [klabel('typeVarTupleCon)]

    syntax Con ::= ConId | "(" ConSym ")"   [klabel('conSymBracket)]
    syntax Var ::= VarId | "(" VarSym ")"   [klabel('varSymBracket)]
    syntax QVar ::= QVarId | "(" QConSym ")" [klabel('qVarBracket)]
    syntax QCon ::= QTyCon | "(" GConSym ")"  [klabel('gConBracket)]

    syntax QConOp ::= GConSym | "`" QTyCon "`"    [klabel('qTyConQuote)]
    syntax QVarOp ::= QVarSym | "`" QVarId "`"    [klabel('qVarIdQuote)]
    syntax VarOp ::= VarSym   | "`" VarId "`"     [klabel('varIdQuote)]
    syntax ConOp ::= ConSym   | "`" ConId "`"     [klabel('conIdQuote)]

    syntax GConSym ::= ":" | QConSym
    syntax Vars ::= Var
                  | Var "," Vars  [klabel('varCon)]
    syntax VarsType ::= Vars "::" Type  [klabel('varAssign)]
    syntax Ops ::= Op
                 | Op "," Ops  [klabel('opCon)]
    syntax Fixity ::= "infixl" | "infixr" | "infix"
    syntax Op ::= VarOp | ConOp
    syntax CQName ::= Var | Con | QVar

    /* syntax QConId ::= ConId | ModId "." ConId */

    syntax QOp ::= QVarOp | QConOp
\end{lstlisting}

\section{Implementation of Section 10.5}
The following introduces the sorts of the context free grammar of the Haskell extended syntax.

\subsection{Modules}
We start with modules.
\begin{lstlisting}
    syntax ModuleName ::= "module" ModId [klabel('moduleName)]

    syntax Module ::= ModuleName "where" Body          [klabel('module)]
                    | ModuleName Exports "where" Body  [klabel('moduleExp)]
                    | Body                             [klabel('moduleBody)]

    syntax Body ::= "{" ImpDecls ";" TopDecls "}" [klabel('bodyimpandtop)]
                  | "{" ImpDecls "}" [klabel('bodyimpdecls)]
                  | "{" TopDecls "}" [klabel('bodytopdecls)]
\end{lstlisting}

The sort that contains all the other sorts is a module. A module represents one complete Haskell program. It can have either a name and a body, a name and a body with exports, or just a body.

\subsection{ImpDecls}
An ImpDecl is an import declaration. An import is another module that this module depends on.
The definition of ImpDecl is the following
\begin{lstlisting}
    syntax ImpDecls ::= List{ImpDecl, ";"} [klabel('impDecls)]
    syntax ImpDecl ::= "import" OptQualified ModId OptAsModId OptImpSpec [klabel('impDecl)]
                     | "" [onlyLabel, klabel('emptyImpDecl)]
    syntax OptQualified ::= "qualified"
                          | ""  [onlyLabel, klabel('emptyQualified)]
    syntax OptAsModId ::= "as" ModId
                        | ""    [onlyLabel, klabel('emptyOptAsModId)]

    syntax OptImpSpec ::= ImpSpec
                        | ""    [onlyLabel, klabel('emptyOptImpSpec)]

    syntax ImpSpecKey ::= "(" ImportList OptComma ")"
    syntax ImpSpec ::= ImpSpecKey
                     | "hiding" ImpSpecKey

    syntax ImportList ::= List{Import, ","}

    syntax Import ::= Var
                    | TyCon CQList
\end{lstlisting}

The following example program has a Module with a name and a body of only ImpDecls. It has one ImpDecl.

\begin{lstlisting}
module Foo where
{import Bar
}
\end{lstlisting}

The following example program has a Module with no name and and a body of only ImpDecls. It has one ImpDecl.

\begin{lstlisting}
import Bar
\end{lstlisting}

Another example program is
\begin{lstlisting}
module Simp1 where
{import Test
}
\end{lstlisting}

The corresponding abstract syntax tree in K is
\begin{lstlisting}
`'module`(`'moduleName`(#token("Simp1","ConId")), `'bodyimpdecls`(`'impDecls`(`'impDecl`(`'emptyQualified`( .::KList), #token("Test","ConId"), `'emptyOptAsModId`(.::KList), `'emptyOptImpSpec`(.::KList)), `'.List{"'impDecls"}`(.::KList))))
\end{lstlisting}

Module contains two children. The first child of module is moduleName which contains the token Simp1. Simp1 is a constructor ID because it starts with a capital letter. The second child of module is bodyimpdecls. This contains the sort impDecls which is a list of impDecls. This program has only one impDecl. Since there is no qualified, the impDecl contains the child emptyQualified, followed by the token Test. Since there is no AsModId or ImpSpec, the last two children are emptyOptAsModId and emptyOptImpSpec.

\subsection{TopDecls}
The main types of expressions in Haskell are TopDecls - Top Declarations. A top declaration can either be a type, a data, a newtype, a class, an instance, a default, a foreign, or an arbitrary declaration.

Any sort that starts with 'Opt' means that this is optional. In K something can be made optional by declaring the necessary constructors or nothing.

\begin{lstlisting}
    syntax TopDecls ::= List{TopDecl, ";"} [klabel('topdeclslist)]

    syntax TopDecl ::= Decl [klabel('topdecldecl)]
                     > "type" SimpleType "=" Type [klabel('type)]
                     | "data" OptContext SimpleType OptConstrs OptDeriving  [klabel('data)]
                     | "newtype" OptContext SimpleType "=" NewConstr OptDeriving [klabel('newtype)]
                     | "class" OptContext ConId TyVar OptCDecls [klabel('class)]
                     | "instance" OptContext QTyCon Inst OptIDecls [klabel('instance)]
                     | "default" Types [klabel('default)]
                     | "foreign" FDecl [klabel('foreign)]
\end{lstlisting}

\subsection{Decls}

A Decl is any general declaration. So something like

\begin{lstlisting}
f x = x + 2
\end{lstlisting}

is a Decl.

The following is the definition of Decl and related sorts.

\begin{lstlisting}
    syntax OptDecls ::= "where" Decls | "" [onlyLabel, klabel('emptyOptDecls)]
    syntax Decls ::= "{" DeclsList "}" [klabel('decls)]
    syntax DeclsList ::= List{Decl, ";"} [klabel('declsList)]

    syntax Decl ::= GenDecl
                  | FunLhs Rhs [klabel('declFunLhsRhs)]
                  | Pat Rhs [klabel('declPatRhs)]
\end{lstlisting}

This section introduces the sorts whose parent is Decl. This contains general declarations, function left hand side and right hand side, function guards, and expressions.

\begin{lstlisting}
    syntax GenDecl ::= VarsType
                     | Vars "::" Context "=>" Type   [klabel('genAssignContext)]
                     | Fixity Ops
                     | Fixity Integer Ops
                     | "" [onlyLabel, klabel('emptyGenDecl)]

    syntax FunLhs ::= Var APatList [klabel('varApatList)]
                    | Pat VarOp Pat [klabel('patVarOpPat)]
                    | "(" FunLhs ")" APatList [klabel('funlhsApatList)]

    syntax Rhs ::= "=" Exp OptDecls [klabel('eqExpOptDecls)]
                 | GdRhs OptDecls [klabel('gdRhsOptDecls)]

    syntax GdRhs ::= Guards "=" Exp
                   | Guards "=" Exp GdRhs
    syntax Guards ::= "|" GuardList
    syntax GuardList ::= Guard | Guard "," GuardList  [klabel('guardListCon)]
    syntax Guard ::= Pat "<-" InfixExp
                   | "let" Decls
                   | InfixExp

    //definition of exp
    syntax Exp ::= InfixExp
                 > InfixExp "::" Type  [klabel('expAssign)]
                 | InfixExp "::" Context "=>" Type  [klabel('expAssignContext)]

    syntax InfixExp ::= LExp
                      > "-" InfixExp   [klabel('minusInfix)]
                      > LExp QOp InfixExp
\end{lstlisting}

\subsection{LExp}
LExp is an important sort for the type inference function. This is because LExp defines the different expression types which the type inference function has specific rules for.

The different LExp types are a lambda expression, a 'let-in' expression, an 'if' statement, a case statement, and a 'do' block.

\begin{lstlisting}
    syntax LExp ::= AExp
                  > "\\" APatList "->" Exp [klabel('lambdaFun)]
                  | "let" Decls "in" Exp [klabel('letIn)]
                  | "if" Exp OptSemicolon "then" Exp OptSemicolon "else" Exp [klabel('ifThenElse)]
                  | "case" Exp "of" "{" Alts "}" [klabel('caseOf)]
                  | "do" "{" Stmts "}" [klabel('doBlock)]

\end{lstlisting}

\subsection{AExp}
AExp is also an import sort for the type inference function. The main parts of AExp that the inference function cares about is QVar and GCon.

QVar is a qualified variable and GCon is a general constructor.
\begin{lstlisting}

    syntax OptSemicolon ::= ";" | "" [onlyLabel, klabel('emptySemicolon)]
    syntax OptComma ::= "," | ""     [onlyLabel, klabel('emptyComma)]

    syntax AExp ::= QVar [klabel('aexpQVar)]
                  | GCon [klabel('aexpGCon)]
                  | Literal [klabel('aexpLiteral)]
                  > AExp AExp [left, klabel('funApp)]
                  > QCon "{" FBindList "}"
                  | AExp "{" FBindList "}" //aexp cannot be qcon UNFINISHED
                          //Liyi: first, does not understand the syntax, it is the Qcon {FBindlist}
                          //or QCon? Second, place a check in preprosssing.
                          //and also check the Fbindlist here must be at least one argument
                  > "(" Exp ")"             [bracket]
                  | "(" ExpTuple ")"
                  | "[" ExpList "]"
                  | "[" Exp OptExpComma ".." OptExp "]"
                  | "[" Exp "|" Quals "]"
                  | "(" InfixExp QOp ")"
                  | "(" QOp InfixExp ")" //qop cannot be - (minus) UNFINISHED
                           //Liyi: place a check here to check if QOp is a minus
                           
\end{lstlisting}

\section{Example Test Programs}