-
Notifications
You must be signed in to change notification settings - Fork 0
/
syntax_old.tex
347 lines (268 loc) · 17.4 KB
/
syntax_old.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
\chapter{Context Free Syntax}
This chapter details the first part of the static semantics. In order for any context sensitive checks or type inference can be done, the test programs first need to be parsed into an abstract syntax tree. The context sensitive checks and type inference can then be performed upon the tree.
Some difficulties with implementing a grammar into K is that the grammar originally is written in sort descending order in a document. The goal was to build a grammar that can parse actual programs and ensure there was no bugs. To do this, I started with small example programs, wrote out the example abstract syntax tree, and included the sorts necessary to parse them. Then if they didn't parse correctly I could then debug. I then wrote bigger and bigger example programs and included more and more sorts until all the grammar was included.
The Haskell 2010 report is the current official specification of the Haskell language.
\url{https://www.haskell.org/onlinereport/haskell2010/}
The grammar specified in section 10.5 of the Haskell 2010 report is a specification of the expanded syntax of Haskell.
\url{https://www.haskell.org/onlinereport/haskell2010/haskellch2.html#x7-210002.7}
As specified in section 2.7, the expanded syntax of Haskell specifies Haskell programs when written using semicolons and braces. However, these can be omitted in a real Haskell program. The compiler will then utilize layout rules for certain grammar structures instead. These are specified in section 10.3.
\url{https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17800010.3}
The parser for this project does not implement these layout rules and instead only can parse the expanded, layout insensitive syntax of Haskell. It would require another script to convert a program written using the layout sensitive syntax into the expanded syntax in order to parse the program.
\url{https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-18000010.5}
Section 10.1 specifies the notation used in the grammar.
\url{https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17600010.1}
The notation of 10.1 are always in bold in the grammar.
So an example production in the document looks like
\texttt{qvarid -> \textbf{[} modid . \textbf{]} varid}
This means that
\begin{lstlisting}
modid .
\end{lstlisting}
is optional, and the brackets are not terminals, but the period is a terminal. Any symbol that is not in bold needs to be written in the program in order to parse correctly.
\section{Syntax Explanation}
There are a lot of parts of the grammar that proposed many challenges to implement in K. For instance, a sort definition that includes an option could be just written using a pipe
in the K syntax.
So the example production
\texttt{qvarid -> \textbf{[} modid . \textbf{]} varid}
Is written in the K syntax as split into two options.
\begin{lstlisting}
syntax QVarId ::= VarId | ModId "." VarId [klabel('qVarIdCon)]
\end{lstlisting}
However, an issue arises when you have something written on the right hand side of the production like
\texttt{data \textbf{[} context => \textbf{]} simpletype \textbf{[} = constrs \textbf{]} \textbf{[} deriving \textbf{]}}
for the topdecl sort. If each optional sort were split into two options, then a production that includes $n$ optional sorts would require $2^n$ options. This would create an unnecessarily large syntax in K.
Instead, for each optional sort in the original grammar, I replaced the optional sort with a new sort. For instance, I replaced
\texttt{ \textbf{[} context => \textbf{]} }
in the original grammar with a new sort called OptContext. Then I just specified
\begin{lstlisting}
syntax OptContext ::= Context "=>" | "" [onlyLabel, klabel('emptyContext)]
\end{lstlisting}
and so the right hand side of the production would now look like
\begin{lstlisting}
"data" OptContext SimpleType OptConstrs OptDeriving [klabel('data)]
\end{lstlisting}
This is acceptable because Haskell is not an order sorted algebra, so introducing new sorts that are not originally in the grammar is okay.
In K, semantic rules are called K rules. The tag
\begin{lstlisting}
[klabel('exampleLabel)]
\end{lstlisting}
means that in the abstract syntax tree created by K, a term can be referred to using that klabel in a K rule.
\section{Implementation of Section 10.2}
The following introduces the syntax for the keywords, constants, special symbols, and variables that comprise the terminals for the remaining context free grammar as presented in section 10.2 of the 2010 Haskell report.
\begin{lstlisting}
// Syntax from haskell 2010 Report
// https://www.haskell.org/onlinereport/haskell2010/haskellch10.html#x17-17500010
module HASKELL-SYNTAX
syntax Integer ::= Token{([0-9]+)
| (([0][o]|[0][O])[0-7]+)
| (([0][x] | [0][X])[0-9a-fA-F]+)} [onlyLabel]
syntax CusFloat ::= Token{([0-9]+[\.][0-9]+([e E][\+\-]?[0-9]+)?)
|([0-9]+[e E][\+\-]?[0-9]+)} [onlyLabel]
syntax CusChar ::= Token{[\'](~[\'\\\&])[\']} [onlyLabel]
syntax CusString ::= Token{[\"](~[\"\\]*)[\"]} [onlyLabel]
\end{lstlisting}
I ran into issue where a program with a variable called ‘size’ did not parse. I found out that this is because ‘size’ is a K keyword. So I just specified that a variable could be a variable token, or “size”. This reveals an issue in the version of K I was using, where K keywords that appear in example parsed programs won't become tokens.
\begin{lstlisting}
syntax VarId ::= Token{[a-z\_][a-z A-Z\_0-9\']*} [onlyLabel] | "size" [onlyLabel]
\end{lstlisting}
\begin{lstlisting}
syntax ConId ::= Token{[A-Z][a-zA-Z \_0-9\']*} [onlyLabel]
syntax VarSym ::= Token{
([\! \# \$ \% \& \* \+ \/ \> \? \^][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)
|[\-] | [\.]
|([\.][\! \# \$ \% \& \* \+ \/ \< \= \> \? \@ \\ \^ \| \- \~ \:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)
| ([\-][\! \# \$ \% \& \* \+ \. \/ \< \= \? \@ \\ \^ \| \~ \:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \~ \:]*)
| ([\@][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]+)
| ([\~][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]+)
| ([\\][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]+)
| ([\|][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]+)
| ([\:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)
| ([\<][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \~ \:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)
| ([\=][\! \# \$ \% \& \* \+ \. \/ \< \= \? \@ \\ \^ \| \~ \:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*)} [onlyLabel]
syntax ConSym ::= Token{[\:][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~][\! \# \$ \% \& \* \+ \. \/ \< \= \> \? \@ \\ \^ \| \- \~ \:]*} [onlyLabel]
\end{lstlisting}
I ran into an issue where floats and integers did not parse correctly. They caused parsing errors due to ambiguity of parsing. For example the number 123.45 had ambiguity where the parser did not know if 1, 12, or 123 where integers, and if 5 was an integer, or if the entire thing was one float. Normally in K, different tokens are separated with whitespaces. However, for some reason the parser had difficulty here. Initially, I added a workaround by requiring parentheses around each integer and floating point.
This fixed the issue.
\begin{lstlisting}
syntax IntFloat ::= "(" Integer ")" [bracket] //NOT OFFICIAL SYNTAX
| "(" CusFloat ")" [bracket]
\end{lstlisting}
\begin{lstlisting}
syntax Literal ::= IntFloat | CusChar | CusString
syntax TyCon ::= ConId
syntax ModId ::= ConId | ConId "." ModId [klabel('conModId)]
syntax QTyCon ::= TyCon | ModId "." TyCon [klabel('conTyCon)]
syntax QVarId ::= VarId | ModId "." VarId [klabel('qVarIdCon)]
syntax QVarSym ::= VarSym | ModId "." VarSym [klabel('qVarSymCon)]
syntax QConSym ::= ConSym | ModId "." ConSym [klabel('qConSymCon)]
syntax TyVars ::= List{TyVar, ""} [klabel('typeVars)] //used in SimpleType syntax
syntax TyVar ::= VarId
syntax TyVarTuple ::= TyVar "," TyVar [klabel('twoTypeVarTuple)]
| TyVar "," TyVarTuple [klabel('typeVarTupleCon)]
syntax Con ::= ConId | "(" ConSym ")" [klabel('conSymBracket)]
syntax Var ::= VarId | "(" VarSym ")" [klabel('varSymBracket)]
syntax QVar ::= QVarId | "(" QConSym ")" [klabel('qVarBracket)]
syntax QCon ::= QTyCon | "(" GConSym ")" [klabel('gConBracket)]
syntax QConOp ::= GConSym | "`" QTyCon "`" [klabel('qTyConQuote)]
syntax QVarOp ::= QVarSym | "`" QVarId "`" [klabel('qVarIdQuote)]
syntax VarOp ::= VarSym | "`" VarId "`" [klabel('varIdQuote)]
syntax ConOp ::= ConSym | "`" ConId "`" [klabel('conIdQuote)]
syntax GConSym ::= ":" | QConSym
syntax Vars ::= Var
| Var "," Vars [klabel('varCon)]
syntax VarsType ::= Vars "::" Type [klabel('varAssign)]
syntax Ops ::= Op
| Op "," Ops [klabel('opCon)]
syntax Fixity ::= "infixl" | "infixr" | "infix"
syntax Op ::= VarOp | ConOp
syntax CQName ::= Var | Con | QVar
/* syntax QConId ::= ConId | ModId "." ConId */
syntax QOp ::= QVarOp | QConOp
\end{lstlisting}
\section{Implementation of Section 10.5}
The following introduces the sorts of the context free grammar of the Haskell extended syntax.
\subsection{Modules}
We start with modules.
\begin{lstlisting}
syntax ModuleName ::= "module" ModId [klabel('moduleName)]
syntax Module ::= ModuleName "where" Body [klabel('module)]
| ModuleName Exports "where" Body [klabel('moduleExp)]
| Body [klabel('moduleBody)]
syntax Body ::= "{" ImpDecls ";" TopDecls "}" [klabel('bodyimpandtop)]
| "{" ImpDecls "}" [klabel('bodyimpdecls)]
| "{" TopDecls "}" [klabel('bodytopdecls)]
\end{lstlisting}
The sort that contains all the other sorts is a module. A module represents one complete Haskell program. It can have either a name and a body, a name and a body with exports, or just a body.
\subsection{ImpDecls}
An ImpDecl is an import declaration. An import is another module that this module depends on.
The definition of ImpDecl is the following
\begin{lstlisting}
syntax ImpDecls ::= List{ImpDecl, ";"} [klabel('impDecls)]
syntax ImpDecl ::= "import" OptQualified ModId OptAsModId OptImpSpec [klabel('impDecl)]
| "" [onlyLabel, klabel('emptyImpDecl)]
syntax OptQualified ::= "qualified"
| "" [onlyLabel, klabel('emptyQualified)]
syntax OptAsModId ::= "as" ModId
| "" [onlyLabel, klabel('emptyOptAsModId)]
syntax OptImpSpec ::= ImpSpec
| "" [onlyLabel, klabel('emptyOptImpSpec)]
syntax ImpSpecKey ::= "(" ImportList OptComma ")"
syntax ImpSpec ::= ImpSpecKey
| "hiding" ImpSpecKey
syntax ImportList ::= List{Import, ","}
syntax Import ::= Var
| TyCon CQList
\end{lstlisting}
The following example program has a Module with a name and a body of only ImpDecls. It has one ImpDecl.
\begin{lstlisting}
module Foo where
{import Bar
}
\end{lstlisting}
The following example program has a Module with no name and and a body of only ImpDecls. It has one ImpDecl.
\begin{lstlisting}
import Bar
\end{lstlisting}
Another example program is
\begin{lstlisting}
module Simp1 where
{import Test
}
\end{lstlisting}
The corresponding abstract syntax tree in K is
\begin{lstlisting}
`'module`(`'moduleName`(#token("Simp1","ConId")), `'bodyimpdecls`(`'impDecls`(`'impDecl`(`'emptyQualified`( .::KList), #token("Test","ConId"), `'emptyOptAsModId`(.::KList), `'emptyOptImpSpec`(.::KList)), `'.List{"'impDecls"}`(.::KList))))
\end{lstlisting}
Module contains two children. The first child of module is moduleName which contains the token Simp1. Simp1 is a constructor ID because it starts with a capital letter. The second child of module is bodyimpdecls. This contains the sort impDecls which is a list of impDecls. This program has only one impDecl. Since there is no qualified, the impDecl contains the child emptyQualified, followed by the token Test. Since there is no AsModId or ImpSpec, the last two children are emptyOptAsModId and emptyOptImpSpec.
\subsection{TopDecls}
The main types of expressions in Haskell are TopDecls - Top Declarations. A top declaration can either be a type, a data, a newtype, a class, an instance, a default, a foreign, or an arbitrary declaration.
Any sort that starts with 'Opt' means that this is optional. In K something can be made optional by declaring the necessary constructors or nothing.
\begin{lstlisting}
syntax TopDecls ::= List{TopDecl, ";"} [klabel('topdeclslist)]
syntax TopDecl ::= Decl [klabel('topdecldecl)]
> "type" SimpleType "=" Type [klabel('type)]
| "data" OptContext SimpleType OptConstrs OptDeriving [klabel('data)]
| "newtype" OptContext SimpleType "=" NewConstr OptDeriving [klabel('newtype)]
| "class" OptContext ConId TyVar OptCDecls [klabel('class)]
| "instance" OptContext QTyCon Inst OptIDecls [klabel('instance)]
| "default" Types [klabel('default)]
| "foreign" FDecl [klabel('foreign)]
\end{lstlisting}
\subsection{Decls}
A Decl is any general declaration. So something like
\begin{lstlisting}
f x = x + 2
\end{lstlisting}
is a Decl.
The following is the definition of Decl and related sorts.
\begin{lstlisting}
syntax OptDecls ::= "where" Decls | "" [onlyLabel, klabel('emptyOptDecls)]
syntax Decls ::= "{" DeclsList "}" [klabel('decls)]
syntax DeclsList ::= List{Decl, ";"} [klabel('declsList)]
syntax Decl ::= GenDecl
| FunLhs Rhs [klabel('declFunLhsRhs)]
| Pat Rhs [klabel('declPatRhs)]
\end{lstlisting}
This section introduces the sorts whose parent is Decl. This contains general declarations, function left hand side and right hand side, function guards, and expressions.
\begin{lstlisting}
syntax GenDecl ::= VarsType
| Vars "::" Context "=>" Type [klabel('genAssignContext)]
| Fixity Ops
| Fixity Integer Ops
| "" [onlyLabel, klabel('emptyGenDecl)]
syntax FunLhs ::= Var APatList [klabel('varApatList)]
| Pat VarOp Pat [klabel('patVarOpPat)]
| "(" FunLhs ")" APatList [klabel('funlhsApatList)]
syntax Rhs ::= "=" Exp OptDecls [klabel('eqExpOptDecls)]
| GdRhs OptDecls [klabel('gdRhsOptDecls)]
syntax GdRhs ::= Guards "=" Exp
| Guards "=" Exp GdRhs
syntax Guards ::= "|" GuardList
syntax GuardList ::= Guard | Guard "," GuardList [klabel('guardListCon)]
syntax Guard ::= Pat "<-" InfixExp
| "let" Decls
| InfixExp
//definition of exp
syntax Exp ::= InfixExp
> InfixExp "::" Type [klabel('expAssign)]
| InfixExp "::" Context "=>" Type [klabel('expAssignContext)]
syntax InfixExp ::= LExp
> "-" InfixExp [klabel('minusInfix)]
> LExp QOp InfixExp
\end{lstlisting}
\subsection{LExp}
LExp is an important sort for the type inference function. This is because LExp defines the different expression types which the type inference function has specific rules for.
The different LExp types are a lambda expression, a 'let-in' expression, an 'if' statement, a case statement, and a 'do' block.
\begin{lstlisting}
syntax LExp ::= AExp
> "\\" APatList "->" Exp [klabel('lambdaFun)]
| "let" Decls "in" Exp [klabel('letIn)]
| "if" Exp OptSemicolon "then" Exp OptSemicolon "else" Exp [klabel('ifThenElse)]
| "case" Exp "of" "{" Alts "}" [klabel('caseOf)]
| "do" "{" Stmts "}" [klabel('doBlock)]
\end{lstlisting}
\subsection{AExp}
AExp is also an import sort for the type inference function. The main parts of AExp that the inference function cares about is QVar and GCon.
QVar is a qualified variable and GCon is a general constructor.
\begin{lstlisting}
syntax OptSemicolon ::= ";" | "" [onlyLabel, klabel('emptySemicolon)]
syntax OptComma ::= "," | "" [onlyLabel, klabel('emptyComma)]
syntax AExp ::= QVar [klabel('aexpQVar)]
| GCon [klabel('aexpGCon)]
| Literal [klabel('aexpLiteral)]
> AExp AExp [left, klabel('funApp)]
> QCon "{" FBindList "}"
| AExp "{" FBindList "}" //aexp cannot be qcon UNFINISHED
//Liyi: first, does not understand the syntax, it is the Qcon {FBindlist}
//or QCon? Second, place a check in preprosssing.
//and also check the Fbindlist here must be at least one argument
> "(" Exp ")" [bracket]
| "(" ExpTuple ")"
| "[" ExpList "]"
| "[" Exp OptExpComma ".." OptExp "]"
| "[" Exp "|" Quals "]"
| "(" InfixExp QOp ")"
| "(" QOp InfixExp ")" //qop cannot be - (minus) UNFINISHED
//Liyi: place a check here to check if QOp is a minus
\end{lstlisting}
\section{Example Test Programs}