= Combinator Golf =
Gregory Chaitin's pioneering results on
https://en.wikipedia.org/wiki/Algorithmic_information_theory[algorithmic
information theory] are exciting because he wrote real programs to make theory
concrete. Sadly, Chaitin chose LISP due to a flawed understanding of lambda
calculus, diminishing the beauty of his results.
He's not alone.
https://queue.acm.org/detail.cfm?id=1039523[Alan Kay described LISP as the
"Maxwell's equations of software"] because a "half page of code" described LISP
itself. Surely the equally powerful lambda calculus, which can describe itself
in far less space, is more deserving of the title.
http://www.paulgraham.com/rootsoflisp.html[Paul Graham's 'The Roots of Lisp']
dubs the LISP self-interpreter "The Surprise" because of its supposed brevity.
What, then, should one call a one-line lambda calculus self-interpreter?
(Aside: on the other hand, the analogy with Maxwell's equations is apt, because
if Maxwell had used link:../haskell/ga.html[geometric algebra] instead of
vector algebra, he could have been less verbose and we would be celebrating
Maxwell's 'equation', that is, one equation instead of four:
\[
(\partial_t + \nabla)(\mathbf{E} + \mathbf{B}) = 0
\]
in a vacuum, and more generally:
\[
(\partial_t + \nabla)(\mathbf{E} + \mathbf{B}) = \rho + \mathbf{j}
\]
where bivectors represent the magnetic field $\mathbf{B}$.)
https://tromp.github.io/cl/LC.pdf[John Tromp's reworking of Chaitin's ideas in
lambda calculus and combinatory logic] is a fascinating read. Much of the fun
involves tiny self-interpreters that read binary encodings of themselves.
Tromp's binary CL self-interpreter expects the input in the form of nested
pairs such as:
------------------------------------------------------------------------
(True, (False, (False, ... (True, const) ... )))
------------------------------------------------------------------------
where the booleans and pairs have their usual Church/Scott encodings.
If we could disable type-checking in Haskell (the nested pairs would demand an
infinite type), the interpreter is simply:
------------------------------------------------------------------------
eval c = uncurry (bool (uncurry (c . bool ap const)) (eval (eval . (c .))))
------------------------------------------------------------------------
and satisfies:
------------------------------------------------------------------------
eval c (encode m n) == c m n
------------------------------------------------------------------------
where:
------------------------------------------------------------------------
data CL = App CL CL | S | K
encode m n = case m of
  App x y -> (True, encode x (encode y n))
  S -> (False, (False, n))
  K -> (False, (True , n))
------------------------------------------------------------------------
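The nested pairs defeat Hindley-Milner, but the encoding itself is perfectly
typeable. As a sanity check, here is a sketch that flattens the same scheme to
a list of bits:

------------------------------------------------------------------------
data CL = App CL CL | S | K

bits :: CL -> [Bool]
bits m = go m [] where
  go (App x y) n = True : go x (go y n)
  go S n = False : False : n
  go K n = False : True  : n

-- bits (App (App S K) K) gives the bit list for 11000101,
-- reading True as 1 and False as 0.
------------------------------------------------------------------------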
Some of our type-challenged compilers will happily run this crazy code.
link:grind.html[Our "Fixity" compiler] accepts the following:
------------------------------------------------------------------------
data Bool = True | False;
id x = x;
const x y = x;
ap x y z = x z(y z);
bool x y b = case b of { True -> y ; False -> x };
uncurry f x = case x of { (,) a b -> f a b };
(.) f g x = f (g x);
encode m n = case m of
{ S -> (False, (False, n))
; K -> (False, (True , n))
; App x y -> (True, (encode x (encode y n)))
};
eval c = uncurry (bool (uncurry (c . bool ap const)) (eval (eval . (c .))));
go s = eval id (encode (App (App S K) K) s);
------------------------------------------------------------------------
Since SKK is the identity, the above program just returns the input.
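To see why, unwind the specification `eval c (encode m n) == c m n` once, with
S and K decoded to their host implementations `ap` and `const`:

------------------------------------------------------------------------
   eval id (encode (App (App S K) K) s)
== id (ap const const) s    -- by the specification
== ap const const s
== const s (const s)        -- ap x y z = x z (y z)
== s                        -- const x y = x
------------------------------------------------------------------------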
Kiselyov's bracket abstraction algorithm leads us to wonder: why limit
ourselves to S and K? After all, in lambda calculus, we could prohibit terms
with a De Bruijn index greater than 2 and retain the same computing power, but
nobody bothers. Why insist on two particular combinators?
== Hole in one ==
With no restrictions, adding combinators to the definition of CL to obtain a
smaller self-interpreter is too easy. In the extreme, we could take a
https://www.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf["trusting
trust"] approach and define a combinator X which decodes a given binary string
to a CL term and interprets it.
Take the set of combinators to be X, S, and K. Representing X with 1 bit
trivially results in a 1-bit self-interpreter.
== Rule of three ==
Perhaps it's reasonable to say a combinator is eligible if it is equivalent to
a closed lambda term with at most 3 lambda abstractions and at most 3
applications. That is, every combinator is at most as complex as the S
combinator.
We choose the following 6 combinators:
------------------------------------------------------------------------
Sxyz = xz(yz)
Bxyz = x(yz)
Cxyz = xzy
Kxy = x
Txy = yx
Vxyz = zxy
------------------------------------------------------------------------
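In Haskell terms, these six are old friends (a sketch; S, K, and C play
exactly these roles in the demo below):

------------------------------------------------------------------------
s x y z = x z (y z)  -- S is ap
b x y z = x (y z)    -- B is composition, (.)
c x y z = x z y      -- C is flip
k x y   = x          -- K is const
t x y   = y x        -- T is reverse application, flip ($)
v x y z = z x y      -- V x y is the Scott-encoded pair (x, y)
------------------------------------------------------------------------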
We encode them in binary according to the following table (where the backquote
represents application):
------------------------------------------------------------------------
` 1
B 01
V 0011
T 0010
S 0001
C 00001
K 00000
------------------------------------------------------------------------
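This code is prefix-free, so decoding is straightforward. A sketch of a
decoder, not part of the original development: read one combinator term off
the front of a bit string and return the unconsumed rest.

------------------------------------------------------------------------
data Com = Ap Com Com | B | V | T | S | C | K deriving Show

decode :: String -> (Com, String)
decode ('1':r) = let (x, r') = decode r; (y, r'') = decode r' in (Ap x y, r'')
decode ('0':'1':r) = (B, r)
decode ('0':'0':'1':'1':r) = (V, r)
decode ('0':'0':'1':'0':r) = (T, r)
decode ('0':'0':'0':'1':r) = (S, r)
decode ('0':'0':'0':'0':'1':r) = (C, r)
decode ('0':'0':'0':'0':'0':r) = (K, r)

-- decode "10110000000000" == (Ap B (Ap K K), ""), that is, `B`KK.
------------------------------------------------------------------------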
Then the following is a binary self-interpreter:
------------------------------------------------------------------------
f c = T(V(T(V(T(T.V(V(T(c.V K C))(c S))(c.V T V)))(c B)))(f(f.(c.))))
------------------------------------------------------------------------
Kiselyov's algorithm yields:
------------------------------------------------------------------------
`Y``B`BT``B`S``BV``BT``S``BV``BT``B`BT``S``BV``S``BV``BT``CB``VKC`TS``CB``VTV`TB``SB``C``BBBB
------------------------------------------------------------------------
We define the Y combinator with:
------------------------------------------------------------------------
Y = ``B``STT``CB``STT
------------------------------------------------------------------------
that is, Y = B U (C B U), where U = S T T duplicates its argument: U x = x x.
In all, our self-interpreter takes 236 bits, beating Tromp's Theorem 4 record
of 263 bits.
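We can make the computer do the counting. A sketch: weigh each symbol by its
code length from the table above and sum over the program string (the `coms`
string of the REPL section below has 109 symbols and weighs in at 236 bits).

------------------------------------------------------------------------
cost :: Char -> Int
cost '`' = 1
cost 'B' = 2
cost 'C' = 5
cost 'K' = 5
cost  _  = 4  -- V, T and S each cost 4 bits

size :: String -> Int
size = sum . map cost
------------------------------------------------------------------------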
The following demonstrates this self-interpreter in our "Fixity" compiler:
------------------------------------------------------------------------
data Bool = True | False;
id x = x;
const x y = x;
flip x y z = x z y;
ap x y z = x z(y z);
bool x y b = case b of { True -> y ; False -> x };
uncurry f x = case x of { (,) a b -> f a b };
(.) f g x = f(g(x));
data CL = K | C | S | T | V | B | App CL CL;
encode m n = case m of
{ K -> (False, (False, (False, (False, (False, n)))))
; C -> (False, (False, (False, (False, (True , n)))))
; S -> (False, (False, (False, (True , n))))
; T -> (False, (False, (True , (False, n))))
; V -> (False, (False, (True , (True , n))))
; B -> (False, (True , n))
; App x y -> (True, (encode x (encode y n)))
};
t = uncurry;
v = bool;
eval c = t(v(t(v(t(t . v(v(t(c . v const flip))(c ap))(c . v t v)))(c(.)))) (eval(eval . (c .))));
demo _ = eval id (encode (App T (App K (App (App S K) K))) ("one", "two"));
------------------------------------------------------------------------
The term `T(K(SKK))` returns the second element of a Scott-encoded pair, so
the above computes `snd ("one", "two")`.
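Step by step, with p = ("one", "two"):

------------------------------------------------------------------------
   T (K (S K K)) p
== p (K (S K K))            -- T x y = y x
== K (S K K) "one" "two"    -- Scott pair: p f = f "one" "two"
== S K K "two"              -- K x y = x
== K "two" (K "two")        -- S x y z = x z (y z)
== "two"
------------------------------------------------------------------------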
Is it sporting to consider the Y combinator primitive? If so, we could likely
shrink the interpreter further.
== Instant REPL play ==
Interactive REPLs such as GHCi shine when we want to ask the computer for
help: they cut down on dreaded edit-compile-run cycles. We still write code in
a file, but once it loads, we play around in the REPL to print codes and
sizes.
\begin{code}
import Data.List
import qualified Data.Map as M
import Data.Map (Map, (!))
import Data.Ord
import Data.Tree
-- The interpreter's symbols, with the Y combinator written out.
coms = "```B``STT``CB``STT``B`BT``B`S``BV``BT``S``BV``BT``B`BT``S``BV``S``BV``BT``CB``VKC`TS``CB``VTV`TB``SB``C``BBBB"

-- How often each symbol occurs in the interpreter.
histo = M.fromListWith (+) $ (\c -> ([c], 1)) <$> coms

-- Huffman-code the symbols by frequency. The accumulated bit strings come
-- out reversed, but we only care about their lengths.
huff :: Map String Int -> [(String, [Int])]
huff ps = huff' [] $ huffTree $ (\(k, v) -> Node (k, v) []) <$> M.assocs ps
  where
  huff' s (Node (k, _) []) = [(k, s)]
  huff' s (Node _ [x, y]) = huff' (0:s) x ++ huff' (1:s) y
  -- Repeatedly merge the two least frequent trees.
  huffTree [p] = p
  huffTree ps = huffTree $ Node ("", v0 + v1) [p0, p1]:ps2 where
    p0@(Node (_, v0) _) = getMin ps
    ps1 = delete p0 ps
    p1@(Node (_, v1) _) = getMin ps1
    ps2 = delete p1 ps1
    getMin = minimumBy $ comparing $ \(Node (_, v) _) -> v

-- Total bits needed for the interpreter under the Huffman code.
total = foldr (\(c, enc) n -> n + length enc * histo!c) 0 $ huff histo
\end{code}
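Evaluating `total` should report 236, the same as the hand-rolled code above:
for this particular string of combinators, our encoding is already
Huffman-optimal.

------------------------------------------------------------------------
$ ghci golf.lhs
*Main> total
236
------------------------------------------------------------------------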
== Infinite regress ==
Is mainstream mathematics mistaken in its handling of the infinite?
https://web.math.princeton.edu/~nelson/papers/warn.pdf[What does it mean,
Edward Nelson asks, to treat the unfinished as finished]?
We might get a theory that is internally consistent, but so what? Good stories
that have no bearing on reality are also internally consistent. Perhaps
http://sites.math.rutgers.edu/~zeilberg/Opinion125.html[undecidability is
meaningless, as Doron Zeilberger spiritedly opines].
If this turns out to be the case, then I'll be annoyed because of the time I
spent learning a lot of this stuff. But no matter what, it'll always be fun to
seek tinier self-interpreters!
Turing machine golf is also possible, though more painful.
https://www.scottaaronson.com/busybeaver.pdf[Yedidia and Aaronson find small
Turing machines] related to Busy Beaver numbers, the Goldbach conjecture, and
the Riemann hypothesis, which were
https://www.scottaaronson.com/blog/?p=2741[later shrunk further].
Perhaps a Turing machine that simulates combinators would compare well with
their approach.