[RFC007] Bytecode interpreter #2045

Open

yannham wants to merge 20 commits into master from rfc/bytecode-vm

Member

yannham commented Sep 17, 2024

Although the name is a bit pompous, the goal of this RFC is mostly to be a working document for designing a more compact and efficient run-time representation for Nickel expressions.

While this won't be user-facing (at least not directly), and can thus be changed later without breaking backward compatibility, the technical scope of this effort is large enough that I think it's better to discuss it formally here before attempting a first implementation.


          First draft (incomplete) of RFC007

f038447

github-actions bot temporarily deployed to pull request

September 17, 2024 15:08

Inactive

Contributor

github-actions bot commented Sep 17, 2024

Bencher Report

Branch	2045/merge
Testbed	ubuntu-latest

⚠️ WARNING: The following Measure does not have a Threshold. Without a Threshold, no Alerts will ever be generated!
Latency

Benchmark	Latency (ns)
fibonacci 10	485,840.00
foldl arrays 50	1,805,200.00
foldl arrays 500	6,850,000.00
foldr strings 50	7,166,100.00
foldr strings 500	62,580,000.00
generate normal 250	45,525,000.00
generate normal 50	2,089,000.00
generate normal unchecked 1000	3,370,000.00
generate normal unchecked 200	746,970.00
pidigits 100	3,209,200.00
pipe normal 20	1,495,000.00
pipe normal 200	10,120,000.00
product 30	827,560.00
scalar 10	1,509,600.00
sum 30	826,840.00


          Start writing about OCaml abstract machine

ca3ea52

github-actions bot temporarily deployed to pull request

September 19, 2024 16:26

Inactive


          Add some criterion on the VM comparisons

fead82f

github-actions bot temporarily deployed to pull request

September 20, 2024 14:58

Inactive


          Complete the OCaml VM description

55c763f

github-actions bot temporarily deployed to pull request

September 23, 2024 08:32

Inactive


          First bit about the Lua VM

00355f1

github-actions bot temporarily deployed to pull request

September 23, 2024 10:47

Inactive


          More on Lua VM

bd12104

github-actions bot temporarily deployed to pull request

September 23, 2024 16:39

Inactive


          Improve previous text, start V8 section

16cfc00

github-actions bot temporarily deployed to pull request

September 25, 2024 12:38

Inactive


          More on V8; timid start of Haskell

48b804c

github-actions bot temporarily deployed to pull request

September 25, 2024 16:07

Inactive


          Small chunk on Haskell/STG

6cebc7e

github-actions bot temporarily deployed to pull request

September 30, 2024 14:33

Inactive

yannham mentioned this pull request

[Performance] Reduce size of term #2022

Closed


          More content on Tvix; drafty draft of a proposal

b64c323

github-actions bot temporarily deployed to pull request

October 3, 2024 17:27

Inactive


          Pass on the whole document, more details on V8 closures

dd1d007

github-actions bot temporarily deployed to pull request

October 6, 2024 16:52

Inactive


          More on STG and Tvix, and a bit more raw notes on the proposal

3aaf162

github-actions bot temporarily deployed to pull request

October 6, 2024 21:38

Inactive


          Full pass on existing VMs, a few more raw notes on proposal

6a220c3

github-actions bot temporarily deployed to pull request

October 7, 2024 15:22

Inactive

yannham added 2 commits

October 9, 2024 18:20


          More of the proposal

6dfb394


          More proposal

58a9e91

yannham added 4 commits

October 11, 2024 15:25


          More proposal

5aea596


          More proposal

6c76a9e


          Pass on most of the proposal

08c213e


          Small pass on the STG, remove useless and vague paragraph

c2073a3

yannham marked this pull request as ready for review

October 15, 2024 13:06

yannham requested review from aspiwack and jneem

October 15, 2024 13:07

Member Author

yannham commented Oct 15, 2024

Some parts might need refinement, but I think it's in good shape for a first round of reviews.

jneem approved these changes


          Address review comments

0149d4c

aspiwack reviewed

Member

aspiwack left a comment

Some random comments.

rfcs/007-bytecode-interpreter.md

Comment on lines +209 to +211

+              The following notes on the memory representation applies to the native code
+              backend's representation. I'm not sure how closures are represented in the Zinc
+              Abstract Machine.

Member

aspiwack Oct 30, 2024

I'm not sure it ought to be called Zinc anymore, but anyway: I'm pretty sure that the representation of values (including closures) is the same in native code and bytecode. It must be so, at least to some degree, for the sake of the FFI, where OCaml values can be manipulated.

rfcs/007-bytecode-interpreter.md

Comment on lines +238 to +239

		no argument (the tag byte then doesn't store the actual contructor's tag but has
		the same value than for a boxed `int`). For a variant with parameters, the tag

Member

aspiwack Oct 30, 2024

This is not quite it. A constructor without argument is represented as an unboxed integer value. It doesn't point to a block, so there is no tag involved. Constructors with arguments are pointers, and point to a structure as above.

Member Author

yannham Oct 30, 2024

You're right, I don't know where I got this idea

rfcs/007-bytecode-interpreter.md

Comment on lines +299 to +300

		Despite not being advertised, Haskell has an interpreter as well, which is used
		mostly for the GHCi REPL. This section describes what we know of the actual the

Member

aspiwack Oct 30, 2024

And Template Haskell.

rfcs/007-bytecode-interpreter.md

Comment on lines +316 to +317

		code), such that thunk access is uniform: it's an unconditional jump to the
		corresponding code.

Member

aspiwack Oct 30, 2024

In fact, when pointing to an info table, you actually point directly to the code pointer. The metadata in the info table is accessed backwards by subtracting from the pointer. This way entering a thunk is an indirect jump, which many processors support in a single instruction.
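
To make the layout described above concrete, here is a minimal Rust sketch. It is a toy model, not GHC's actual data structures: the "text section" is a plain slice, an info pointer is an index to the entry code, and the metadata words sit just before that index. All names (`Word`, `InfoPtr`, the two offsets, `sample_text`) are hypothetical and chosen for illustration only.

```rust
/// One word of a toy "text section": either metadata or an instruction.
/// The metadata fields are hypothetical, not GHC's real info table layout.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Word {
    ClosureType(u8),
    PtrCount(u32),
    /// Toy entry code: a single instruction returning a constant.
    Return(i64),
}

/// An info pointer is the index of the entry code itself,
/// not the index of the start of the metadata.
#[derive(Debug, Clone, Copy, PartialEq)]
struct InfoPtr(usize);

// Metadata is found by subtracting fixed offsets from the info pointer.
const PTR_COUNT_OFFSET: usize = 1;
const CLOSURE_TYPE_OFFSET: usize = 2;

/// Builds a text section holding one info table followed by its entry code.
fn sample_text() -> (Vec<Word>, InfoPtr) {
    let text = vec![Word::ClosureType(1), Word::PtrCount(2), Word::Return(42)];
    (text, InfoPtr(2))
}

/// Entering a closure: jump straight to the word the info pointer designates.
fn enter(text: &[Word], info: InfoPtr) -> Option<i64> {
    match text.get(info.0)? {
        Word::Return(value) => Some(*value),
        _ => None,
    }
}

/// Reads the pointer count stored one word before the entry code.
fn ptr_count(text: &[Word], info: InfoPtr) -> Option<u32> {
    let index = info.0.checked_sub(PTR_COUNT_OFFSET)?;
    match text.get(index)? {
        Word::PtrCount(count) => Some(*count),
        _ => None,
    }
}

/// Reads the closure type stored two words before the entry code.
fn closure_type(text: &[Word], info: InfoPtr) -> Option<u8> {
    let index = info.0.checked_sub(CLOSURE_TYPE_OFFSET)?;
    match text.get(index)? {
        Word::ClosureType(kind) => Some(*kind),
        _ => None,
    }
}
```

In this model the hot path (`enter`) needs no offset arithmetic at all, while the rarer metadata lookups pay for a subtraction, which is the trade-off the comment describes.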

Member Author

yannham Oct 30, 2024

Ah, right; I think it's mentioned in the STG paper that this was difficult to do with the approach proposed there going through ANSI C, but easy to do with a custom native code generation backend, which I suppose is what GHC does today.

rfcs/007-bytecode-interpreter.md

Comment on lines +324 to +327

+              The STG paper argues that this uniform thunk representation (with "self-handled
+              update") simplifies the compilation process and gives room for some specific
+              optimizations (vectored return for pattern matching, for example) that should be
+              beneficial to Haskell programs.

Member

aspiwack Oct 30, 2024

It's worth noting that since the STG paper was written, things have become a little more complex. GHC uses pointer tagging to mark whether a thunk has already been forced and, if so, which of the first 3 (on 32 bits) or 7 (on 64 bits) constructors of the data type it evaluated to. So, for efficiency, pattern matching checks these bits before entering the closure.
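
A rough Rust sketch of this pointer-tagging idea, under stated assumptions: addresses are plain `usize` values, the target is 64-bit with 8-byte-aligned closures (so 3 low bits are free), tag `0` means "not known to be evaluated", and tags `1` to `7` number the constructors starting from one. The names `tag_pointer`, `inspect`, and `Tagged` are made up for this example and are not GHC's.

```rust
/// Number of low bits available for the tag, assuming 8-byte alignment.
const TAG_BITS: u32 = 3;
const TAG_MASK: usize = (1 << TAG_BITS) - 1;

/// What a pattern match learns from the low bits of a closure pointer.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tagged {
    /// Tag 0: nothing is known, the closure has to be entered.
    MustEnter { addr: usize },
    /// Non-zero tag: already evaluated, the tag names the constructor.
    Evaluated { addr: usize, constructor: usize },
}

/// Stores a constructor number (1..=7) in the low bits of an aligned address.
fn tag_pointer(addr: usize, constructor: usize) -> usize {
    assert_eq!(addr & TAG_MASK, 0, "address must be 8-byte aligned");
    assert!((1..=TAG_MASK).contains(&constructor), "tag out of range");
    addr | constructor
}

/// Splits a possibly tagged word into the real address and its tag.
fn inspect(word: usize) -> Tagged {
    let addr = word & !TAG_MASK;
    match word & TAG_MASK {
        0 => Tagged::MustEnter { addr },
        constructor => Tagged::Evaluated { addr, constructor },
    }
}
```

A case expression would call something like `inspect` first and only perform the jump into the closure's code in the `MustEnter` case.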

rfcs/007-bytecode-interpreter.md

+              environment in the case of closures). As each constructor usage potentially
+              generates very similar code, GHC is smart enough to share common constructors
+              instead of generating them again and again (typically the one for an empty
+              list).

Member

aspiwack Oct 30, 2024

An interesting difference between GHC's memory representation and OCaml's is the way unboxed and boxed values are distinguished. This is only a concern for the GC, so maybe it doesn't matter too much for this discussion.

In OCaml, if a value, viewed as an unsigned int, is odd, then it's unboxed (and the GC doesn't follow it); when it's even, it's a pointer to a boxed value. This way it's always evident, when looking at a value, whether it's a pointer or not. The cost is that integers lose one bit, and that it becomes impossible to unbox floats in most cases.
In Haskell, pointers and non-pointer values are indistinguishable. So the GC asks the info table which of the fields are pointers. In practice, this is just a number, and all the non-pointer fields are stored first, and all the pointer fields after (or maybe the other way around, I don't remember).
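
The OCaml side of this comparison can be sketched in a few lines of Rust. This is only an illustration of the low-bit convention on a 64-bit word, with hypothetical names (`encode_int`, `decode`, `gc_should_follow`); it does not reproduce the OCaml runtime's actual macros.

```rust
/// A machine word decoded according to the low-bit convention.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decoded {
    /// Odd word: an unboxed integer, which the GC must not follow.
    Immediate(i64),
    /// Even word: the address of a boxed value.
    Pointer(u64),
}

/// Encodes `n` as `2n + 1`; one bit of integer range is lost.
fn encode_int(n: i64) -> u64 {
    ((n as u64) << 1) | 1
}

/// Recovers either the integer (arithmetic shift right) or the address.
fn decode(word: u64) -> Decoded {
    if word & 1 == 1 {
        Decoded::Immediate((word as i64) >> 1)
    } else {
        Decoded::Pointer(word)
    }
}

/// The only question the GC needs to ask about a field.
fn gc_should_follow(word: u64) -> bool {
    word & 1 == 0
}
```

With this encoding the GC needs no per-object layout information to tell pointers apart, whereas the Haskell scheme described above trades that for full-width unboxed fields plus a lookup in the info table.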

rfcs/007-bytecode-interpreter.md

+              machine. Those machines still have a number of differences on how they handle
+              fundamental operations such as function application. We take inspiration from
+              the
+              [call-by-push-value](https://www.cs.bham.ac.uk/~pbl/papers/thesisqmwphd.pdf)(CBPV)

Member

aspiwack Oct 30, 2024

😊

Member Author

yannham Oct 30, 2024

It's funny because I clearly remember you telling me something like 4 years ago that if Nickel had a VM it should probably be a CBPV one. At that time I knew CBPV rather well but probably didn't have enough fresh VM or native compiler knowledge to really connect the two or understand what it meant. I might have said something like "ah, ok", but it stuck as an (EDIT: un)evaluated thunk in the back of my head, so to speak.

Well, it took me a bit of time but I can now finally make sense of it 🙂

Member

aspiwack Oct 30, 2024

Maybe I said that, I don't know. But I think my broader point was that I was thinking, at the time, of metadata (doc, default, …) as being attached to thunks. And CBPV helps us understand what's going on, because of thunking being an explicit operation in the language, so you can change what's going on there.

That being said, I always think of any evaluation model in terms of CBPV/polarised system L. Even when I don't present it that way: it's almost certainly a translation of the CBPV I have in my head. And I'm keen to embrace it in abstract machines as well.

Member

aspiwack Oct 30, 2024

Speaking of, you can think of system L as representing an abstract machine with a structured stack. Implemented directly, it's not as cache friendly as an actual stack machine. But maybe it's a middle ground to consider, as you don't have to come up with a complex linearised continuation representation for pattern matching, in particular.

On the other hand, maybe it's too tree-like for this proposal. I don't know, I haven't thought about it. I just said the name, and this idea popped up in response. Do whatever you want to do with it.
