
Custom arena allocator #49

Open
overlookmotel opened this issue Jun 21, 2024 · 6 comments

Comments

@overlookmotel

I feel we should investigate replacing Bumpalo with a custom allocator. Notes on why, and some ideas:

Make it faster - remove alignment calculations

Oxc's AST has a property which is unusual. Every single AST type is aligned on 8 (because they all contain a Vec, Box, Atom or &str).

We can exploit this property to remove alignment calculations when allocating into the arena. The bump pointer is always aligned on 8 after the last allocation, so no need to re-align each time you allocate.

This will only remove a couple of instructions (and possibly one branch), but given that allocating is a very hot path, it might well make a measurable difference.

Bumpalo is designed for a general use case, and so cannot perform this optimization.
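As a sketch of the idea (hypothetical code, not Oxc's actual allocator; deallocation omitted for brevity): if every allocation's size is a multiple of 8 and the chunk itself is 8-aligned, the bump cursor stays 8-aligned with no rounding instructions at all:

```rust
use std::alloc::{alloc, Layout};

/// Hypothetical sketch of a bump allocator with a fixed minimum alignment
/// of 8. It bumps upwards for simplicity (Bumpalo bumps downwards).
struct FixedAlignBump {
    cursor: *mut u8, // always 8-aligned between allocations
    end: *mut u8,
}

impl FixedAlignBump {
    /// `capacity` must be a multiple of 8.
    fn with_capacity(capacity: usize) -> Self {
        // The chunk is allocated 8-aligned, so `cursor` starts 8-aligned.
        let layout = Layout::from_size_align(capacity, 8).unwrap();
        let start = unsafe { alloc(layout) };
        assert!(!start.is_null());
        Self { cursor: start, end: unsafe { start.add(capacity) } }
    }

    /// `size` must be a multiple of 8 (true for all 8-aligned AST types),
    /// so allocation is just a bounds check and a bump - no re-alignment.
    fn alloc_bytes(&mut self, size: usize) -> Option<*mut u8> {
        debug_assert_eq!(size % 8, 0);
        if (self.end as usize - self.cursor as usize) < size {
            return None; // a real allocator would grow or add a chunk here
        }
        let ptr = self.cursor;
        self.cursor = unsafe { self.cursor.add(size) };
        Some(ptr)
    }
}
```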

NB: There are 2 exceptions to the "everything aligned on 8" rule:

  1. String data is aligned on 1. Either store strings in a separate memory space, at the start of arena chunks (see below), or align them on 8 manually.
  2. BooleanLiteral is aligned on 4. Easy fix by making it #[repr(align(8))].

Reference: ser_raw performs this optimization.

Make it faster - alternative design

Blink allocator claims to be much faster than Bumpalo. There was an attempt to replace Bumpalo with Blink in Oxc last year oxc-project/oxc#106, but it didn't show performance improvements. Nonetheless, I think it's worth investigating why, and whether there's anything we can learn from Blink.

I have some questions of my own about Bumpalo's design. I don't understand why it uses double indirection (a pointer to a pointer) to reach the current chunk cursor. I wonder if reducing the indirection could yield a speed boost.

Some other possible optimizations here: fitzgen/bumpalo#234

Fill chunks from both ends

  • Fill from end for AST nodes (bumping downwards, same as Bumpalo does).
  • Fill from start for strings (bumping upwards).

#37

This would also have the nice property of grouping all string data together in one place in memory, which is probably advantageous for AST transfer.
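A minimal sketch of the double-ended scheme (hypothetical, assuming a single pre-allocated chunk; deallocation omitted): strings bump upwards from the start, AST nodes bump downwards from the end, and the chunk is full when the two cursors meet:

```rust
use std::alloc::{alloc, Layout};

/// Hypothetical two-ended chunk: string bytes grow up from `lo`,
/// AST nodes grow down from `hi`.
struct TwoEndedChunk {
    lo: *mut u8, // next free byte for string data
    hi: *mut u8, // start of the most recently allocated node
}

impl TwoEndedChunk {
    /// `capacity` must be a multiple of 8 so `hi` starts 8-aligned.
    fn with_capacity(capacity: usize) -> Self {
        let layout = Layout::from_size_align(capacity, 8).unwrap();
        let start = unsafe { alloc(layout) };
        assert!(!start.is_null());
        Self { lo: start, hi: unsafe { start.add(capacity) } }
    }

    /// `size` must be a multiple of 8, so `hi` stays 8-aligned.
    fn alloc_node(&mut self, size: usize) -> Option<*mut u8> {
        debug_assert_eq!(size % 8, 0);
        if (self.hi as usize - self.lo as usize) < size {
            return None;
        }
        self.hi = unsafe { self.hi.sub(size) };
        Some(self.hi)
    }

    /// Strings only need 1-byte alignment, so no rounding at all.
    fn alloc_str(&mut self, len: usize) -> Option<*mut u8> {
        if (self.hi as usize - self.lo as usize) < len {
            return None;
        }
        let ptr = self.lo;
        self.lo = unsafe { self.lo.add(len) };
        Some(ptr)
    }
}
```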

Align chunks on 4 GiB

AST transfer can work with the existing allocator, but it could be much faster if chunks were always allocated on 4 GiB boundaries. JS's Number cannot hold a u64, but if chunks are aligned on 4 GiB and there is only 1 chunk, only the bottom 32 bits of pointers are relevant, so pointer offset maths can be done with 32-bit ints, which is faster.
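To illustrate the arithmetic (a hypothetical helper, not existing code): when a chunk starts on a 4 GiB boundary and is at most 4 GiB in size, the low 32 bits of any pointer into it are exactly its offset from the chunk start, so a truncating cast replaces a 64-bit subtraction:

```rust
/// Hypothetical helper: offset of `ptr` within its chunk, assuming the
/// chunk base is 4 GiB-aligned (base % (1u64 << 32) == 0) and the chunk
/// is no larger than 4 GiB. Truncating to u32 discards exactly the bits
/// shared with the chunk base, leaving the offset.
fn offset_in_chunk(ptr: *const u8) -> u32 {
    ptr as usize as u32
}
```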

fitzgen declined to support this in Bumpalo. fitzgen/bumpalo#237

Cap max size at 4 GiB

In environments with virtual memory (all the targets Oxc supports except WASM), allocating a large chunk of memory only consumes virtual memory space. Actual memory pages are only consumed when they are written to.

So always allocate chunks as 4 GiB size (except in WASM).

If it's possible to impose a size limit of 4 GiB on ASTs (#27), then we only ever need 1 chunk, which would:

  • Massively simplify and streamline the allocator.
  • That simplicity might help the compiler see that AST nodes can be constructed directly in arena, rather than constructing on stack and then memcpy into arena.
  • Allow completely branchless allocation (fitzgen's suggestion).
  • Open the door to storing pointers as u32s in Box, Vec and Atom (like V8 does - pointer compression).
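A sketch of what pointer compression could look like (a hypothetical type, not an existing Oxc API): with a single 4 GiB-aligned chunk, a boxed value can be identified by a 4-byte offset, and the full pointer is recovered by OR-ing in the chunk base:

```rust
use std::marker::PhantomData;

/// Hypothetical compressed pointer: stores only the 32-bit offset within
/// the single 4 GiB-aligned chunk, similar to V8's pointer compression.
struct CompressedBox<T> {
    offset: u32,
    _marker: PhantomData<*const T>,
}

impl<T> CompressedBox<T> {
    /// Compress: the low 32 bits of the pointer are its offset in the chunk.
    fn from_ptr(ptr: *const T) -> Self {
        Self { offset: ptr as usize as u32, _marker: PhantomData }
    }

    /// Decompress, given the chunk's base address
    /// (base % (1u64 << 32) == 0 by construction).
    fn to_ptr(&self, chunk_base: usize) -> *const T {
        (chunk_base | self.offset as usize) as *const T
    }
}
```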

If a 4 GiB size limit for ASTs is not viable, we could at least limit each individual chunk to 4 GiB. This seems like a viable limit - it would cap Vec<Statement> at a max of 268 million entries ((1 << 32) / 16). I think we can assume even the largest application, bundled into a single file, will not contain more than 268 million top-level statements?!

Either way, this would allow reducing Vec to 16 bytes (#18).
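For illustration (a hypothetical layout, not Oxc's current Vec): with chunks capped at 4 GiB, length and capacity each fit in a u32, shrinking the Vec header from 24 to 16 bytes:

```rust
/// Hypothetical 16-byte Vec header: a full 8-byte data pointer plus u32
/// length and capacity, both of which fit because no chunk exceeds 4 GiB.
/// (std::vec::Vec is 24 bytes: pointer + usize length + usize capacity.)
struct CompactVec<T> {
    ptr: *mut T, // 8 bytes
    len: u32,    // 4 bytes
    cap: u32,    // 4 bytes
}
```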

@Boshen (Member) commented Jun 21, 2024

I propose we fork bumpalo and create a bunch of tiny issues on the fork for people to experiment with; it may draw a few Rust nerds.

@overlookmotel (Author) commented Jun 21, 2024

> I propose we fork bumpalo and create a bunch of tiny issues on the fork for people to experiment with; it may draw a few Rust nerds.

I like that idea.

By the way, I didn't intend to do anything on the allocator right now - I just wanted to write up my thoughts before I forget them all.

Also, the eventual allocator may end up being quite different from Bumpalo. It's possible it will end up being more like Blink, or maybe even something entirely new. My initial thought was to fork Bumpalo, cut its API surface down to the minimum, and then iterate and experiment.

@overlookmotel (Author)

https://crates.io/crates/bump-scope looks good. As well as supporting scopes (which could be useful for #121), it also appears to produce more compact assembly than Bumpalo (only 1 branch in allocation, rather than Bumpalo's 3 branches).

@Boshen (Member) commented Oct 18, 2024

> https://crates.io/crates/bump-scope looks good. As well as supporting scopes (which could be useful for #121), it also appears to produce more compact assembly than Bumpalo (only 1 branch in allocation, rather than Bumpalo's 3 branches).

I reached out: bluurryy/bump-scope#37

@overlookmotel (Author)

Cool. Will be great to see if they're interested.

However, I still think we would benefit from our own custom allocator eventually. I made the comment above more as a "note to self" to look into how bump-scope works and learn from it, than because I think we should adopt it.

bump-scope has some of the optimizations I had in mind (e.g. minimum alignment to remove alignment calculations), but not others - filling the chunks from both ends, and aligning chunks, which is required to get Vec down to 16 bytes (#18 (comment)). Unless maybe bump-scope wants to support these features?
