Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uses blink-alloc instead of bumpalo #106

Closed
wants to merge 5 commits into from

Conversation

zakarumych
Copy link

The usage is roughly the same.

One caveat applies:
String is not generalized over allocator in std, so allocator-api2 lacks one too.

There seems to be room for perf improvement.
On my machine this change gives 2.5 - 3% speed increase

@Boshen
Copy link
Member

Boshen commented Mar 4, 2023

The test failed with

thread 'no_bloat_enum_sizes' panicked at 'assertion failed: `(left == right)`
  left: `24`,
 right: `16`', crates/oxc_ast/src/lib.rs:58:5

with

https://github.com/Boshen/oxc/blob/47c1e7ef5124b81d6df0cd8182b56a94e297a04b/crates/oxc_ast/src/lib.rs#L58

Which is fine I think, we can mark it as 24.

@Boshen
Copy link
Member

Boshen commented Mar 4, 2023

Here are the benchmarks from the CI jobs, they aren't posted by bot due to permission issues.

Parser Benchmark Results - macos-latest

group                    main                                   pr
-----                    ----                                   --
parser/babylon.max.js    1.05   192.8±50.76ms    53.6 MB/sec    1.00   182.8±19.32ms    56.5 MB/sec
parser/d3.js             1.12    25.2±32.18ms    21.7 MB/sec    1.00     22.5±4.48ms    24.3 MB/sec
parser/lodash.js         1.00      5.6±0.44ms    91.6 MB/sec    1.11      6.2±1.41ms    82.6 MB/sec
parser/pdf.js            1.00     11.5±0.69ms    34.9 MB/sec    1.14     13.1±2.60ms    30.6 MB/sec
parser/typescript.js     1.00   182.9±28.20ms    52.6 MB/sec    1.01   184.1±35.35ms    52.2 MB/sec

Parser Benchmark Results - windows-latest

group                    main                                   pr
-----                    ----                                   --
parser/babylon.max.js    1.00    138.7±3.34ms    74.4 MB/sec    1.08    150.4±2.21ms    68.7 MB/sec
parser/d3.js             1.00     16.7±0.38ms    32.8 MB/sec    1.07     17.9±0.21ms    30.6 MB/sec
parser/lodash.js         1.00      4.7±0.19ms   109.6 MB/sec    1.09      5.1±0.10ms   100.6 MB/sec
parser/pdf.js            1.00      9.4±0.21ms    42.7 MB/sec    1.09     10.3±0.23ms    39.0 MB/sec
parser/typescript.js     1.00    137.1±2.33ms    70.2 MB/sec    1.10    150.2±2.09ms    64.0 MB/sec

Parser Benchmark Results - ubuntu-latest

group                    main                                   pr
-----                    ----                                   --
parser/babylon.max.js    1.00    126.2±3.95ms    81.8 MB/sec    1.15    145.3±4.75ms    71.1 MB/sec
parser/d3.js             1.00     15.1±0.58ms    36.2 MB/sec    1.20     18.1±0.50ms    30.3 MB/sec
parser/lodash.js         1.00      4.4±0.16ms   117.8 MB/sec    1.15      5.0±0.26ms   102.9 MB/sec
parser/pdf.js            1.00      8.9±0.26ms    45.3 MB/sec    1.13     10.0±0.35ms    40.2 MB/sec
parser/typescript.js     1.00    124.0±3.70ms    77.6 MB/sec    1.18    145.9±4.53ms    66.0 MB/sec

// one just allocated from a Bump.
unsafe { ptr::read(self.0 as *mut T) }
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Box::into_inner is from std.
And Box should not have methods with self to not cause name resolution conflict with pointee.

@@ -126,7 +126,7 @@ impl Cli {
let parser_source_text = source_text.clone();
let ret = Parser::new(&allocator, &parser_source_text, source_type).parse();
let result = if ret.errors.is_empty() {
let program = allocator.alloc(ret.program);
let program = allocator.emplace_no_drop().value(ret.program);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These emplace-* APIs are pretty new to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blink allows dropping values placed into the allocated memory.
emplace_* configure how values are placed.

@Boshen
Copy link
Member

Boshen commented Mar 4, 2023

Wait ... it seems to be slower from the benchmarks, I didn't fully wake up yet and saw it the other way around.

}

// SAFETY: Make Bump Sync and Send, it's our responsibility to never
// simultaneously mutate across threads.
unsafe impl Send for Allocator {}
unsafe impl Sync for Allocator {}

pub type Box<'a, T> = allocator_api2::boxed::Box<T, &'a BlinkAlloc>;
pub type Vec<'a, T> = allocator_api2::vec::Vec<T, &'a BlinkAlloc>;
Copy link
Contributor

@YoniFeng YoniFeng Mar 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the main piece of the puzzle I was missing.
You can annotate the lifetime regardless of allocator_api2::vec::Vec/Box having a signature "set up for it".

Makes total sense after seeing it, but this wasn't a possibility I was aware of.

@YoniFeng
Copy link
Contributor

YoniFeng commented Mar 4, 2023

Will need to profile if we want to understand why its slower.

The internals complexity is beyond understanding with a code skim, one thing that stood out to me was was that the unbox() was just dereferencing a pointer, and now it's:

pub fn into_inner(boxed: Self) -> T {
        let ptr = boxed.0;
        let unboxed = unsafe { ptr.as_ptr().read() };
        unsafe { boxed.1.deallocate(ptr.cast(), Layout::new::<T>()) };
        unboxed
    }

Deallocate is trying to reclaim the memory if the deallocated object is the last one.
I guess after inlining this boils down to a simple add and cmp. While I don't think it's significant/can explain the worse performance overall, it is an example of a place where more work is done.

@zakarumych
Copy link
Author

I was thinking of removing deallocations or making them only for large enough blocks

@Boshen
Copy link
Member

Boshen commented Mar 4, 2023

@zakarumych Are the benchmarks wrong or is there something fishy going on?

@zakarumych
Copy link
Author

Turned out that duplicates of stds Box and Vec are problematic.
bumpalo's duplicates spray #[inline] and #[inline(always)] all over the place.
allocator-api2's duplicates are almost char-to-char copy with no additional inlining attributes.

@Boshen
Copy link
Member

Boshen commented Mar 4, 2023

Turned out that duplicates of stds Box and Vec are problematic. bumpalo's duplicates spray #[inline] and #[inline(always)] all over the place. allocator-api2's duplicates are almost char-to-char copy with no additional inlining attributes.

Is there anything I can help out with?

@Boshen
Copy link
Member

Boshen commented Mar 11, 2023

@zakarumych Friendly ping :-) May I ask you what the current status is?

@Boshen Boshen marked this pull request as draft March 11, 2023 15:18
@Boshen
Copy link
Member

Boshen commented Mar 29, 2023

Closing due to inactivity. Please reach out if you want to work on this again. And thank you for working on this, I learned a bunch from your work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants