
perf(semantic): calculate number of nodes, scopes, symbols, references before visiting AST #4328

Merged
20 commits merged into main on Jul 19, 2024

Conversation

@Dunqing (Member) commented Jul 17, 2024

Note by @overlookmotel 20/7/24: Even though this PR shows as merged, that was due to some weird malfunction of GitHub. It was immediately reverted, and the merge scrubbed from the history of the main branch. #4367 continues the work.

@github-actions bot added the A-semantic (Area - Semantic) label Jul 17, 2024
codspeed-hq (bot) commented Jul 17, 2024

CodSpeed Performance Report

Merging #4328 will improve performance by 30.11%

Comparing test-visit (8a2971b) with main (f68b659)

Summary

⚡ 12 improvements
✅ 20 untouched benchmarks

Benchmarks breakdown

Benchmark                              main      test-visit  Change
minifier[antd.js]                      344.3 ms  304.6 ms    +13.05%
minifier[react.development.js]         3.9 ms    3.5 ms      +10.46%
minifier[typescript.js]                692 ms    616.4 ms    +12.27%
semantic[RadixUIAdoptionSection.jsx]   108.8 µs  102 µs      +6.71%
semantic[antd.js]                      175.7 ms  135 ms      +30.11%
semantic[cal.com.tsx]                  56.4 ms   48.8 ms     +15.47%
semantic[checker.ts]                   104.9 ms  83.5 ms     +25.56%
semantic[pdf.mjs]                      26.5 ms   22.2 ms     +19.36%
transformer[antd.js]                   298.5 ms  258 ms      +15.7%
transformer[cal.com.tsx]               75.4 ms   65.4 ms     +15.34%
transformer[checker.ts]                170.2 ms  152.1 ms    +11.91%
transformer[pdf.mjs]                   45.9 ms   39.9 ms     +15.21%

@Dunqing closed this Jul 17, 2024
@overlookmotel (Contributor) commented Jul 17, 2024

@Dunqing Is this related to oxc-project/backlog#31?

If so, a couple of thoughts:

  1. I'd imagined counting them in the parser, rather than doing a whole extra AST pass in Semantic.
  2. But even doing it in Semantic, as in this PR, if you use the collected counts to initialize the Vecs in ScopeTree, SymbolTable and Nodes with sufficient capacity, it may turn out to deliver a perf boost greater than the loss currently shown on this PR. Resizing the Vecs is really costly, so overall it could be positive.
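
A minimal sketch of the pre-allocation idea in point 2, with a hypothetical Counts struct standing in for whatever the counting pass produces; this is not oxc's actual API:

#[derive(Default)]
struct Counts {
    nodes: usize,
    scopes: usize,
    symbols: usize,
    references: usize,
}

fn preallocate(counts: &Counts) {
    // Reserving full capacity up front means pushes during the main
    // SemanticBuilder pass never trigger a reallocation-and-copy.
    let nodes: Vec<u64> = Vec::with_capacity(counts.nodes);
    let scopes: Vec<u64> = Vec::with_capacity(counts.scopes);
    let symbols: Vec<u64> = Vec::with_capacity(counts.symbols);
    let references: Vec<u64> = Vec::with_capacity(counts.references);
    let _ = (nodes, scopes, symbols, references);
}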


@Dunqing (Member, Author) commented Jul 17, 2024

@Dunqing Is this related to oxc-project/backlog#31?

If so, a couple of thoughts:

  1. I'd imagined counting them in the parser, rather than doing a whole extra AST pass in Semantic.
  2. But even doing it in Semantic, as in this PR, if you use the collected counts to initialize the Vecs in ScopeTree, SymbolTable and Nodes with sufficient capacity, it may turn out to deliver a perf boost greater than the loss currently shown on this PR. Resizing the Vecs is really costly, so overall it could be positive.

Yes. It would be better to do this in the parser, and we can expect a sizeable performance gain from it.

@Dunqing reopened this Jul 17, 2024
@overlookmotel (Contributor) commented Jul 17, 2024

Wow! This is banging! Much more than I expected.

Only problem is semantic[RadixUIAdoptionSection.jsx]. Personally, I think that particular benchmark is quite important because it's the most representative of actual application code.

I think the reason it's suffering is:

  • The loss is the extra traversal in Collect.
  • The gain is the Vecs not growing, and avoiding the memory copying involved.
  • The cost of growing the Vecs is massive when they're big, not so much when they're small.
  • So for the large file benchmarks, the gain outweighs the loss, but for little RadixUIAdoptionSection.jsx, the loss is the stronger factor.

In my view we should not merge this as-is because of the significant negative effect on that one benchmark. But this experiment has shown us there's a massive speed boost there for the taking, and that we should probably prioritize it above other perf work.

@Boshen (Member) commented Jul 17, 2024

I'm in favor of merging this and opening a new issue on removing this AST pass, which is a clearer task.

@overlookmotel (Contributor) commented:

Does Rolldown's process involve running Semantic at all? If it does, I'd imagine merging this could hurt them, because RadixUIAdoptionSection.jsx is probably the closest of our benchmarks to the real-world app code Rolldown processes.

A compromise would be to use a heuristic to decide whether to run the extra pass, e.g. only run it on source files larger than 10 KB. That'd probably put us in the green on all benchmarks. What do you think of that option?
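
Purely as an illustration, a guard along those lines might look like the sketch below; the threshold constant and function name are hypothetical, not a concrete proposal for oxc:

// Hypothetical cut-off: below this size the extra Collect pass is
// assumed to cost more than the reallocations it prevents.
const COLLECT_SIZE_THRESHOLD: usize = 10 * 1024;

fn should_run_collect_pass(source_text: &str) -> bool {
    source_text.len() >= COLLECT_SIZE_THRESHOLD
}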

@DonIsaac (Contributor) commented:

I've opened a similar PR tackling this in a slightly different way: #4332. Sorry, I didn't know what this one was doing; I was a little confused by the title! #4332 uses source text length as a heuristic for reserving memory instead of performing a traversal. It seems to perform worse overall than @Dunqing's approach, which I find quite interesting.

@overlookmotel (Contributor) commented Jul 17, 2024

Don's approach of estimating the required size of the Vecs from source text size has pluses and minuses:

  • Advantage: Avoid the overhead of the Collect pass.
  • Disadvantage: Potentially getting the estimate wrong.

He's getting even better results on some benchmarks than this PR, but worse on others.

If I can throw in my 2 cents as well...

From my (fairly limited) knowledge of OS-level memory stuff, I believe that allocating too much memory is not necessarily a problem. On architectures with virtual memory (which includes all the ones Oxc targets except WASM), allocating a huge slab of memory (e.g. 1 GB) doesn't actually consume 1 GB of physical memory; it only consumes 1 GB of virtual address space. Pages of physical memory to "back" that virtual memory are only allocated when that chunk of memory is first written to. Like I said, I could be wrong about this - it's not an area I really know - but that's my rough understanding.

Which is all a long way of saying: possibly the best strategy (except on WASM) is to estimate too high because that costs very little, whereas estimating too low will lead to the Vecs reallocating, which is really expensive when they're large.
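
To make that concrete, a rough sketch of the estimate-from-source-length strategy; the ratios here are invented for illustration, not measured from oxc or from #4332:

fn estimated_capacities(source_len: usize) -> (usize, usize, usize) {
    // Hypothetical densities, padded upward on purpose: on
    // virtual-memory targets over-allocating is cheap, while
    // under-allocating forces expensive mid-pass reallocations.
    let nodes = source_len / 8 + 256;
    let scopes = source_len / 128 + 64;
    let symbols = source_len / 32 + 128;
    (nodes, scopes, symbols)
}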

The big question though is: where is this -6% perf penalty on the RadixUIAdoptionSection.jsx benchmark coming from? Don's approach also gets -6%, despite working differently. So the explanation I gave in my comment above is probably wrong.

I don't really have much idea on that one. A good first step might be to figure out what the optimal size of all these Vecs is for each benchmark (i.e. how many scopes/symbols/nodes are actually needed for each) and test initializing the Vecs to exactly those sizes. That'd show us the upper bound on perf improvement we can get, and then we could measure different strategies against that, and try to figure out where the -6% is coming from. @DonIsaac would you like to test that out on your PR?

@DonIsaac (Contributor) commented:

Sure, I can try that out. Another possible option is to record symbols/AST nodes/etc that we find during the parse step. Thoughts?

@overlookmotel (Contributor) commented Jul 18, 2024

Sure, I can try that out. Another possible option is to record symbols/AST nodes/etc that we find during the parse step. Thoughts?

Yes, that is a good option. Please see my comment above. But:

  1. Your PR perf(semantic): initialize builder storage with pre-allocated capacity #4332 has raised another way of doing it which I think is also worth investigating.
  2. Counting everything in the parser is going to be complicated, and we should do that as a follow-up.

Sorry, I think I've made everything too complicated! As Boshen said above, let's get this PR merged, and then iterate on it.

So I propose:

  1. Merge this PR now.
  2. Follow-up PR ASAP to add a guard which disables the Collect pass if file is below a certain size, to get the RadixUIAdoptionSection benchmark back up to where it was before (which we must merge before next Oxc release, to avoid hurting downstream consumers).
  3. Open another issue to discuss further improvement, where we can also consider your approach from perf(semantic): initialize builder storage with pre-allocated capacity #4332 and/or counting in parser.

@Dunqing, @DonIsaac Does that sound OK to you?

@Dunqing (Member, Author) commented Jul 18, 2024

  1. Merge this PR now.
  2. Follow-up PR ASAP to add a guard which disables the Collect pass if file is below a certain size, to get the RadixUIAdoptionSection benchmark back up to where it was before (which we must merge before next Oxc release, to avoid hurting downstream consumers).
  3. Open another issue to discuss further improvement, where we can also consider your approach from perf(semantic): initialize builder storage with pre-allocated capacity #4332 and/or counting in parser.

@Dunqing, @DonIsaac Does that sound OK to you?

Sounds good to me. Both approaches look good; I have no opinion on which one to favor.

@Boshen (Member) commented Jul 18, 2024

109.1 µs -> 116.6 µs

Don't forget to look at the absolute numbers: -6% is huge, but 7 µs is small.

@@ -100,6 +100,39 @@ pub struct SemanticBuilderReturn<'a> {
pub errors: Vec<OxcDiagnostic>,
}

#[derive(Default)]
pub struct Collector {
Review comment from a Member on the Collector struct:

We should move this to another file and have some doc comments explaining the why before merge.

@Boshen (Member) commented Jul 18, 2024

Counting everything in the parser is going to be complicated

AST builder can do the counting :-)

@overlookmotel (Contributor) commented:

AST builder can do the counting :-)

Yes, but currently AstBuilder is Copy, which we'd need to change to make it stateful; some places bypass the builder; and counting scopes depends on surrounding context, because some scopes are conditional. All totally doable, but not a trivial amount of work.
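
For what it's worth, one possible shape for keeping the builder Copy while counting is to hold a shared reference to interior-mutable counters, since a shared reference is itself Copy. This is a hedged sketch with hypothetical names, not oxc's actual AstBuilder:

use std::cell::Cell;

#[derive(Default)]
struct AstCounts {
    nodes: Cell<usize>,
    scopes: Cell<usize>,
}

#[derive(Clone, Copy)]
struct CountingBuilder<'a> {
    counts: &'a AstCounts, // &AstCounts is Copy, so the builder stays Copy
}

impl CountingBuilder<'_> {
    fn record_node(self) {
        self.counts.nodes.set(self.counts.nodes.get() + 1);
    }
}

Even with something like this, conditional scopes still need surrounding context, so the counting logic itself remains the non-trivial part.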

Don't forget to look at the absolute numbers, -6% is huge but 7 µs is small.

I thought the % change was more important, on the grounds that a large application might contain 10,000+ files of this size (particularly if you include node_modules), so the perf impact scales: 10,000 files × 7 µs ≈ 70 ms. Is my logic off?

@overlookmotel (Contributor) commented:

@Dunqing I see you're still working on this, so I'm going to leave you alone to get on with it. Please shout when you're happy with it and would like a review.

@Dunqing (Member, Author) commented Jul 19, 2024

I fixed the issue with the Collector's incorrect count. It is now 100% accurate.

@Dunqing (Member, Author) commented Jul 19, 2024

Maybe we can do more and even count the number of bindings in each scope?

@overlookmotel (Contributor) commented Jul 19, 2024

I fixed the issue with the Collector's incorrect count. It is now 100% accurate.

I assume you're only referring to the node count being accurate here?

Is the symbol count for checker.ts correct now? Previously it was an under-estimate, which is a problem.

I don't think we can make the symbol (aka binding) count completely accurate without building the scope tree (which would defeat the point of Collector), because of var x; var x; (two x binding identifiers, but only one symbol).

As I said above, I don't think we necessarily need to make the counts 100% accurate. Over-estimating is fine. It may be that the cost of complicating Collector to make an accurate count (and so making it a bit slower) is worse than the cost of allocating too much space in the Vecs. It's under-estimating that we absolutely need to avoid.

Maybe we can do more and even count the number of bindings in each scope?

Yes, maybe that would be good - we could size the bindings hash map for each scope correctly up front. But where do we store those "bindings per scope" counts? They'd have to go in a Vec which we don't know the right size for until the Collector pass has finished, so it'd resize all the time - i.e. that recreates the same kind of problem this PR solves! Maybe there's a way around that, but it's a bit more complicated. I'd suggest leaving it to a follow-up.

@Dunqing (Member, Author) commented Jul 19, 2024

I assume you're only referring to the node count being accurate here?

Is the symbol count for checker.ts correct now? Previously it was an under-estimate, which is a problem.

I don't think we can make the symbol (aka binding) count completely accurate without building the scope tree (which would defeat the point of Collector), because of var x; var x; (two x binding identifiers, but only one symbol).

As I said above, I don't think we necessarily need to make the counts 100% accurate. Over-estimating is fine. It may be that the cost of complicating Collector to make an accurate count (and so making it a bit slower) is worse than the cost of allocating too much space in the Vecs. It's under-estimating that we absolutely need to avoid.

I get what you mean. You are right. The counting is not complicated as it stands; we are not handling var x; var x;, so compared to the actual number it is an over-estimate.

@Dunqing (Member, Author) commented Jul 19, 2024

Rolldown's CodSpeed shows this is 1%-4% faster than before.

@overlookmotel (Contributor) commented:

I checked the counts on the latest version (the 2nd "fix: stack overflow" commit):

  • Radix
    • All estimated exactly.
  • antd
    • Symbols over-estimated a bit (no problem).
  • cal.com:
    • Nodes slightly over-estimated (shouldn't happen).
    • Symbols very over-estimated (no problem, expected).
    • References over-estimated a bit (shouldn't happen but not a big deal).
  • checker:
    • Nodes slightly over-estimated (shouldn't happen).
    • Symbols slightly under-estimated (problem).
    • References over-estimated a bit (shouldn't happen but not a big deal).
  • pdf.mjs
    • All estimated exactly.

The most problematic one is Collector under-counting symbols in checker.ts. That means the symbol Vecs are re-allocating during SemanticBuilder's traversal, which will be a bit of a slow-down.

@Dunqing (Member, Author) commented Jul 19, 2024

The most problematic one is Collector under-counting symbols in checker.ts. That means the symbol Vecs are re-allocating during SemanticBuilder's traversal, which will be a bit of a slow-down.

I found the nodes over-estimate was caused by visit_arrow_function missing a visit of return_type.

Once #4366 is merged, only symbols are over-estimated:

[crates/oxc_semantic/src/builder.rs:289:13] collector = Collector {
    node: 314759,
    scope: 10554,
    symbol: 15834,
    reference: 78443,
}
[crates/oxc_semantic/src/builder.rs:292:13] Collector {
    node: self.nodes.len(),
    scope: self.scope.len(),
    symbol: self.symbols.len(),
    reference: self.symbols.references.len(),
} = Collector {
    node: 314759,
    scope: 10554,
    symbol: 15674,
    reference: 78443,
}
[Finished running. Exit status: 0]

Boshen pushed a commit that referenced this pull request Jul 19, 2024
@Dunqing merged commit 8a2971b into main Jul 19, 2024
22 of 26 checks passed
@Dunqing deleted the test-visit branch Jul 19, 2024 15:27
@Dunqing restored the test-visit branch Jul 19, 2024 15:27
@Dunqing (Member, Author) commented Jul 19, 2024

I don't know why this PR was merged!!!! I didn't click the merge button. And I don't see a revert button.

@Dunqing (Member, Author) commented Jul 19, 2024

Reverted; reopened in #4367.

@overlookmotel (Contributor) commented:

I don't know why this PR was merged!!!! I didn't click the merge button. And I don't see a revert button.

LOLs! GitHub loves your perf work - it couldn't resist merging.

kdy1 added a commit to swc-project/swc that referenced this pull request Jul 20, 2024
@milesj (Contributor) commented Jul 20, 2024

Y'all are CRACKED
