Improve parsing performance #152

Merged
merged 14 commits into from
Oct 29, 2024

Conversation

JLHwung
Collaborator

@JLHwung JLHwung commented Oct 28, 2024

In this PR we improve the general parsing performance.

  • Improve the match helper: Previously we compared the match index with the current position, which could take quadratic time because each call may scan the full input. This PR switches to a faster implementation and introduces a matchOne helper, so that we can skip the String.prototype.substring call when matching only a single character.
  • Improve hot parse* paths: Previously we issued multiple matchReg calls when parsing atoms. They are now replaced by one large switch statement that dispatches on the current character.
  • Ensure most AST nodes are monomorphic: Previously we added the .raw property to the finished AST node, so V8 had to migrate the node to a new shape that includes .raw. The addRaw function is now inlined into AST node creation, so that V8 can generate the final node shape directly. Currently, the normal group and ignore group nodes are still polymorphic when named groups or modifiers are present. Both are new features, so this can be improved later.
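To illustrate the first bullet, here is a minimal sketch (not the actual code from this PR) of why comparing a search index with the current position can scan far more input than needed. The names `matchSlow` and `matchFast` are hypothetical, and using `String.prototype.startsWith` with a position argument is only an assumption about what a faster implementation could look like.

```javascript
// Hypothetical sketch; not the code from this PR.

// Searches forward from `pos` and only then checks where the hit landed.
// On a miss at `pos`, indexOf keeps scanning toward the end of `str`.
function matchSlow(str, pos, value) {
  return str.indexOf(value, pos) === pos;
}

// Only compares the characters starting at `pos`; a miss exits right away.
function matchFast(str, pos, value) {
  return str.startsWith(value, pos);
}

const input = "a".repeat(10) + "(?:";
console.log(matchSlow(input, 0, "(?:"), matchFast(input, 0, "(?:")); // false false
console.log(matchSlow(input, 10, "(?:"), matchFast(input, 10, "(?:")); // true true
```

Both helpers return the same answers; the difference is how much of the input is inspected when the match fails at `pos`.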

Added a new npm run bench script to show the benchmark result: the baseline version is the last release and the current version is this PR. The benchmark samples are taken from multiple public npm libraries.

Benchmark result
┌─────────┬──────────────────────────────────┬─────────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                        │ ops/sec     │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼──────────────────────────────────┼─────────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'parse current ansi-regex'       │ '76,428'    │ 13084.105168127518 │ '±0.74%' │ 38215   │
│ 1       │ 'parse current astral-regex'     │ '1,174,586' │ 851.3635538589124  │ '±0.83%' │ 587294  │
│ 2       │ 'parse current doi-regex'        │ '312,925'   │ 3195.650786448047  │ '±0.75%' │ 156463  │
│ 3       │ 'parse current email-regex'      │ '283,048'   │ 3532.96158275956   │ '±0.45%' │ 141525  │
│ 4       │ 'parse current emoji-regex'      │ '2,035'     │ 491214.6748526555  │ '±0.84%' │ 1018    │
│ 5       │ 'parse current ip-regex'         │ '15,041'    │ 66481.76785001896  │ '±1.27%' │ 7521    │
│ 6       │ 'parse current identifier-regex' │ '536,451'   │ 1864.1025031130673 │ '±0.58%' │ 268226  │
│ 7       │ 'parse current issue-regex'      │ '151,317'   │ 6608.6141106799705 │ '±0.56%' │ 75659   │
│ 8       │ 'parse current mapcode-regex'    │ '1,120'     │ 892219.0980392332  │ '±1.25%' │ 561     │
│ 9       │ 'parse current scoped-regex'     │ '539,371'   │ 1854.0085766416416 │ '±0.56%' │ 269686  │
│ 10      │ 'parse current semver-regex'     │ '85,675'    │ 11671.941874036664 │ '±0.60%' │ 42838   │
│ 11      │ 'parse current shebang-regex'    │ '1,337,546' │ 747.6376279576044  │ '±1.12%' │ 668774  │
└─────────┴──────────────────────────────────┴─────────────┴────────────────────┴──────────┴─────────┘
┌─────────┬───────────────────────────────────┬───────────┬────────────────────┬──────────┬─────────┐
│ (index) │ Task Name                         │ ops/sec   │ Average Time (ns)  │ Margin   │ Samples │
├─────────┼───────────────────────────────────┼───────────┼────────────────────┼──────────┼─────────┤
│ 0       │ 'parse baseline ansi-regex'       │ '28,886'  │ 34617.794932151766 │ '±0.72%' │ 14444   │
│ 1       │ 'parse baseline astral-regex'     │ '535,407' │ 1867.7372097540228 │ '±0.79%' │ 267704  │
│ 2       │ 'parse baseline doi-regex'        │ '123,750' │ 8080.746961664696  │ '±0.49%' │ 61876   │
│ 3       │ 'parse baseline email-regex'      │ '95,653'  │ 10454.389947101788 │ '±0.55%' │ 47827   │
│ 4       │ 'parse baseline emoji-regex'      │ '44'      │ 22395461.91304352  │ '±0.70%' │ 23      │
│ 5       │ 'parse baseline ip-regex'         │ '1,357'   │ 736893.8217967617  │ '±0.45%' │ 679     │
│ 6       │ 'parse baseline identifier-regex' │ '269,357' │ 3712.5437521808644 │ '±0.50%' │ 134679  │
│ 7       │ 'parse baseline issue-regex'      │ '55,087'  │ 18152.787322104607 │ '±0.72%' │ 27544   │
│ 8       │ 'parse baseline mapcode-regex'    │ '42'      │ 23419407.136363797 │ '±0.84%' │ 22      │
│ 9       │ 'parse baseline scoped-regex'     │ '195,864' │ 5105.566336168437  │ '±0.24%' │ 97933   │
│ 10      │ 'parse baseline semver-regex'     │ '21,068'  │ 47463.5307071642   │ '±0.65%' │ 10535   │
│ 11      │ 'parse baseline shebang-regex'    │ '582,227' │ 1717.5412278359452 │ '±0.56%' │ 291114  │
└─────────┴───────────────────────────────────┴───────────┴────────────────────┴──────────┴─────────┘
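
For a quick read of the two tables, the speedup per sample can be derived as current ops/sec divided by baseline ops/sec. The snippet below is a small sketch for that arithmetic; the `speedup` helper is illustrative only, and the numbers are copied from the tables above.

```javascript
// ops/sec values copied from the two tables above: [current, baseline].
const opsPerSec = {
  "ansi-regex": [76428, 28886],
  "astral-regex": [1174586, 535407],
  "doi-regex": [312925, 123750],
  "email-regex": [283048, 95653],
  "emoji-regex": [2035, 44],
  "ip-regex": [15041, 1357],
  "identifier-regex": [536451, 269357],
  "issue-regex": [151317, 55087],
  "mapcode-regex": [1120, 42],
  "scoped-regex": [539371, 195864],
  "semver-regex": [85675, 21068],
  "shebang-regex": [1337546, 582227],
};

// Speedup = current throughput divided by baseline throughput.
function speedup(name) {
  const [current, baseline] = opsPerSec[name];
  return current / baseline;
}

for (const name of Object.keys(opsPerSec)) {
  console.log(`${name}: ${speedup(name).toFixed(1)}x`);
}
```

By this measure the largest gains are on the long patterns (emoji-regex at roughly 46x, mapcode-regex at roughly 27x), while short patterns such as shebang-regex improve by a little over 2x.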

If a match is not found immediately starting from pos, match will exit early instead of scanning to the end of the source string.
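
One way to get this kind of early exit for regex matches is the sticky (`y`) flag together with `lastIndex`. The sketch below assumes that approach, which may differ from what this PR actually does; `matchAt` is a hypothetical helper.

```javascript
// Hypothetical helper: succeed only if `re` matches exactly at `pos`.
function matchAt(re, str, pos) {
  // Rebuild the regex with the sticky flag so the engine does not scan ahead.
  // (A real parser would build the sticky regex once, not on every call.)
  const flags = re.flags.includes("y") ? re.flags : re.flags + "y";
  const sticky = new RegExp(re.source, flags);
  sticky.lastIndex = pos;
  return sticky.exec(str); // null if no match starts at `pos`
}

const source = "ab12cd";
console.log(matchAt(/[0-9]+/, source, 0)); // null
console.log(matchAt(/[0-9]+/, source, 2)[0]); // "12"
```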
This avoids adding a new property (.raw) to an AST node after the object is created, so the AST node is monomorphic and V8 doesn't have to transition it to a new shape.
Also avoid mutating the shape of the value returned by string.match, since the range info can be read from pos.
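
As a rough illustration of the difference, here is a sketch with hypothetical node factories. The function names and the `dot` node fields are made up for the example and are not taken from the parser source.

```javascript
// Hypothetical node factories; names and fields are illustrative only.

// Before: the node is created first and gains `raw` afterwards.
function createDotThenAddRaw(source, from, to) {
  const node = { type: "dot", range: [from, to] };
  node.raw = source.substring(from, to); // property added after creation
  return node;
}

// After: `raw` is part of the object literal, so the node is created
// with its final set of properties in one step.
function createDotWithRaw(source, from, to) {
  return { type: "dot", range: [from, to], raw: source.substring(from, to) };
}

console.log(createDotThenAddRaw("a.b", 1, 2));
console.log(createDotWithRaw("a.b", 1, 2));
```

Both factories produce nodes with the same contents; only the second one never mutates the object after it has been created.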
Also introduce `consume` as an alternative to the previous `incr` utility. In most cases the `incr` result is not used, so the string slice it produces should be avoided.
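
A minimal sketch of the distinction, assuming `incr` returns the skipped text and `consume` only advances the position. The `createCursor` wrapper and both method bodies are illustrative guesses, not the parser's code.

```javascript
// Hypothetical cursor over the source string; the method names mirror the
// commit message, but the bodies are illustrative guesses.
function createCursor(str) {
  let pos = 0;
  return {
    get pos() {
      return pos;
    },
    // Advances and returns the skipped text (creates a string slice).
    incr(amount = 1) {
      const res = str.substring(pos, pos + amount);
      pos += amount;
      return res;
    },
    // Advances only; no slice is created when the caller ignores the text.
    consume(amount = 1) {
      pos += amount;
    },
  };
}

const cursor = createCursor("(?:a)");
cursor.consume(3); // skip "(?:" without slicing
console.log(cursor.incr()); // "a"
console.log(cursor.pos); // 4
```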
replace multiple regex matching with a big switch case.
replace multiple regex matching with a big switch case.
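
The idea can be sketched as follows: instead of trying several regexes in turn, look at the current character once and branch on it. The `classifyAtom` function and its category names are hypothetical and much simpler than a real atom parser.

```javascript
// Hypothetical atom classifier; the categories are illustrative and are
// not the parser's real node types.
function classifyAtom(str, pos) {
  if (pos >= str.length) return "end";
  switch (str[pos]) {
    case ".":
      return "dot";
    case "(":
      return "group";
    case "[":
      return "characterClass";
    case "\\":
      return "escape";
    case "^":
    case "$":
      return "anchor";
    default:
      return "character";
  }
}

const pattern = "a.([";
for (let i = 0; i < pattern.length; i++) {
  console.log(pattern[i], classifyAtom(pattern, i));
}
```

Each position costs one character read and one branch, regardless of how many atom kinds exist.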
so that we can avoid the String.prototype.slice call when possible
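
Assuming this note refers to the single-character helper mentioned in the PR description (`matchOne`), the approach might look like the sketch below. The signature and body are assumptions for illustration, not the helper's actual implementation.

```javascript
// Hypothetical single-character matcher; signature and body are assumed.
function matchOne(str, pos, ch) {
  // Compare the character at `pos` in place; no substring or slice of the
  // remaining input is created.
  return str[pos] === ch;
}

console.log(matchOne("a|b", 1, "|")); // true
console.log(matchOne("a|b", 0, "|")); // false
console.log(matchOne("a|b", 3, "|")); // false (past the end)
```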
@JLHwung JLHwung requested a review from jviereck October 29, 2024 14:17
@jviereck jviereck merged commit 2fbb816 into jviereck:gh-pages Oct 29, 2024
@jviereck
Owner

This PR makes me very happy. Thanks @JLHwung for working on this!

@jviereck
Owner

I will make a new release tonight.

@JLHwung JLHwung deleted the perf branch October 29, 2024 15:44
@jviereck
Owner

Published v0.12.0 which includes this PR.
