feat: introduce experimental JavaScript RegExp Engine #761
This PR introduces an experimental engine using JavaScript's native RegExp without the Oniguruma Wasm binary. This makes Shiki run entirely with pure JavaScript.
Approach
Currently, the Oniguruma Wasm binary we rely on is actually a RegExp engine written in C. It's very powerful and supports a lot of syntax and features that JavaScript does not. As JavaScript has evolved, modern JavaScript has gained many of the previously missing features, such as regex lookbehind. So the idea came up: could we leverage the RegExp engine shipped with the language instead of porting another one?
I then ended up with https://github.com/antfu/oniguruma-to-js, a library that converts Oniguruma features down to syntax that JavaScript RegExp can understand. Think of Babel, which transpiles ESNext to ES5.
It turns out that feature parity isn't that far off, and the gap is mostly syntax differences. With that, ~40% of Shiki languages work perfectly with the JS engine, most of the others are partially supported, and only 2 languages fail at the moment.
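To make the "transpile the syntax down" idea concrete, here is a minimal sketch. It is not the oniguruma-to-js implementation: `toyOnigurumaToJs` is a hypothetical helper invented for this illustration, and it only handles four Oniguruma escapes (`\h`, `\H`, `\A`, `\z`) that JavaScript RegExp does not interpret the same way. It assumes the resulting pattern is used without the `m` flag and that these escapes do not appear inside a character class.

```javascript
// Hypothetical helper for illustration only; NOT part of oniguruma-to-js.
// Rewrites a few Oniguruma-only escapes into JavaScript RegExp equivalents.
function toyOnigurumaToJs(pattern) {
  const replacements = {
    '\\h': '[0-9A-Fa-f]',  // Oniguruma: hexadecimal digit
    '\\H': '[^0-9A-Fa-f]', // Oniguruma: anything but a hexadecimal digit
    '\\A': '^',            // start of input (assumes no `m` flag)
    '\\z': '$',            // end of input (assumes no `m` flag)
  }
  // Consume every escape sequence as one unit, so an escaped backslash
  // followed by a letter (e.g. `\\h`) is left untouched.
  return pattern.replace(/\\[\s\S]/g, (escape) => replacements[escape] ?? escape)
}

// An Oniguruma-style pattern matching a 6-digit hex color.
const source = String.raw`\A#\h{6}\z`
const converted = toyOnigurumaToJs(source)
const regex = new RegExp(converted)

console.log(converted)             // ^#[0-9A-Fa-f]{6}$
console.log(regex.test('#1a2B3c')) // true
console.log(regex.test('#12345g')) // false
```

A real converter has to deal with much more than escapes (flags, character classes, group syntax, and so on), which is where the partial support for some languages comes from.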
Compatibility
Currently, the result is
Full report: https://github.com/shikijs/shiki/blob/feat/engine-lite/scripts/report-engine-js-compat.md
Benchmark
Early benchmarking indicates that the JavaScript engine is actually 1.7x faster than WASM with shiki.codeToTokensBase().
The benchmark is run against the 84 languages that are fully supported by both engines.
Usage
Currently, the usage looks like this:
In the future, we might need a complete redesign (with breaking changes) to decouple the WASM Oniguruma engine so that it can be bundled in a more composable way.