Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using the JS engine by default #870

Closed
slevithan opened this issue Dec 19, 2024 · 6 comments
Closed

Using the JS engine by default #870

slevithan opened this issue Dec 19, 2024 · 6 comments

Comments

@slevithan
Copy link
Collaborator

slevithan commented Dec 19, 2024

The latest version of Shiki's JS engine is robust, handling 95% of grammars (99.9% of regexes). The next version of oniguruma-to-es will also move Markdown and PO to supported (at minimum, since they're already fully working on main). Ada (maybe also Hack) will move to supported after a single invalid Oniguruma regex is fixed upstream [1], [2]. And I've also identified a complex subroutine edge case (Oniguruma subroutines are horrifyingly complex to emulate if you want to cover 100% of edge cases) causing mismatches for Apex. Although it's a really tricky one, I'm optimistic that I'll figure out how to emulate it (with a fair amount of work and special handling in oniguruma-to-es's RegExp subclass).

Not counting other potential improvements (and grammars that might come along for the ride with the Apex fix), supporting the grammars mentioned above would mean 97% of grammars are supported. For people who understand the full depth of Oniguruma's advanced features and crazy edge cases (and JavaScript's limitations), I think this is epic. 😊

Given that the JS engine is about 4% of the size of the WASM engine and roughly performance-neutral (some grammars are faster, some slower), I'm curious what it would take to remove the experimental label and/or make it the default engine.

It's certainly possible to keep grinding away at improving the underlying oniguruma-to-es library to add even more support (help would be very welcome). However, it would probably be easier to modify patterns in the remaining unsupported / mismatched grammars to avoid problematic features (like overlapping recursions and some uses of \G). That approach would also probably allow more people to help.

What do you think? Is moving to the JS engine as the default the right goal?

@antfu
Copy link
Member

antfu commented Dec 20, 2024

Yeah I think it's definitely ready to mark as stable, thanks to your hard work! I am still not sure about to have it as the default yet (if we cover 100% then I'd be totally up to it, we could also remove those really-hard-to-support grammars, or change to sources that are more compatible).

My goal is definitely to have Shiki to not coupled with Oniguruma. The separation is almost done, but it's still be bundled in the main entry, that I planned to make them composable/exchangable in the next major.

@antfu
Copy link
Member

antfu commented Dec 20, 2024

In terms of improving the grammars, do you think we should provide a better toolset for authoring grammars? We can have starter template with a playground that can inspect the state machine and tokens, also would tell if you are using some Oniguruma-specific feature and whether they can be polyfilled by oniguruma-to-es etc. This would also make it easier for us to port some grammars for better compatibility/performance.

@slevithan
Copy link
Collaborator Author

slevithan commented Dec 20, 2024

Yeah I think it's definitely ready to mark as stable

Cool. I'll make related documentation changes alongside the next oniguruma-to-es version bump.

we could also remove those really-hard-to-support grammars, or change to sources that are more compatible

Nice to have that as an option.

My goal is definitely to have Shiki to not coupled with Oniguruma. The separation is almost done, but it's still be bundled in the main entry, that I planned to make them composable/exchangable in the next major.

Aside: Would it also be possible to upgrade vscode-oniguruma in the next major version? Not sure what you'd have to do for that. Oniguruma cares about backcompat, so the underlying upgrades and bug fixes for regex syntax and behavior most likely won't change support for any grammars. oniguruma-to-es is intended to emulate the latest Oniguruma version (I generally test using vscode-oniguruma 2.0.1), so although I'm not currently aware of any discrepancies affecting TM grammars (apart from different levels of Unicode support), it will be nice to not have to worry about this. vscode-oniguruma 2.0.1 includes Oniguruma 6.9.8, which is one patch removed from the latest version.

In terms of improving the grammars, do you think we should provide a better toolset for authoring grammars? We can have starter template with a playground that can inspect the state machine and tokens, also would tell if you are using some Oniguruma-specific feature and whether they can be polyfilled by oniguruma-to-es etc. This would also make it easier for us to port some grammars for better compatibility/performance.

I think it's a great idea. TM grammar authors need all the help they can get, and providing a toolset like that might help ensure future grammars are compatible with Shiki (including its JS engine). It's probably not something I'd be able to take on, though, apart from helping with oniguruma-to-es related integration and suggestions. (First suggestion 😊: It would be nice to warn when a regex can be emulated with option allowUnhandledGAnchors but not without it, so those regexes can be targeted for refactoring.)

@antfu
Copy link
Member

antfu commented Dec 21, 2024

Would it also be possible to upgrade vscode-oniguruma in the next major version?

Is there any feature/fix missing with the current version?

I see 2.0.1 (which was over one year ago) only changes the wasm from vendoring to building in CI (microsoft/vscode-oniguruma@e51f3e9) - where the new build changes the binding so it would require extra work to adopt. But I see not much to gain for using the new version

@slevithan
Copy link
Collaborator Author

Oh, you're right! vscode-oniguruma v1.7.0 is the version that bumped Oniguruma to v6.9.8 (microsoft/vscode-oniguruma@ef79edf). So v1.7.0 and v2.0.1 are using the same version of Oniguruma. In that case, no worries! Since v1.7.0's onig.wasm is slightly smaller (456KB) than in v2.0.1 (463KB), I'd incorrectly assumed it was an earlier version of Oniguruma.

oniguruma-to-es's parsing/emulation includes two bug fixes from Oniguruma 6.9.9, but they're very minor and are not affecting the JS engine compat report. And in any case v6.9.9 is not yet available via any version of vscode-oniguruma.

@slevithan
Copy link
Collaborator Author

Closing this since you removed the "experimental" label in 25695aa (released in v1.24.4).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants