Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Use Oniguruma-To-ES in the JS engine (#828) #832

Merged
merged 10 commits into from
Nov 15, 2024

Conversation

slevithan
Copy link
Collaborator

Description

See #828. This changes the JS engine to use Oniguruma-To-ES for much improved emulation (beyond what the numbers say) and sets a new foundation to continue improving grammar support in the JS engine.

Additional context

I don't really understand what's going on with the packages/engine-javascript/test tests (which I didn't modify in this PR except for deleting utils.test.ts since it was testing a now-deleted file), so it's possible some stuff in there should also be deleted or modified after this lands. pnpm test passes.

The simulation option in packages/engine-javascript/src/index.ts is probably not appropriately named anymore now that most of its behavior has been removed. I've updated the comment for it but didn't change the name (will leave for @antfu).

Copy link

netlify bot commented Nov 10, 2024

Deploy Preview for shiki-matsu ready!

Name Link
🔨 Latest commit 8ae1608
🔍 Latest deploy log https://app.netlify.com/sites/shiki-matsu/deploys/6736dc8cc497b00008c30160
😎 Deploy Preview https://deploy-preview-832--shiki-matsu.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

netlify bot commented Nov 10, 2024

Deploy Preview for shiki-next ready!

Name Link
🔨 Latest commit 8ae1608
🔍 Latest deploy log https://app.netlify.com/sites/shiki-next/deploys/6736dc8c09a4db0008ff5290
😎 Deploy Preview https://deploy-preview-832--shiki-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Nov 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.91%. Comparing base (93246cd) to head (9fc2b85).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #832      +/-   ##
==========================================
+ Coverage   94.86%   94.91%   +0.04%     
==========================================
  Files          78       77       -1     
  Lines        6608     6583      -25     
  Branches     1319     1314       -5     
==========================================
- Hits         6269     6248      -21     
+ Misses        330      326       -4     
  Partials        9        9              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@slevithan
Copy link
Collaborator Author

@antfu Would it be possible for you to modify or take over this PR to fix the tests?

  • GitHub Actions / test (18.x, ubuntu-latest) has multiple failures with "SyntaxError: Invalid flags supplied to RegExp constructor 'dgv'".
    • This is expected because flag v requires Node.js 20+. Maybe the tests could be made to run with Oniguruma-To-ES's target set to 'ES2018' instead of 'ES2024'. That would allow it to work on Node.js 10+.
    • IMO it will be best to have the JS engine report still run with target: 'ES2024'.
  • packages/engine-javascript/test/compare.test.js has multiple failures, but I think these tests need to be regenerated or deleted (not sure which).

// Later we might consider to bundle them.
'oniguruma-to-js',
'regex',
'oniguruma-to-es',
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether oniguruma-to-es's dependencies should also be externalized. I removed the comment about considering bundling because I'd much prefer not to do that if it means these projects will lose much of their download stats (which helps add to their credibility which is really valuable for now, especially while they're new projects).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally! If oniguruma-to-es gets externalized, then its dependencies will be as well.

@antfu
Copy link
Member

antfu commented Nov 14, 2024

What do you suggest for the better name of simulation?

@antfu
Copy link
Member

antfu commented Nov 14, 2024

Interestingly the regex behave different on macOS and Windows/Linux 😇

@slevithan
Copy link
Collaborator Author

slevithan commented Nov 14, 2024

What do you suggest for the better name of simulation?

I would probably just remove the option (but keep its handling). I'd probably also remove the option to provide a different JS regex constructor, since that might be unrealistic for people to do. But I might not be thinking about it the same way as you.

If you don't remove the option, maybe keep the same name at least for now (I can't think of a better one). I'm thinking of bringing back some \G anchor simulation/handling in a different way after this PR lands, based on #828 (comment). vscode-textmate's special/hacky handling of \G anchors is definitely causing problems for Oniguruma-To-ES that result in it having more mismatches than it should, which I think can be improved.

Aside: vscode-textmate also does special/hacky things with numbered backreferences in end patterns, since TextMate grammars allow using backreferences in end patterns that refer to captures in begin patterns (even though backreferences to undefined groups are errors in both Oniguruma and JS with flag u/v). This wasn't a problem with the current engine because in most cases oniguruma-to-js used JS's RegExp legacy mode without flag u or v, so using e.g. \1 in a regex when there were no capturing groups defined within the same pattern didn't error, and the \1 was treated as an octal escape by JS (equivalent to \x01), and then it was replaced by vscode-textmate with the substring matched by capture 1 from the begin pattern. But since Oniguruma-To-ES doesn't use JS's legacy Unicode-unaware mode (thereby improving a variety of things), this was initially a problem (during earlier development) since many end patterns were erroring. I solved this with what I think was a clever hack, which is the reason Oniguruma-To-ES's tmGrammar option exists.

Interestingly the regex behave different on macOS and Windows/Linux 😇

What?? 😖 In which JS environment? With Safari? I don't have access to macOS at the moment, but if you can identify specific regexes that are behaving differently, maybe I can identify JS engine bugs and work around them.

@antfu
Copy link
Member

antfu commented Nov 14, 2024

provide a different JS regex constructor

I think it would still be valuable, people could copy our implementation of using Oniguruma-To-ES and make some changes like the options, or do a preprocess/postprocess, or have a legacy implementation using oniguruma-to-js for legacy env that doesn't support v flag, etc.

In which JS environment?

Node.js, see the CI. I am using macOS, and I updated the snapshot in a commit, which result the snapshot mismatch on Linux and Windows. If you update it, it would be another way around. I haven't dig into which regex is causing that yet tho

@slevithan
Copy link
Collaborator Author

I think it would still be valuable, people could copy our implementation of using Oniguruma-To-ES and make some changes like the options, or do a preprocess/postprocess, or have a legacy implementation using oniguruma-to-js

That's all reasonable. I won't push on this.

for legacy env that doesn't support v flag

IMO this is the strongest use case, but we can make Shiki's JS engine support this out of the box by accepting a target option, with the default being 'ES2024', and then providing that to Oniguruma-To-ES. If people set target to ES2018, Oniguruma-To-ES would use flag u which is supported super broadly (only a handful of grammars would lose support).

Node.js, see the CI. I am using macOS, and I updated the snapshot in a commit, which result the snapshot mismatch on Linux and Windows. If you update it, it would be another way around. I haven't dig into which regex is causing that yet tho

Whoa. That's really tricky if the regex engine in the same version of Node.js is behaving differently on macOS and Linux/Windows.

But yeah, if you're able to identify particular regexes causing it, I'll do everything I can to try to work around it.

@slevithan
Copy link
Collaborator Author

BTW, I've got a breakthrough coming in the next version (in the works). The number of supported grammars will go way up. 😊

*
* @default 'auto'
*/
target?: 'ES2024' | 'ES2025' | 'ES2018' | 'auto'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented a auto target here, do you think it's worth to bringing it to oniguruma-to-es?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; great idea. And after I move target: 'auto' into oniguruma-to-es, it will also be able to use target 'ES2025' when supported, making results even better.

@antfu antfu merged commit 33b8b49 into shikijs:main Nov 15, 2024
12 checks passed
@slevithan slevithan deleted the oniguruma-to-es branch November 15, 2024 08:22
weshaggard added a commit to Azure/azure-sdk-for-js that referenced this pull request Nov 15, 2024
A recent package update broke the tool so we are adding a lock file to
ensure we have a reliable execution.

For details the newer version of
https://www.npmjs.com/package/shiki?activeTab=versions version 1.23.0
introduced a new dependency in shikijs/shiki#832
which no longer compiles.

```
node_modules/oniguruma-to-es/types/parse.d.ts(61,49): error TS2339: Property 'orphan' does not exist on type '{}'.
node_modules/oniguruma-to-es/types/parse.d.ts(96,31): error TS2339: Property 'atomic' does not exist on type '{}'.
node_modules/oniguruma-to-es/types/parse.d.ts(96,39): error TS2339: Property 'flags' does not exist on type '{}'.
```

Until we figure out what the best way to fix those issues we are pinning
to the older version using a lock file.
@slevithan
Copy link
Collaborator Author

Follow up PRs to further improve grammar support, etc. in #836, #842, #847, #850.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants