Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: improve snapshot testing for js engine #866

Merged
merged 9 commits into from
Dec 17, 2024
Merged

Conversation

slevithan
Copy link
Collaborator

There are still some mismatches in the JS engine I can't explain.

Beancount is a good grammar to investigate because it has a relatively small number of patterns and doesn't use any pattern features with expected potential for mismatches, like unsupported uses of \G.

Following are the current grammars labeled as mismatched (no parsing errors that move them to unsupported) when oniguruma-to-es option allowUnhandledGAnchors is set to false:

  • apex
  • beancount
  • kotlin
  • kusto

I think if we identify what's wrong with beancount we might fix all of these, plus bring down the "diff" count on some other grammars that are currently unsupported.

However, if just looking directly at the JS engine's whole results for the beancount grammar via compare.test.js, although it's hard for me to interpret, it looks like there might not be any pattern match differences, but rather maybe the JS engine itself is working slightly differently than the Oniguruma engine. So it might be something that requires adjusting/fixing in the JS engine code rather than in oniguruma-to-es.

@antfu asked me to send a draft PR to help investigate.

Note: I've removed the file system alias for oniguruma-to-es since I'm not sure it’s not doing anything to help. You can just change packages/engine-javascript/package.json to use "oniguruma-to-es": "file:../../../oniguruma-to-es", during development/testing.

Copy link

netlify bot commented Dec 16, 2024

Deploy Preview for shiki-matsu ready!

Name Link
🔨 Latest commit f1505c4
🔍 Latest deploy log https://app.netlify.com/sites/shiki-matsu/deploys/6761822fc18d5c00087f6e4a
😎 Deploy Preview https://deploy-preview-866--shiki-matsu.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

netlify bot commented Dec 16, 2024

Deploy Preview for shiki-next ready!

Name Link
🔨 Latest commit f1505c4
🔍 Latest deploy log https://app.netlify.com/sites/shiki-next/deploys/6761822fd0a30b0008698ed0
😎 Deploy Preview https://deploy-preview-866--shiki-next.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@antfu antfu marked this pull request as draft December 17, 2024 06:32
@antfu antfu self-requested a review December 17, 2024 06:32
@antfu
Copy link
Member

antfu commented Dec 17, 2024

Yeah, that's interesting; it's weird that it seems that in the first pass, the results are exactly the same, but somehow, the second pass has different args. I do not yet have a clue why this would happen

@antfu
Copy link
Member

antfu commented Dec 17, 2024

Oh, I found something; the previous format of the records wasn't linear (it was grouped by regex patterns). I updated the format to be linear, and on the second match of beancount there is a mismatch:

CleanShot 2024-12-17 at 17 42 17@2x

You can reproduce it with the latest commit with pnpm test verify

Comment on lines 35 to 36
if (!(i === 2 && name === 'beancount'))
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment this out to run the full suite.

@slevithan
Copy link
Collaborator Author

slevithan commented Dec 17, 2024

With the help of your latest changes, I found the problem. 😖 Turns out Oniguruma's ^ cannot match after a string-terminating \n (though it matches after every other \n, including the position sandwiched between \n\n). There is no equivalent restriction on $.

In other words, oniguruma-to-es needs to change its conversion for ^ from (?<=^|\n) to (?<=^|\n(?!$)).

This change will be made in the next version of oniguruma-to-es. Compared to the current v0.7.0, just making this change moves beancount, php, pug, and rst to the supported list, and it significantly lowers the diff count for markdown, mdc, and po. It does nothing to help apex, kotlin, or kusto (which I mentioned in my first comment for this PR).

I'd recommend landing all of these changes (after removing if (!(i === 2 && name === 'beancount')) from verify.test.ts, and perhaps after removing the beancount test from compare.test.ts and the files that generates, if you prefer) since they are helpful cleanup and make it easier to compare other grammars. I'll then follow up with a PR that upgrades oniguruma-to-es to fix ^.

@antfu antfu changed the title draft: investigating source of mismatches in js engine test: improve snapshot testing for js engine Dec 17, 2024
@antfu antfu marked this pull request as ready for review December 17, 2024 13:53
Copy link

codecov bot commented Dec 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.21%. Comparing base (6d4b81c) to head (f1505c4).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #866      +/-   ##
==========================================
+ Coverage   95.18%   95.21%   +0.02%     
==========================================
  Files          83       83              
  Lines        7057     7057              
  Branches     1461     1460       -1     
==========================================
+ Hits         6717     6719       +2     
+ Misses        332      330       -2     
  Partials        8        8              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@antfu antfu merged commit c489859 into shikijs:main Dec 17, 2024
13 of 14 checks passed
@slevithan slevithan deleted the jsengine branch December 17, 2024 17:22
@slevithan
Copy link
Collaborator Author

slevithan commented Dec 17, 2024

Heads up that the next version of oniguruma-to-es will take some time because, using compare.test.ts, I've discovered another major culprit for remaining mismatches that I want to fix before release.

It turns out that vscode-oniguruma enables Oniguruma option ONIG_OPTION_CAPTURE_GROUP whenever options aren't manually provided (!) (see here), which changes behavior in various ways when mixing named and unnamed capture. oniguruma-to-es doesn't currently support this option, and although I've now started work on it, it requires deep changes for things like subroutine and recursion handling. The good news is that adding support for this (and enabling the option in the JS engine) will dramatically reduce diff counts for remaining mismatched/unsupported grammars, and move at least one (confirmed so far) to supported.

Aside: There are also open issues here, here, and here that, if resolved, would further improve the JS engine's numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants