Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex for entity.function.name.lambda.kusto triggers an Oniguruma bug #209

Closed
slevithan opened this issue Dec 27, 2024 · 2 comments · Fixed by #210
Closed

Regex for entity.function.name.lambda.kusto triggers an Oniguruma bug #209

slevithan opened this issue Dec 27, 2024 · 2 comments · Fixed by #210
Labels
bug Something isn't working help wanted Contribution requested

Comments

@slevithan
Copy link
Contributor

The regex on this line ("(?<=let ).+(?=\\W*=)") triggers an Oniguruma bug that prevents it from ever matching.

Specifically, a lookbehind immediately followed by a greedily-quantified . or \N (which are handled the same when not using Oniguruma's m flag) prevent the regex from matching. So, for example, if we test it against the string "let dt = datetime(2017-01-29 09:00:05);" (from the Kusto sample used by Shiki), it does not match, although it should. This prevents the dt in this example string from being highlighted correctly.

Additionally, since Shiki's JS engine doesn't reproduce this Oniguruma bug, it leads to Shiki listing Kusto as mismatched (unsupported) in its JS engine, since the JS engine doesn't match Oniguruma's results (even though the JS engine is in fact highlighting it correctly).

More details about the bug

We can create a reduced test case for the bug with (?<=.).+ or (?<=.)\N+. These do not find a match within aa. This bug is very specific to a greedily-quantified . or \N following a lookbehind. If we replace the reduced regex with any of the following, the bug is not triggered and the regex finds a match:

  • Exactly equivalent:
    • (?<=.)[^\n]+ -- A dot excludes only \n in Oniguruma.
  • Equivalent in TM grammars (since regexes search line-by-line) even though they replace the dot with a token that matches any char including \n:
    • (?<=.)[\0-\x{10FFFF}]+
    • (?<=.)\O+ -- Uppercase letter O (not zero).
    • (?<=.)(?m:.+)
  • Not equivalent, but shows that even slight changes avoid the bug:
    • (?<=.).+?
    • (?<=.).

Fix

I'd recommend changing the regex to "(?<=let )[^\\n]+(?=\\W*=)" or "(?<=let )\\O+(?=\\W*=)". Either of these will fix it.

I'd be happy to submit a PR if that would help.

Context

Apart from when using Shiki's JS engine (which emulates Oniguruma), regexes in TextMate grammars are run in the Oniguruma regex engine. VS Code, Shiki, etc. use vscode-oniguruma for this. vscode-oniguruma v1.7.0 up to the latest version are running Oniguruma 6.9.8.

CC @antfu

@rosshamish
Copy link
Owner

Hi Steven, thanks for the detailed report! Looks like an interesting project.

I'd be happy to accept a PR contribution for this. Either of the fixes you've suggested sound reasonable to me. Contributing in this repo is pretty straightforward, and there's a snapshot-style test suite in which you could add a new regression test to cover this bug. Instructions here: https://github.com/rosshamish/kuskus/blob/master/kusto-syntax-highlighting/CONTRIBUTING.md

@rosshamish rosshamish added bug Something isn't working help wanted Contribution requested labels Dec 27, 2024
@rosshamish
Copy link
Owner

Fix is published as v2.0.5

https://github.com/rosshamish/kuskus/actions/runs/12522781064

Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Contribution requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants