feat: Use Oniguruma-To-ES in the JS engine (#828) #832

Merged
merged 10 commits into from
Nov 15, 2024
Changes from 1 commit
14 changes: 7 additions & 7 deletions docs/guide/regex-engines.md
@@ -4,7 +4,7 @@ outline: deep

# RegExp Engines

TextMate grammars is based on regular expressions to match tokens. Usually, we use [Oniguruma](https://github.com/kkos/oniguruma) (a regular expression engine written in C) to parse the grammar. To make it work in JavaScript, we compile Oniguruma to WebAssembly to run in the browser or Node.js.
TextMate grammars are based on regular expressions to match tokens. Usually, we use [Oniguruma](https://github.com/kkos/oniguruma) (a regular expression engine written in C) to parse the grammar. To make it work in JavaScript, we compile Oniguruma to WebAssembly to run in the browser or Node.js.

Since v1.15, we expose the ability to for users to switch the RegExp engine and provide custom implementations.

@@ -20,7 +20,7 @@ const shiki = await createShiki({
})
```

Shiki come with two built-in engines:
Shiki comes with two built-in engines:

## Oniguruma Engine

@@ -43,7 +43,7 @@ const shiki = await createShiki({
This feature is experimental and may change without following semver.
:::

This experimental engine uses JavaScript's native RegExp. As TextMate grammars' regular expressions are in Oniguruma flavor that might contains syntaxes that are not supported by JavaScript's RegExp, we use [`oniguruma-to-js`](https://github.com/antfu/oniguruma-to-js) to lowering the syntaxes and try to make them compatible with JavaScript's RegExp.
This engine uses JavaScript's native RegExp. As regular expressions used by TextMate grammars are written for Oniguruma, they might contain syntax that is not supported by JavaScript's RegExp, or expect different behavior for the same syntax. So we use [Oniguruma-To-ES](https://github.com/slevithan/oniguruma-to-es) to transpile Oniguruma patterns to native JavaScript RegExp.

```ts {2,4,9}
import { createHighlighter } from 'shiki'
@@ -60,17 +60,17 @@ const shiki = await createHighlighter({
const html = shiki.codeToHtml('const a = 1', { lang: 'javascript', theme: 'nord' })
```

Please check the [compatibility table](/references/engine-js-compat) to check the support status of the languages you are using.
Please check the [compatibility table](/references/engine-js-compat) for the support status of the languages you are using.

If mismatches are acceptable and you want it to get results whatever it can, you can enable the `forgiving` option to suppress any errors happened during the conversion:
If mismatches are acceptable and you want best-effort results whenever possible, you can enable the `forgiving` option to suppress any errors that happened during the conversion:

```ts
const jsEngine = createJavaScriptRegexEngine({ forgiving: true })
// ...use the engine
```

::: info
If you runs Shiki on Node.js (or at build time), we still recommend using the Oniguruma engine for the best result, as most of the time bundle size or WebAssembly support is not a concern.
If you run Shiki on Node.js (or at build time) and bundle size or WebAssembly support is not a concern, we still recommend using the Oniguruma engine for the best result.

The JavaScript engine is more suitable for running in the browser in some cases that you want to control the bundle size.
The JavaScript engine is best when running in the browser and in cases when you want to control the bundle size.
:::
80 changes: 40 additions & 40 deletions docs/references/engine-js-compat.md

Large diffs are not rendered by default.

6 changes: 2 additions & 4 deletions packages/core/rollup.config.mjs
@@ -36,11 +36,9 @@ const external = [
'hast',
'@shikijs/vscode-textmate',

// Externalize them to make it easier to patch and experiments
// Externalize to make it easier to patch and experiment
// Versions are pinned to avoid regressions
// Later we might consider to bundle them.
'oniguruma-to-js',
'regex',
'oniguruma-to-es',
Collaborator Author

Not sure whether oniguruma-to-es's dependencies should also be externalized. I removed the comment about considering bundling because I'd much prefer not to do that if it means these projects will lose much of their download stats. Those stats add to their credibility, which is really valuable for now, especially while they're new projects.

Member

Totally! If oniguruma-to-es gets externalized, then its dependencies will be as well.

]

export default defineConfig([
2 changes: 1 addition & 1 deletion packages/engine-javascript/README.md
@@ -1,6 +1,6 @@
# @shikijs/engine-javascript

Engine for Shiki using JavaScript's native RegExp (experimental).
Engine for Shiki using JavaScript's native RegExp (experimental). Uses [Oniguruma-To-ES](https://github.com/slevithan/oniguruma-to-es) to transpile regex syntax and behavior.

[Documentation](https://shiki.style/guide/regex-engines)

2 changes: 1 addition & 1 deletion packages/engine-javascript/package.json
@@ -37,6 +37,6 @@
"dependencies": {
"@shikijs/types": "workspace:*",
"@shikijs/vscode-textmate": "catalog:",
"oniguruma-to-js": "catalog:"
"oniguruma-to-es": "catalog:"
}
}
48 changes: 0 additions & 48 deletions packages/engine-javascript/scripts/generate.ts

This file was deleted.

20 changes: 0 additions & 20 deletions packages/engine-javascript/scripts/utils.ts

This file was deleted.

51 changes: 17 additions & 34 deletions packages/engine-javascript/src/index.ts
@@ -4,8 +4,7 @@
RegexEngineString,
} from '@shikijs/types'
import type { IOnigMatch } from '@shikijs/vscode-textmate'
import { onigurumaToRegexp } from 'oniguruma-to-js'
import { replacements } from './replacements'
import { toRegExp } from 'oniguruma-to-es'

export interface JavaScriptRegexEngineOptions {
/**
@@ -16,7 +15,7 @@
forgiving?: boolean

/**
* Use JavaScript to simulate some unsupported regex features.
* Cleanup some grammar patterns before use.
*
* @default true
*/
@@ -30,7 +29,7 @@
/**
* Custom pattern to RegExp constructor.
*
* By default `oniguruma-to-js` is used.
* By default `oniguruma-to-es` is used.
*/
regexConstructor?: (pattern: string) => RegExp
}
@@ -41,18 +40,19 @@
* The default RegExp constructor for JavaScript regex engine.
*/
export function defaultJavaScriptRegexConstructor(pattern: string): RegExp {
return onigurumaToRegexp(
return toRegExp(

Check failure on line 43 in packages/engine-javascript/src/index.ts

GitHub Actions / test (18.x, ubuntu-latest)

The same error is reported for each of the following tests:

packages/engine-javascript/test/compare.test.ts > cases > json-basic
packages/engine-javascript/test/compare.test.ts > cases > html-basic
packages/engine-javascript/test/compare.test.ts > cases > ts-basic
packages/engine-javascript/test/compare.test.ts > cases > jsonc
packages/engine-javascript/test/compare.test.ts > cases > vue
packages/engine-javascript/test/compare.test.ts > cases > toml
packages/engine-javascript/test/compare.test.ts > cases > sql
packages/engine-javascript/test/general.test.ts > should > works
packages/engine-javascript/test/general.test.ts > should > dynamic load theme and lang
packages/engine-javascript/test/verify.test.ts

SyntaxError: Invalid flags supplied to RegExp constructor 'dgv'
❯ Module.toRegExp node_modules/.pnpm/oniguruma-to-es@0.1.2/node_modules/oniguruma-to-es/src/index.js:109:10
❯ defaultJavaScriptRegexConstructor packages/engine-javascript/src/index.ts:43:10
❯ packages/engine-javascript/src/index.ts:90:23
❯ new JavaScriptScanner packages/engine-javascript/src/index.ts:68:29
❯ Object.createScanner packages/engine-javascript/src/index.ts:187:14
❯ Object.createOnigScanner packages/core/src/textmate/resolver.ts:13:45
❯ Grammar.createOnigScanner node_modules/.pnpm/@shikijs+vscode-textmate@9.3.0/node_modules/@shikijs/vscode-textmate/dist/index.mjs:2269:26
❯ new CompiledRule node_modules/.pnpm/@shikijs+vscode-textmate@9.3.0/node_modules/@shikijs/vscode-textmate/dist/index.mjs:1609:28

For packages/engine-javascript/test/verify.test.ts, the trace is shorter: after `new JavaScriptScanner packages/engine-javascript/src/index.ts:68:29` it ends at packages/engine-javascript/test/verify.test.ts:31:25.

pattern,
{
flags: 'dgm',
ignoreContiguousAnchors: true,
accuracy: 'loose',
global: true,
hasIndices: true,
tmGrammar: true,
},
)
}
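
The CI annotations on `toRegExp` above all report `Invalid flags supplied to RegExp constructor 'dgv'` on the Node.js 18.x job. The likely cause is that the `v` (`unicodeSets`) flag is only available in newer JavaScript runtimes (Node.js 20 and later), so the transpiled regex cannot be constructed there. The following is a minimal sketch of how flag support can be feature-detected at runtime. `supportsRegExpFlags`, `canUseFlagV`, and `fallbackFlags` are hypothetical names used for illustration and are not part of Shiki or oniguruma-to-es; whether this PR resolved the failure this way is not shown in the diff.

```typescript
// Hypothetical helper: returns true if the current runtime accepts the
// given RegExp flag string, false if the constructor rejects it.
function supportsRegExpFlags(flags: string): boolean {
  try {
    // Constructing an empty pattern is enough to validate the flags.
    new RegExp('', flags)
    return true
  }
  catch {
    return false
  }
}

// Assumption for illustration: fall back to the `u` flag when `v` is missing.
// A real transpiler would also have to emit a pattern valid for that flag.
const canUseFlagV = supportsRegExpFlags('dgv')
const fallbackFlags = canUseFlagV ? 'dgv' : 'dgu'
```

A check like this only tells you whether the flags are accepted. A pattern written for `v` mode is not necessarily valid in `u` mode, so the pattern source has to be generated for the chosen target as well.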

export class JavaScriptScanner implements PatternScanner {
regexps: (RegExp | null)[]
contiguousAnchorSimulation: boolean[]

constructor(
public patterns: string[],
@@ -65,8 +65,7 @@
regexConstructor = defaultJavaScriptRegexConstructor,
} = options

this.contiguousAnchorSimulation = Array.from({ length: patterns.length }, () => false)
this.regexps = patterns.map((p, idx) => {
this.regexps = patterns.map((p) => {
/**
* vscode-textmate replace anchors to \uFFFF, where we still not sure how to handle it correctly
*
@@ -77,10 +76,6 @@
if (simulation)
p = p.replaceAll('(^|\\\uFFFF)', '(^|\\G)')

// Detect contiguous anchors for simulation
if (simulation && (p.startsWith('(^|\\G)') || p.startsWith('(\\G|^)')))
this.contiguousAnchorSimulation[idx] = true

// Cache
const cached = cache?.get(p)
if (cached) {
@@ -92,13 +87,7 @@
throw cached
}
try {
let pattern = p
if (simulation) {
for (const [from, to] of replacements) {
pattern = pattern.replaceAll(from, to)
}
}
const regex = regexConstructor(pattern)
const regex = regexConstructor(p)
cache?.set(p, regex)
return regex
}
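
The hunk above compiles each pattern once and stores the result in a cache. Only part of that logic is visible in the diff, so the following is a self-contained sketch of the general technique rather than the code in `index.ts`: cache both successful compilations and failures, so that a pattern that cannot be converted is not retried on every scanner creation. `compileWithCache` and `CacheEntry` are hypothetical names, and the exact behavior of the real cache is an assumption.

```typescript
// A cache entry is either the compiled regex or the error thrown while compiling.
type CacheEntry = RegExp | Error

// Hypothetical helper: compile `pattern` with `construct`, reusing cached results.
// In forgiving mode a failing pattern yields null instead of throwing.
function compileWithCache(
  pattern: string,
  cache: Map<string, CacheEntry>,
  construct: (pattern: string) => RegExp,
  forgiving: boolean,
): RegExp | null {
  const cached = cache.get(pattern)
  if (cached instanceof RegExp) {
    // Reset state left over from earlier scans before reusing the regex.
    cached.lastIndex = 0
    return cached
  }
  if (cached) {
    // A previous attempt failed; do not try to compile again.
    if (forgiving)
      return null
    throw cached
  }
  try {
    const regex = construct(pattern)
    cache.set(pattern, regex)
    return regex
  }
  catch (e) {
    const error = e instanceof Error ? e : new Error(String(e))
    cache.set(pattern, error)
    if (forgiving)
      return null
    throw error
  }
}
```

Caching the error as well as the regex keeps the forgiving path cheap: an unsupported pattern costs one failed conversion, and later lookups return `null` immediately.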
@@ -143,25 +132,18 @@
if (!regexp)
continue
try {
let offset = 0
regexp.lastIndex = startPosition
let match = regexp.exec(str)
const match = regexp.exec(str)

// If a regex starts with `(^|\\G)` or `(\\G|^)`, we simulate the behavior by cutting the string
if (!match && this.contiguousAnchorSimulation[i]) {
offset = startPosition
regexp.lastIndex = 0
match = regexp.exec(str.slice(startPosition))
}
if (!match)
continue

// If the match is at the start position, return it immediately
if (match.index === startPosition) {
return toResult(i, match, offset)
return toResult(i, match, 0)
}
// Otherwise, store it for later
pending.push([i, match, offset])
pending.push([i, match, 0])
}
catch (e) {
if (this.options.forgiving)
@@ -187,9 +169,10 @@
/**
* Use the modern JavaScript RegExp engine to implement the OnigScanner.
*
* As Oniguruma regex is more powerful than JavaScript regex, some patterns may not be supported.
* Errors will be thrown when parsing TextMate grammars with unsupported patterns.
* Set `forgiving` to `true` to ignore these errors and skip the unsupported patterns.
* As Oniguruma supports some features that can't be emulated using native JavaScript regexes, some
* patterns are not supported. Errors will be thrown when parsing TextMate grammars with
* unsupported patterns, and when the grammar includes patterns that use invalid Oniguruma syntax.
* Set `forgiving` to `true` to ignore these errors and skip any unsupported or invalid patterns.
*
* @experimental
*/
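
With the contiguous-anchor simulation removed, the scanner loop in the earlier hunk reduces to: set `lastIndex`, call `exec`, return immediately if the match begins at the start position, otherwise keep it as a candidate. The sketch below illustrates that selection strategy in isolation. `ScanResult` and `findNextMatch` are hypothetical names, the regexes are assumed to carry the `g` flag so that `lastIndex` is honored, and picking the earliest pending match is an assumption about the part of the function that the diff does not show.

```typescript
// Hypothetical result shape: which pattern matched and where.
interface ScanResult {
  index: number
  start: number
  end: number
}

// Sketch: find the next match at or after `startPosition` among several regexes.
// `null` entries stand for patterns that were skipped (e.g. in forgiving mode).
function findNextMatch(
  regexps: (RegExp | null)[],
  str: string,
  startPosition: number,
): ScanResult | null {
  let best: ScanResult | null = null
  for (let i = 0; i < regexps.length; i++) {
    const regexp = regexps[i]
    if (!regexp)
      continue
    // With the `g` flag, exec() begins searching at lastIndex.
    regexp.lastIndex = startPosition
    const match = regexp.exec(str)
    if (!match)
      continue
    const result: ScanResult = {
      index: i,
      start: match.index,
      end: match.index + match[0].length,
    }
    // A match exactly at the start position cannot be beaten, so return early.
    if (match.index === startPosition)
      return result
    // Otherwise remember the earliest match; on ties the first pattern wins.
    if (!best || result.start < best.start)
      best = result
  }
  return best
}
```

For example, `findNextMatch([/b/g, null, /a/g], 'xxab', 0)` selects the third pattern, because `a` occurs at offset 2 and `b` only at offset 3.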
12 changes: 0 additions & 12 deletions packages/engine-javascript/src/replacements.ts

This file was deleted.

28 changes: 0 additions & 28 deletions packages/engine-javascript/test/utils.test.ts

This file was deleted.
