feat: Use Oniguruma-To-ES in the JS engine (#828) (#832)
Co-authored-by: Anthony Fu <github@antfu.me>
slevithan and antfu authored Nov 15, 2024
1 parent 94cc6d8 commit 33b8b49
Showing 19 changed files with 252 additions and 462 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -65,6 +65,8 @@ jobs:
node: [lts/*]
os: [ubuntu-latest, windows-latest, macos-latest]
include:
- node: 20.x
os: ubuntu-latest
- node: 18.x
os: ubuntu-latest
fail-fast: false
28 changes: 21 additions & 7 deletions docs/guide/regex-engines.md
@@ -4,7 +4,7 @@ outline: deep

# RegExp Engines

TextMate grammars is based on regular expressions to match tokens. Usually, we use [Oniguruma](https://github.com/kkos/oniguruma) (a regular expression engine written in C) to parse the grammar. To make it work in JavaScript, we compile Oniguruma to WebAssembly to run in the browser or Node.js.
TextMate grammars are based on regular expressions to match tokens. Usually, we use [Oniguruma](https://github.com/kkos/oniguruma) (a regular expression engine written in C) to parse the grammar. To make it work in JavaScript, we compile Oniguruma to WebAssembly to run in the browser or Node.js.

Since v1.15, we expose the ability for users to switch the RegExp engine and provide custom implementations.

@@ -20,7 +20,7 @@ const shiki = await createShiki({
})
```

Shiki come with two built-in engines:
Shiki comes with two built-in engines:

## Oniguruma Engine

@@ -43,7 +43,7 @@ const shiki = await createShiki({
This feature is experimental and may change without following semver.
:::

This experimental engine uses JavaScript's native RegExp. As TextMate grammars' regular expressions are in Oniguruma flavor that might contains syntaxes that are not supported by JavaScript's RegExp, we use [`oniguruma-to-js`](https://github.com/antfu/oniguruma-to-js) to lowering the syntaxes and try to make them compatible with JavaScript's RegExp.
This engine uses JavaScript's native RegExp. As regular expressions used by TextMate grammars are written for Oniguruma, they might contain syntax that is not supported by JavaScript's RegExp, or expect different behavior for the same syntax. So we use [Oniguruma-To-ES](https://github.com/slevithan/oniguruma-to-es) to transpile Oniguruma patterns to native JavaScript RegExp.

```ts {2,4,9}
import { createHighlighter } from 'shiki'
@@ -60,17 +60,31 @@ const shiki = await createHighlighter({
const html = shiki.codeToHtml('const a = 1', { lang: 'javascript', theme: 'nord' })
```
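
To make the incompatibility concrete, here is a minimal sketch that feeds two Oniguruma constructs straight to the native `RegExp` constructor, without any transpiler. The helper `compilesNatively` is a hypothetical name used only for this illustration; it is not part of Shiki or Oniguruma-To-ES.

```ts
// Hypothetical helper (not a Shiki API): reports whether a pattern
// compiles as a native JavaScript RegExp.
function compilesNatively(pattern: string): boolean {
  try {
    new RegExp(pattern)
    return true
  }
  catch {
    return false
  }
}

// Unsupported syntax: possessive quantifiers such as `a++` are valid in
// Oniguruma, but the JavaScript RegExp constructor rejects them.
const possessiveCompiles = compilesNatively('a++')

// Same syntax, different behavior: `\h` matches a hex digit in Oniguruma,
// while JavaScript (without the `u` or `v` flag) treats it as the literal
// letter "h".
const hexEscape = new RegExp('\\h')
const matchesLetterH = hexEscape.test('h')
const matchesHexDigit = hexEscape.test('f')

console.log({ possessiveCompiles, matchesLetterH, matchesHexDigit })
```

The first case fails loudly, while the second compiles and silently matches the wrong text, which is why patterns need to be transpiled rather than passed through unchanged.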

Please check the [compatibility table](/references/engine-js-compat) to check the support status of the languages you are using.
Please check the [compatibility table](/references/engine-js-compat) for the support status of the languages you are using.

If mismatches are acceptable and you want it to get results whatever it can, you can enable the `forgiving` option to suppress any errors happened during the conversion:
Unlike the Oniguruma engine, the JavaScript engine is strict by default. It will throw an error if it encounters a pattern that it cannot convert. If mismatches are acceptable and you want best-effort results whenever possible, you can enable the `forgiving` option to suppress any errors that occur during the conversion:

```ts
const jsEngine = createJavaScriptRegexEngine({ forgiving: true })
// ...use the engine
```
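
The difference between the strict default and `forgiving` mode can be sketched roughly as follows. This is only an illustration of the idea under simplifying assumptions, not the engine's actual implementation: `compilePatterns` and `CompileOptions` are hypothetical names, and the native `RegExp` constructor stands in for the real Oniguruma-to-JavaScript conversion step.

```ts
// Hypothetical options type, mirroring only the `forgiving` flag.
interface CompileOptions {
  forgiving?: boolean
}

// Hypothetical helper: compiles every pattern, either rethrowing the first
// failure (strict) or replacing failed patterns with `null` (forgiving).
function compilePatterns(
  patterns: string[],
  options: CompileOptions = {},
): (RegExp | null)[] {
  return patterns.map((pattern) => {
    try {
      return new RegExp(pattern)
    }
    catch (error) {
      if (options.forgiving)
        return null // the pattern is skipped and will never match
      throw error
    }
  })
}

// `a++` (a possessive quantifier) is not valid native JavaScript syntax.
const patterns = ['\\bconst\\b', 'a++']

// Forgiving: the bad pattern becomes `null`, the rest still work.
const lenient = compilePatterns(patterns, { forgiving: true })

// Strict (default): the same input throws.
let strictThrew = false
try {
  compilePatterns(patterns)
}
catch {
  strictThrew = true
}

console.log(lenient, strictThrew)
```

The trade-off is that a suppressed pattern never matches, so forgiving mode can produce highlighting that differs from the Oniguruma engine without reporting any error.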

::: info
If you runs Shiki on Node.js (or at build time), we still recommend using the Oniguruma engine for the best result, as most of the time bundle size or WebAssembly support is not a concern.
If you run Shiki on Node.js (or at build time) and bundle size or WebAssembly support is not a concern, we still recommend using the Oniguruma engine for the best result.

The JavaScript engine is more suitable for running in the browser in some cases that you want to control the bundle size.
The JavaScript engine is best when running in the browser and in cases when you want to control the bundle size.
:::

### JavaScript Runtime Target

For the most accurate result, [Oniguruma-To-ES](https://github.com/slevithan/oniguruma-to-es) requires the [RegExp `v` flag support](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets), which is available in Node.js v20+ and ES2024 ([Browser compatibility](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets#browser_compatibility)).

For older environments, it can simulate the behavior with the `u` flag, but might yield less accurate results.

By default, it automatically detects the runtime target and uses the appropriate behavior. You can override this behavior by setting the `target` option:

```ts
const jsEngine = createJavaScriptRegexEngine({
  target: 'ES2018', // or 'ES2024', default is 'auto'
})
```
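
If you want to check which target your own runtime would qualify for, one common approach is feature detection on the `v` flag. The sketch below shows that approach only; it is an assumption about how such detection can be done, not a description of how `'auto'` is implemented internally, and `detectTarget` is a hypothetical helper name.

```ts
// The two explicit targets mentioned above.
type Target = 'ES2018' | 'ES2024'

// Hypothetical helper: runtimes without `v` flag support throw a
// SyntaxError when the flag is passed to the RegExp constructor.
function detectTarget(): Target {
  try {
    new RegExp('', 'v')
    return 'ES2024'
  }
  catch {
    return 'ES2018'
  }
}

const target = detectTarget()
console.log(`This runtime supports the ${target} target`)
```

The flag is passed as a string to the constructor rather than written in a regex literal, so the file still parses on runtimes that do not know the `v` flag.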