Skip to content
This repository has been archived by the owner on Dec 15, 2022. It is now read-only.

Introduce OnigString as the underlying caching mechanism #46

Merged
merged 3 commits into from
Dec 21, 2015
Merged

Introduce OnigString as the underlying caching mechanism #46

merged 3 commits into from
Dec 21, 2015

Conversation

alexdima
Copy link
Contributor

@alexdima alexdima commented Dec 6, 2015

Addresses issue #45

This is quite a huge change and I apologize for that, but IMHO a big refactoring of how caching works was needed.

The change does the following:

  • removes OnigCache, OnigStringContext and UnicodeUtils classes
  • introduces OnigString:
    • it is exposed to JavaScript
    • it is accepted as an argument to OnigScanner::FindNextMatchSync
    • it computes up-front in its ctor all the mapping knowledge needed to convert offsets / unicode code point lengths to/from UTF8/UTF16 (this is mostly what fixes the perf problem)
  • changed OnigRegExp:
    • it can only Search in a OnigString
    • locally caches the last Search result and makes sure it invalidates cached results correctly
  • adopted that all conversions from UTF8 offset <-> UTF16 offset or from Unicode lengths <-> UTF16 length is encapsulated in OnigString
  • bumped version to 6.0.0 because caching doesn't come automatically anymore. e.g.:
// This is the only way to get caching between `FindNextMatchSync` calls:
var str = new OnigString("hello world");
scanner.FindNextMatchSync(str, 0);
scanner.FindNextMatchSync(str, 1);
scanner.FindNextMatchSync(str, 2);
...

In my opinion the following can be further done to the code-base to take advantage of the major version increment (breaking changes):

  • remove FindNextMatch (not sure if anyone uses the async worker code given the sync path is fast enough now)
  • clean up how OnigRegExp is exposed to JavaScript -- expose it directly and not with a OnigScanner with exactly one OnigRegExp.

Added another test case to the benchmarks where the 4th line of jquery.min.js is prefixed with /*ü*/ (to create a long line that contains multi-byte characters):

I have adopted the change in vscode-textmate and please notice no perf regressions in the non multi-byte chars case and the perf improvements in the multi-byte chars case:

vscode-textmate benchmarks BEFORE

Tokenizing jQuery v2.0.3
TOKENIZING 8830 lines using grammar source.js
TOOK 724 ms.

Tokenizing jQuery v2.0.3 minified
TOKENIZING 7 lines using grammar source.js
TOOK 360 ms.

Tokenizing jQuery v2.0.3 minified with MB char
TOKENIZING 7 lines using grammar source.js
TOOK 10757 ms.

Tokenizing Bootstrap CSS v3.1.1
TOKENIZING 5786 lines using grammar source.css
TOOK 325 ms.

Tokenizing Bootstrap CSS v3.1.1 minified
TOKENIZING 7 lines using grammar source.css
TOOK 379 ms.

vscode-textmate benchmarks AFTER

Tokenizing jQuery v2.0.3
TOKENIZING 8830 lines using grammar source.js
TOOK 720 ms.

Tokenizing jQuery v2.0.3 minified
TOKENIZING 7 lines using grammar source.js
TOOK 357 ms.

Tokenizing jQuery v2.0.3 minified with MB char
TOKENIZING 7 lines using grammar source.js
TOOK 362 ms.

Tokenizing Bootstrap CSS v3.1.1
TOKENIZING 5786 lines using grammar source.css
TOOK 314 ms.

Tokenizing Bootstrap CSS v3.1.1 minified
TOKENIZING 7 lines using grammar source.css
TOOK 357 ms.

@alexdima
Copy link
Contributor Author

alexdima commented Dec 6, 2015

@zcbenz @kevinsawicki

@zcbenz
Copy link
Contributor

zcbenz commented Dec 21, 2015

Looks good to me, thanks!

zcbenz added a commit that referenced this pull request Dec 21, 2015
Introduce OnigString as the underlying caching mechanism
@zcbenz zcbenz merged commit eb92048 into atom:master Dec 21, 2015
@alexdima
Copy link
Contributor Author

alexdima commented Jan 5, 2016

Thank you! ❤️

@egamma egamma mentioned this pull request Jan 6, 2016
59 tasks
@nathansobo
Copy link
Contributor

@zcbenz Should we publish a new release to npm with this?

@zcbenz
Copy link
Contributor

zcbenz commented Jan 8, 2016

It had been published in 6.0.0.

@nathansobo
Copy link
Contributor

Sorry, didn't see an explicit commit with the version bump.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants