Skip to content

Commit

Permalink
Improve TOKENIZER by 25%
Browse files Browse the repository at this point in the history
From what I can see, this is done in linear time: 4*O(n)
This tokenizer change converts that to something a little quicker: 3*O(n)

Seems that not using a capture group and something other than split would be a big win.
Other than that, the changes were meager.

I used https://regex101.com/ (and pcre2) to evaluate the cost of the TOKENIZER.
I verified with cruby 3.0.6

```
 /(%%\{[^\}]+\}|%\{[^\}]+\})/ =~ '%{{'*9999)+'}'

/(%%\{[^\}]+\}|%\{[^\}]+\})/ ==> 129,990 steps
/(%?%\{[^\}]+\})/            ==> 129,990 steps
/(%%?\{[^\}]+\})/            ==>  99,992 steps (simple savings of 25%) <===
/(%%?\{[^%}{]+\})/           ==>  89,993 steps (limiting variable contents has minimal gains)
```

Also of note are the null/simple cases:

```
/x/ =~ '%{{'*9999)+'}'
/x/                          ==>  29,998 steps
/(x)/                        ==>  59,996 steps
/%{x/                        ==>  49,998 steps
/(%%?{x)/                    ==>  89,993 steps
```

And the plain string that doesn't fair too much worse than the specially crafted string

```
/x/ =~ 'abb'*9999+'c'

/x/                          ==>  29,999
/(%%?{x)/                    ==>  59,998
/(%%?\{[^\}]+\})/            ==>  59,998
/(%%\{[^\}]+\}|%\{[^\}]+\})/ ==>  89,997
```

per #667
  • Loading branch information
kbrock committed Jun 13, 2023
1 parent 395aa5e commit 0113426
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion lib/i18n/backend/interpolation_compiler.rb
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ module InterpolationCompiler
module Compiler
extend self

TOKENIZER = /(%%\{[^\}]+\}|%\{[^\}]+\})/
TOKENIZER = /(%%?\{[^\}]+\})/
INTERPOLATION_SYNTAX_PATTERN = /(%)?(%\{([^\}]+)\})/

def compile_if_an_interpolation(string)
Expand Down

0 comments on commit 0113426

Please sign in to comment.