Tokenize purely using regexes #123

mvorisek · 2024-06-04T08:33:59Z

Match the next token using a single regex only; +18% speedup.

In the future, the Tokenizer::makeTokenizeRegexes() method can be made public to allow overriding the grammar for each token type.
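To illustrate the general idea (this is a minimal sketch, not the code from this PR), the per-type patterns can be combined into one alternation anchored with \G, so that a single preg_match() call per token both finds the token and identifies its type. The names TOKEN_PATTERNS, buildTokenRegex() and tokenize() are hypothetical, and the grammar is a toy one chosen only for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical toy grammar (an assumption for this sketch, not the real
// doctrine/sql-formatter rules). Order matters: earlier branches win.
const TOKEN_PATTERNS = [
    'whitespace' => '\s+',
    'number'     => '\d+(?:\.\d+)?',
    'string'     => "'(?:[^']|'')*'",
    'word'       => '[A-Za-z_][A-Za-z0-9_]*',
    'symbol'     => '.', // fallback: any single byte
];

/** Combine all token patterns into one regex with a named group per type. */
function buildTokenRegex(array $patterns): string
{
    $branches = [];
    foreach ($patterns as $type => $pattern) {
        $branches[] = '(?<' . $type . '>' . $pattern . ')';
    }

    // \G anchors the match at the offset passed to preg_match();
    // the "s" modifier lets the fallback "." match newlines too.
    return '~\G(?:' . implode('|', $branches) . ')~s';
}

/** @return list<array{string, string}> list of [type, text] pairs */
function tokenize(string $input): array
{
    $regex  = buildTokenRegex(TOKEN_PATTERNS);
    $tokens = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        // One regex call per token; unmatched groups are reported as null.
        if (preg_match($regex, $input, $m, PREG_UNMATCHED_AS_NULL, $offset) !== 1) {
            throw new RuntimeException('No token matched at offset ' . $offset);
        }

        // The only non-null named group tells us which branch matched.
        foreach (array_keys(TOKEN_PATTERNS) as $type) {
            if (isset($m[$type])) {
                $tokens[] = [$type, $m[$type]];
                break;
            }
        }

        $offset += strlen($m[0]);
    }

    return $tokens;
}

// Usage: each entry is a [type, text] pair.
$tokens = tokenize("SELECT 'a''b', 42");
```

Keeping the patterns in a keyed array is what would make a per-type grammar override straightforward: a caller could replace one entry before the combined regex is built.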

src/Tokenizer.php

derrabus

Impressive work, thank you!

src/Tokenizer.php

greg0ire

It looks like at least 2 commits could be squashed together.

src/Tokenizer.php

+7% speedup

mvorisek · 2024-09-10T16:01:36Z

Squashed the last 3 commits (the 3rd commit reverted the 1st, so the resulting commit is very simple).

derrabus · 2024-09-11T07:29:37Z

Thank you.

mvorisek force-pushed the tokenize_using_single_regex branch 4 times, most recently from a492556 to 056670d, on June 4, 2024 08:58

mvorisek force-pushed the tokenize_using_single_regex branch 4 times, most recently from 3a81fb4 to 1a8dc67, on June 30, 2024 22:10

mvorisek marked this pull request as ready for review June 30, 2024 22:10

greg0ire reviewed Aug 6, 2024

src/Tokenizer.php (resolved)

Tokenize purely using regexes

21b7e81

mvorisek force-pushed the tokenize_using_single_regex branch from 1a8dc67 to 9ed8bdb on September 5, 2024 05:50

derrabus reviewed Sep 5, 2024

src/Tokenizer.php (resolved)

src/Tokenizer.php (resolved)

mvorisek requested a review from greg0ire September 10, 2024 15:30

greg0ire reviewed Sep 10, 2024

src/Tokenizer.php (outdated, resolved)

Improve critical matching loop a little

808c095

+7% speedup

mvorisek force-pushed the tokenize_using_single_regex branch from 9ed8bdb to 808c095 on September 10, 2024 15:58

address review

b9e7100

derrabus approved these changes Sep 11, 2024

derrabus merged commit 16ca9e3 into doctrine:1.5.x Sep 11, 2024
10 checks passed

mvorisek deleted the tokenize_using_single_regex branch September 11, 2024 09:12
