Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify some regular expressions #13961

Merged
merged 1 commit into from
Sep 4, 2021

Conversation

Snuffleupagus
Copy link
Collaborator

There's a fair number of regular expressions througout the code-base which are slightly more verbose than strictly necessary, in particular:

  • We have a lot of regular expressions that use [0-9] explicitly, and those can be simplified to use \d instead.
  • We have one instance of a regular expression containing a A-Za-z0-9_ sequence, which can be simplified to use \w instead.

There's a fair number of regular expressions througout the code-base which are slightly more verbose than strictly necessary, in particular:
 - We have a lot of regular expressions that use `[0-9]` explicitly, and those can be simplified to use `\d` instead.
 - We have one instance of a regular expression containing a `A-Za-z0-9_` sequence, which can be simplified to use `\w` instead.
@calixteman
Copy link
Contributor

Would it be possible to have some linting on regexs ?

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Sep 2, 2021

Would it be possible to have some linting on regexs ?

That should certainly be possible, but so far I've not had time to investigate it fully yet (it's on my list of things to do). Thus far I've looked at:

I wanted to start small, with a simple PR containing mostly mechanical search-and-replace changes, to gauge the interest in this before spending a lot of time/effort to this :-)

@calixteman
Copy link
Contributor

calixteman commented Sep 2, 2021

I'm the author of most of the regexs here and at least I can tell you why I did that: in some languages d is a shortcut for digits which is not limited to digits 0,...,9 but which is a shortcut for all digits in Unicode. So I prefer to be explicit to avoid to have to remember what d is exactly (in general I try to avoid such shortcuts in regex in whatever language).
Out of curiosity, I searched in firefox code base:
https://searchfox.org/mozilla-central/search?q=%5B0-9%5D&case=true&path=*.js

Anyway I'm fine with this change.

@timvandermeij
Copy link
Contributor

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.241.84.105:8877/5a7d320ddf8a757/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.193.163.58:8877/1e1da1e33bf7582/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/5a7d320ddf8a757/output.txt

Total script time: 21.60 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 10
  different first/second rendering: 3

Image differences available at: http://54.241.84.105:8877/5a7d320ddf8a757/reftest-analyzer.html#web=eq.log

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2021

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/1e1da1e33bf7582/output.txt

Total script time: 39.79 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Integration Tests: FAILED
  • Regression tests: FAILED
  different ref/snapshot: 7
  different first/second rendering: 1

Image differences available at: http://54.193.163.58:8877/1e1da1e33bf7582/reftest-analyzer.html#web=eq.log

@timvandermeij timvandermeij merged commit 680f33c into mozilla:master Sep 4, 2021
@timvandermeij
Copy link
Contributor

Thanks! I'm also always a bit careful with \d since it indeed has a subtly different meaning in other languages where it also matches Unicode digits, but fortunately for JavaScript there is no such behavior according to MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Cheatsheet

@Snuffleupagus Snuffleupagus deleted the simpler-regexp branch September 19, 2021 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants