Mutate regexp body #19

mbj · 2013-01-25T18:34:12Z

The rbx AST exposes the regexp via a #source attribute that contains the source as a string. This string could be parsed by a regexp library, and the resulting AST could be mutated by an additional mutator tree, Mutant::Mutator::Regexp. This will expose many missing test cases.
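To make the idea concrete, here is a minimal sketch using plain Ruby's Regexp#source and Regexp#options instead of the rbx AST node. The helper name mutate_regexp is hypothetical, and the string substitution only stands in for the proper parse-and-mutate step proposed above.

```ruby
# Hypothetical helper (not part of mutant): rebuild a regexp from a
# mutated copy of its source string, keeping the original flags.
def mutate_regexp(regexp)
  mutated_source = yield(regexp.source)
  Regexp.new(mutated_source, regexp.options)
end

original = /([a-z]+)/i

# Naive string-level mutation: turn "1 or many" into "2 or many".
# A real mutator would work on a parsed regexp AST instead.
mutant = mutate_regexp(original) { |source| source.sub('+', '{2,}') }

puts mutant.source # prints ([a-z]{2,})
```

Working on the source string directly breaks down quickly (escaped characters, nested groups), which is why parsing into an AST first looks like the better route.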

dkubb · 2013-01-25T18:49:48Z

This was discussed briefly in #18.

EDIT: That thread is probably worth reading before this one. I think mutation testing of regexps would expose a ton of bugs, since it is fairly rare for someone to actually test all the branches in a regexp, much less to have tests that cover injected faults.

mbj · 2013-01-25T18:50:48Z

@dkubb Thx for adding reference. I forgot.

mbj · 2013-01-25T18:56:18Z

@dkubb I took a short look at regexp_parser; it looks good and well suited for our purpose.

dkubb · 2013-01-27T06:49:10Z

I was thinking about this a bit today and I wanted to jot down some of my ideas. I think we can take the existing operators and do some mutations to make sure more cases are covered.

Force the "1 or many" operator to test both the 1 and many conditions:

/([a-z]+)/ -> /([a-z]{2,})/
/([a-z]+)/ -> /([a-z])/

Force the "0 or many" operator to test the 0, 1 and many conditions:

/([a-z]*)/ -> /()/
/([a-z]*)/ -> /([a-z])/
/([a-z]*)/ -> /([a-z]{2,})/

Force the "0 or 1" operator to test both the 0 and 1 conditions:

/([a-z]?)/ -> /()/
/([a-z]?)/ -> /([a-z])/
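One way to see what each of these mutations demands from a test suite is to look for inputs on which the original and the mutant disagree. The sketch below assumes a hypothetical helper, killing_inputs, that compares both the match result and the captures; it is not how mutant itself decides whether a mutation is killed.

```ruby
# Hypothetical helper (not part of mutant): returns the inputs on which
# the mutant matches or captures differently from the original.
def killing_inputs(original, mutant, inputs)
  inputs.select do |input|
    original.match(input)&.to_a != mutant.match(input)&.to_a
  end
end

inputs = ['', 'a', 'ab']

plus_needs_one  = killing_inputs(/([a-z]+)/, /([a-z]{2,})/, inputs) # => ["a"]
plus_needs_many = killing_inputs(/([a-z]+)/, /([a-z])/, inputs)     # => ["ab"]
star_vs_empty   = killing_inputs(/([a-z]*)/, /()/, inputs)          # => ["a", "ab"]
opt_needs_zero  = killing_inputs(/([a-z]?)/, /([a-z])/, inputs)     # => [""]
```

Each mutation is only killed by a specific kind of input, so a suite that never exercises the one-character or the empty case lets the corresponding mutant survive.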

mbj · 2013-01-27T11:04:51Z

Let's add:

Force the capturing operator to actually test the capture:

/(foo)/ -> /(?:foo)/

Force the or operator to test both branches:

/(foo|bar)/ -> /(foo)/
/(foo|bar)/ -> /(bar)/
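A short illustration of why these two mutations are only killed by tests that look at the capture or exercise both branches. The variable names are just for this example.

```ruby
capturing     = /(foo)/
non_capturing = /(?:foo)/

# Both regexps match "foo"; only the capture group differs.
captured     = capturing.match('foo')[1]     # => "foo"
not_captured = non_capturing.match('foo')[1] # => nil

either   = /(foo|bar)/
only_foo = /(foo)/

# Dropping the "bar" branch only shows up when a test feeds in "bar".
either_matches_bar   = either.match?('bar')   # => true
only_foo_matches_bar = only_foo.match?('bar') # => false
```

A test that only checks whether "foo" matches would pass against all four regexps.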

dkubb · 2013-01-31T20:23:56Z

Here are a few more:

/\w/ -> /\W/
/\d/ -> /\D/
/\h/ -> /\H/
/\s/ -> /\S/
/\W/ -> /\w/
/\D/ -> /\d/
/\H/ -> /\h/
/\S/ -> /\s/

And a few to encourage usage of more compact metacharacters:

/[a-zA-Z0-9_]/  -> /\w/
/[^a-zA-Z0-9_]/ -> /\W/
/[0-9]/         -> /\d/
/[^0-9]/        -> /\D/
/[0-9a-fA-F]/   -> /\h/
/[^0-9a-fA-F]/  -> /\H/
/[ \t\r\n\f]/   -> /\s/
/[^ \t\r\n\f]/  -> /\S/
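The second list differs from the first in that those replacements are meant to be equivalent, so no test should be able to kill them. A rough way to check that claim for ASCII input is to compare both regexps on every ASCII character. The helper name same_ascii_matches? is made up for this sketch.

```ruby
# Hypothetical helper: true when both regexps agree on every single
# ASCII character (code points 0..127).
def same_ascii_matches?(left, right)
  (0..127).all? do |code|
    char = code.chr
    left.match?(char) == right.match?(char)
  end
end

digits_equivalent = same_ascii_matches?(/[0-9]/, /\d/)        # => true
word_equivalent   = same_ascii_matches?(/[a-zA-Z0-9_]/, /\w/) # => true
hex_equivalent    = same_ascii_matches?(/[0-9a-fA-F]/, /\h/)  # => true
negation_differs  = same_ascii_matches?(/\d/, /\D/)           # => false
```

The whitespace pair is worth running through the same check on the Ruby version in use, since the exact set of characters covered by \s may differ from the explicit list above.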

dkubb · 2013-01-31T20:31:27Z

I also have an idea about expanding metacharacters and character classes into the list of individual characters, and then removing one of those characters to see if the regexp still matches.

So for example, given /[a-z]/ this would be expanded to /[abcdefghijklmnopqrstuvwxyz]/ and then one mutation would remove the "a" so we have /[bcdefghijklmnopqrstuvwxyz]/. We would then proceed through and remove one character at a time, so the next would be /[acdefghijklmnopqrstuvwxyz]/ and so on. If no tests fail then it means the code may only be testing a subset of the character class and that more tests are needed that include the full range of possible characters.

The downside for this is that it could mean a potentially very large number of possible mutations. It might need a special mode that just runs these mutations on their own, or a switch that enables it. However, the upside would mean that the tests actually cover more possible kinds of inputs that the regexp will actually match.
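A minimal sketch of the expansion for a simple range such as a-z. The method name removal_mutants is hypothetical, and it only handles plain character ranges, not arbitrary character classes.

```ruby
# Hypothetical helper: expand a character range and build one mutant
# per character, each with exactly that character removed.
def removal_mutants(range)
  chars = range.to_a
  chars.map do |removed|
    remaining = (chars - [removed]).map { |char| Regexp.escape(char) }
    Regexp.new("[#{remaining.join}]")
  end
end

mutants = removal_mutants('a'..'z')

puts mutants.size      # prints 26
puts mutants[0].source # prints [bcdefghijklmnopqrstuvwxyz]
puts mutants[1].source # prints [acdefghijklmnopqrstuvwxyz]
```

Even this small class produces 26 mutants, which shows how quickly the number grows for something like \w.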

mbj · 2013-01-31T21:03:31Z

For these I plan "grouped mutations", where a specific percentage of the mutations must be killed to succeed.
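As a rough sketch of what such a grouping rule could look like (the method name and the ratio parameter are assumptions, nothing here exists in mutant):

```ruby
# Hypothetical check: a group of mutations passes when at least
# required_ratio of them were killed by the test suite.
def group_killed?(kill_results, required_ratio)
  return false if kill_results.empty?

  kill_results.count(true).fdiv(kill_results.size) >= required_ratio
end

# 20 of the 26 single-character removals were killed.
results = [true] * 20 + [false] * 6

passes_at_75 = group_killed?(results, 0.75) # => true
passes_at_90 = group_killed?(results, 0.9)  # => false
```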

dkubb · 2013-02-09T22:26:18Z

Oh, I just thought of another good one. If we see a regexp like /^\d+$/ we could change it to /^\d+\z/. I'd bet most of the time it won't cause a spec failure, which indicates to me people assumed it was matching the end of the string, not just the end of line.
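The difference is easy to show with input that contains a newline. The variable names are only for this example.

```ruby
line_anchored   = /^\d+$/
string_anchored = /^\d+\z/

# $ matches before a newline, so a trailing newline or extra lines
# still pass the line-anchored version.
trailing_newline_line   = line_anchored   =~ "123\n"    # => 0
trailing_newline_string = string_anchored =~ "123\n"    # => nil
extra_line_line         = line_anchored   =~ "123\nabc" # => 0
extra_line_string       = string_anchored =~ "123\nabc" # => nil
```

A spec that only feeds in single-line strings such as "123" cannot tell the two regexps apart, so the mutant would survive.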

mbj · 2013-02-10T00:35:38Z

Yeah. Also with ^ => \A.

dkubb · 2013-02-10T05:49:02Z

Yeah, although technically ^ and \A are identical, so you'll never be able to kill that mutation.

mbj · 2013-02-10T11:10:09Z

No, they aren't:

irb(main):002:0> "bad-stuff\nfoo" =~ /^foo/
=> 10
irb(main):003:0> "bad-stuff\nfoo" =~ /\Afoo/
=> nil

dkubb · 2013-02-10T19:14:39Z

@mbj oh nice I did not know that!

mbj · 2013-02-10T19:21:19Z

Your Perl background ;)

backus · 2016-05-23T07:09:18Z

Closed by #565! 😄 🎈

mbj mentioned this issue Apr 18, 2016

Mutate regexp body #565

Merged

backus closed this as completed May 23, 2016
