Mutate regexp body #19

Closed
mbj opened this issue Jan 25, 2013 · 15 comments

Comments

@mbj
Owner

mbj commented Jan 25, 2013

The rbx AST exposes the regexp through a #source attribute that contains the source as a string. This string could be parsed by a regexp library, and the resulting AST could be mutated by an additional mutator tree, Mutant::Mutator::Regexp. This would expose many missing test cases.
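
A minimal sketch of that round trip, for illustration only. It uses MRI's `Regexp#source` as a stand-in for the rbx AST attribute and a plain string substitution in place of a real regexp parser; both are assumptions, not the proposed `Mutant::Mutator::Regexp`:

```ruby
# Stand-in for the AST node: Regexp#source returns the pattern body as a String.
SOURCE_ORIGINAL = /fo+bar/i
source = SOURCE_ORIGINAL.source

# Illustrative mutation only: a real mutator would walk a parsed tree
# instead of substituting on the raw string.
mutated_source = source.sub('+', '{2,}')

# Rebuild the regexp, keeping the original flags (here: IGNORECASE).
SOURCE_MUTANT = Regexp.new(mutated_source, SOURCE_ORIGINAL.options)

puts SOURCE_MUTANT.inspect
puts SOURCE_ORIGINAL.match?('fobar')
puts SOURCE_MUTANT.match?('fobar')
```

An input with a single "o" matches the original but not the mutant, so a test using such an input would kill this mutation.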

@dkubb
Collaborator

dkubb commented Jan 25, 2013

This was discussed briefly in #18.

EDIT: That thread (#18) is probably worth reading before this one. I think mutation testing of regexps would expose a ton of bugs, since it is fairly rare for someone to actually test all the branches in a regexp, much less have tests that cover injected faults.

@mbj
Owner Author

mbj commented Jan 25, 2013

@dkubb Thx for adding reference. I forgot.

@mbj
Owner Author

mbj commented Jan 25, 2013

@dkubb I took a short look at regexp_parser. It looks good and well suited for our purpose.

@dkubb
Collaborator

dkubb commented Jan 27, 2013

I was thinking about this a bit today and I wanted to jot down some of my ideas. I think we can take the existing operators and do some mutations to make sure more cases are covered.

  • Force the "1 or many" operator to test both the 1 and many conditions:

/([a-z]+)/ -> /([a-z]{2,})/
/([a-z]+)/ -> /([a-z])/

  • Force the "0 or many" operator to test the 0, 1 and many conditions:

/([a-z]*)/ -> /()/
/([a-z]*)/ -> /([a-z])/
/([a-z]*)/ -> /([a-z]{2,})/

  • Force the "0 or 1" operator to test both the 0 and 1 conditions:

/([a-z]?)/ -> /()/
/([a-z]?)/ -> /([a-z])/
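
To make the idea concrete, here is a small sketch of which inputs would kill the two `+` mutants. The `\A` and `\z` anchors and the `killed_mutants` helper are my own additions for illustration; the patterns above are unanchored:

```ruby
# Anchors are added so that whole-string inputs can tell the mutants apart.
QUANTIFIER_ORIGINAL = /\A([a-z]+)\z/
QUANTIFIER_MUTANTS = {
  'two or more' => /\A([a-z]{2,})\z/,
  'exactly one' => /\A([a-z])\z/
}.freeze

# Hypothetical helper: a mutant counts as killed when at least one input
# matches the original but not the mutant, or vice versa.
def killed_mutants(original, mutants, inputs)
  mutants.select do |_label, mutant|
    inputs.any? { |input| original.match?(input) != mutant.match?(input) }
  end.keys
end

p killed_mutants(QUANTIFIER_ORIGINAL, QUANTIFIER_MUTANTS, ['ab'])      # => ["exactly one"]
p killed_mutants(QUANTIFIER_ORIGINAL, QUANTIFIER_MUTANTS, ['a', 'ab']) # => ["two or more", "exactly one"]
```

A suite that only ever feeds multi-character input leaves the "two or more" mutant alive, which is exactly the missing "1" case.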

@mbj
Owner Author

mbj commented Jan 27, 2013

Let's add:

Force the capturing operator to actually test the capture:

/(foo)/ -> /(?:foo)/

Force the or operator to test both branches:

/(foo|bar)/ -> /(foo)/
/(foo|bar)/ -> /(bar)/
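
A quick sketch of what a test has to observe to kill these. The non-capturing group is written with Ruby's `(?:...)` syntax, and `first_capture` is a hypothetical helper, not part of mutant:

```ruby
CAPTURING     = /(foo|bar)/
NON_CAPTURING = /(?:foo|bar)/
ONLY_FOO      = /(foo)/
ONLY_BAR      = /(bar)/

# Returns the first capture group, or nil when there is no match or no group.
def first_capture(regexp, input)
  match = regexp.match(input)
  match && match[1]
end

p first_capture(CAPTURING, 'bar')     # => "bar"
p first_capture(NON_CAPTURING, 'bar') # => nil
p ONLY_FOO.match?('bar')              # => false
p ONLY_BAR.match?('foo')              # => false
```

The capture mutant only dies if a test reads the captured value, and each branch mutant only dies if a test exercises the other branch.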

@dkubb
Collaborator

dkubb commented Jan 31, 2013

Here's another few:

/\w/ -> /\W/
/\d/ -> /\D/
/\h/ -> /\H/
/\s/ -> /\S/
/\W/ -> /\w/
/\D/ -> /\d/
/\H/ -> /\h/
/\S/ -> /\s/
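
A rough sketch of generating these swaps, one mutant per occurrence. It works on the raw source string rather than a parsed tree, which is a simplification, and `NEGATION_SWAPS` and `negation_mutants` are hypothetical names:

```ruby
NEGATION_SWAPS = {
  'w' => 'W', 'W' => 'w',
  'd' => 'D', 'D' => 'd',
  'h' => 'H', 'H' => 'h',
  's' => 'S', 'S' => 's'
}.freeze

# Scans escape pairs left to right, so an escaped backslash followed by a
# plain letter (e.g. \\w) is consumed as a pair and the letter is left alone.
def negation_mutants(regexp)
  source = regexp.source
  mutants = []
  source.scan(/\\(.)/m) do
    match = Regexp.last_match
    swapped = NEGATION_SWAPS[match[1]]
    next unless swapped

    mutated = source.dup
    mutated[match.begin(1)] = swapped
    mutants << Regexp.new(mutated, regexp.options)
  end
  mutants
end

p negation_mutants(/\d+-\w/) # => [/\D+-\w/, /\d+-\W/]
```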

And a few to encourage usage of more compact metacharacters:

/[a-zA-Z0-9_]/  -> /\w/
/[^a-zA-Z0-9_]/ -> /\W/
/[0-9]/         -> /\d/
/[^0-9]/        -> /\D/
/[0-9a-fA-F]/   -> /\h/
/[^0-9a-fA-F]/  -> /\H/
/[ \t\r\n\f]/   -> /\s/
/[^ \t\r\n\f]/  -> /\S/
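
These mutants are only useful if each pair really is equivalent, so it is worth checking. Below is a sketch that compares each explicit class with its shorthand over the ASCII range; `disagreements` is a hypothetical helper, and non-ASCII input is deliberately out of scope. The `\s` row is included without an expected result: whether `\s` also matches vertical tab is worth checking on your Ruby version.

```ruby
ASCII_CHARS = (0..127).map(&:chr).freeze

SHORTHAND_PAIRS = [
  [/[a-zA-Z0-9_]/, /\w/],
  [/[0-9]/,        /\d/],
  [/[0-9a-fA-F]/,  /\h/],
  [/[ \t\r\n\f]/,  /\s/]
].freeze

# Returns the characters on which the two patterns give different answers.
def disagreements(explicit, shorthand, chars)
  chars.reject { |char| explicit.match?(char) == shorthand.match?(char) }
end

SHORTHAND_PAIRS.each do |explicit, shorthand|
  puts "#{shorthand.inspect}: #{disagreements(explicit, shorthand, ASCII_CHARS).inspect}"
end
```

An empty list means the rewrite is behavior-preserving for ASCII input, so the surviving mutant is a style hint rather than a missing test.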

@dkubb
Collaborator

dkubb commented Jan 31, 2013

I also have an idea about expanding metacharacters and character classes into the list of individual characters, then removing one of those characters and seeing if the regexp still matches.

So for example, given /[a-z]/ this would be expanded to /[abcdefghijklmnopqrstuvwxyz]/ and then one mutation would remove the "a" so we have /[bcdefghijklmnopqrstuvwxyz]/. We would then proceed through and remove one character at a time, so the next would be /[acdefghijklmnopqrstuvwxyz]/ and so on. If no tests fail then it means the code may only be testing a subset of the character class and that more tests are needed that include the full range of possible characters.

The downside is that this could produce a very large number of mutations. It might need a special mode that runs these mutations on their own, or a switch that enables it. The upside is that the tests would cover more of the kinds of input that the regexp actually matches.
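
A sketch of the expansion for a simple range, to show how quickly the mutation count grows. `removal_mutants` is a hypothetical helper and only handles a single Ruby Range of characters, not arbitrary character classes:

```ruby
# One mutant per character, each with that character removed from the class.
def removal_mutants(range)
  chars = range.to_a
  chars.map do |removed|
    remaining = (chars - [removed]).join
    Regexp.new("[#{Regexp.escape(remaining)}]")
  end
end

REMOVAL_MUTANTS = removal_mutants('a'..'z')
puts REMOVAL_MUTANTS.size         # => 26
puts REMOVAL_MUTANTS.first.source # => [bcdefghijklmnopqrstuvwxyz]

# A suite that only ever tries "a" and "m" leaves most mutants alive.
inputs = %w[a m]
survivors = REMOVAL_MUTANTS.select { |mutant| inputs.all? { |input| mutant.match?(input) } }
puts survivors.size               # => 24
```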

@mbj
Owner Author

mbj commented Jan 31, 2013

For these I plan "grouped mutations", where a specific percentage of the mutations in a group must be killed for the group to succeed.
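
One way the pass criterion could look, purely as a sketch. The method name, the ratio parameter, and the choice to pass an empty group are all assumptions of mine, not a settled design:

```ruby
# Hypothetical: a group passes when the killed fraction meets the threshold.
# An empty group is treated as passing, which is a design choice, not a given.
def group_passes?(killed, total, required_ratio)
  return true if total.zero?

  killed.fdiv(total) >= required_ratio
end

p group_passes?(20, 26, 0.75) # => true
p group_passes?(2, 26, 0.75)  # => false
```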


@dkubb
Collaborator

dkubb commented Feb 9, 2013

Oh, I just thought of another good one. If we see a regexp like /^\d+$/ we could change it to /^\d+\z/. I'd bet most of the time it won't cause a spec failure, which suggests people assumed it was matching the end of the string, not just the end of a line.
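
For reference, a short sketch of inputs that tell the two apart (the sample strings are made up for illustration):

```ruby
LINE_ANCHORED   = /^\d+$/
STRING_ANCHORED = /^\d+\z/

# "$" matches at the end of any line, "\z" only at the very end of the string.
p LINE_ANCHORED.match?("123\nnot digits")   # => true
p STRING_ANCHORED.match?("123\nnot digits") # => false

# "$" also matches just before a trailing newline, "\z" does not.
p LINE_ANCHORED.match?("123\n")             # => true
p STRING_ANCHORED.match?("123\n")           # => false

# On single-line input without a newline the two agree.
p STRING_ANCHORED.match?('123')             # => true
```

A spec needs at least one multi-line or newline-terminated input to kill this mutation.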

@mbj
Owner Author

mbj commented Feb 10, 2013

Yeah. Also with ^ => \A.

@dkubb
Collaborator

dkubb commented Feb 10, 2013

Yeah, although technically ^ and \A are identical, so you'll never be able to kill that mutation.

@mbj
Owner Author

mbj commented Feb 10, 2013

No, they aren't:

irb(main):002:0> "bad-stuff\nfoo" =~ /^foo/
=> 10
irb(main):003:0> "bad-stuff\nfoo" =~ /\Afoo/
=> nil

@dkubb
Collaborator

dkubb commented Feb 10, 2013

@mbj oh nice I did not know that!

@mbj
Owner Author

mbj commented Feb 10, 2013

Your Perl background ;)

@mbj mbj mentioned this issue Apr 18, 2016
@backus
Contributor

backus commented May 23, 2016

Closed by #565! 😄 🎈

@backus backus closed this as completed May 23, 2016