This repository has been archived by the owner on Dec 29, 2021. It is now read-only.

Regex matching. #85

Closed
wants to merge 5 commits into from

@Mike-Neto Mike-Neto commented Jan 25, 2018

Looking for feedback on this PR. Fixes #81.
Added functionality to test against regex::Regex.

  • A fuzzy regex match must match the regex at least once to pass.
  • A non-fuzzy regex match takes a second param specifying the number of matches it needs to find.

Also leaves room to implement predicate matching.

Added two new functions to the API

  • pub fn matches(mut self, output: regex::Regex) -> Assert
  • pub fn matches_ntimes(mut self, output: regex::Regex, nmatches: u32) -> Assert

You can see an example of the API in use in my own crate's branch: https://github.com/Mike-Neto/img_diff/tree/assert-cli-regex

Tests are not verified, as I'm on Windows and 7 tests are already failing on master (only those 7 keep failing).
Clippy and Rustfmt have not been run yet, as I'm just looking for feedback on the implementation itself.

  • I have created tests for any new feature, or added regression tests for bugfixes.
  • cargo test succeeds
  • Clippy is happy: cargo +nightly clippy succeeds
  • Rustfmt is happy: cargo +nightly fmt succeeds

@epage
Collaborator

epage commented Jan 25, 2018

RE Your API.

  • What is the use case for matches_ntimes?
  • If we're exposing regex to the user, do we bother with distinguishing between fuzzy and non-fuzzy regex?

RE Conflicts

I'm torn. We're at a crossroads with a major refactor of the code / API (#74 or #75), and it'd be nice to avoid conflicts with those PRs because they were big. On the other hand, I don't want to discourage contributions or completely hold up assert_cli while we figure out what is going on.

@Mike-Neto
Author

  • What is the use case for matches_ntimes?
    I use it in the tool I linked. It outputs the diff values to the console and I check that they are there, simple enough.
    For a single file I know the expected output, so something like .is("Dssim(0)") is appropriate, but on Rust nightly the output is Dssim(0.0); this is where regex matching comes in, allowing me to target both outputs.
    That covers a single file. With multiple files I could match their output directly, but the order in which they are compared is not deterministic, so the best I can do is check that, for example, I have two Dssim(0) and a single Dssim(0.234234).
    Otherwise I'm just validating that I have at least one Dssim(0) and at least one Dssim(0.234234), which can be misleading, as it will not check whether the second Dssim(0) check was successful.
    So its goal is to provide a way to match a single pattern that may occur more than once and be sure that it occurs the expected number of times.

  • If we're exposing regex to the user, do we bother with distinguishing between fuzzy and non-fuzzy regex?
    Not sure what you mean here; can you elaborate?
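
The counting use case described above can be sketched without any dependencies. This is only an illustration: a literal substring stands in for the regex, and the helper names `count_occurrences` and `occurs_ntimes` are hypothetical, not part of assert_cli or this PR.

```rust
/// Counts non-overlapping occurrences of the literal `needle` in `output`.
/// A literal substring stands in for the regex to keep the sketch dependency-free.
fn count_occurrences(output: &str, needle: &str) -> usize {
    output.matches(needle).count()
}

/// Hypothetical check: true only when `needle` occurs exactly `expected` times,
/// regardless of the order in which the lines were printed.
fn occurs_ntimes(output: &str, needle: &str, expected: usize) -> bool {
    count_occurrences(output, needle) == expected
}
```

Because only the count is compared, two runs that print the same lines in a different order give the same verdict, which is the property an "at least once" check cannot provide.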

RE Conflicts:
I understand your concerns, but this feature is more or less ready and can be added as a non-breaking change. If those big changes are not scheduled to be merged soon, we might as well push this new feature out first and then rework it into the new code.

@epage
Collaborator

epage commented Jan 25, 2018

I use it in the tool I linked. It outputs the diff values to the console and I check that they are there, simple enough.

Then GitHub's search failed me. Could you provide a link?

If we're exposing regex to the user, do we bother with distinguishing between fuzzy and non-fuzzy regex?

Not sure what you mean here; can you elaborate?

https://github.com/killercup/assert_cli/pull/85/files#diff-b8e1279b7e534c886db53e49d60c14a5R495

In that, one specified fuzzy as true and another as false.

(sigh, the refactor really cleans up this code)

@Mike-Neto
Author

Then GitHub's search failed me. Could you provide a link?

https://github.com/Mike-Neto/img_diff/blob/256c0a75bc1ee1077f4278b320ed0e27ce6e7d5e/src/main.rs#L221

https://github.com/killercup/assert_cli/pull/85/files#diff-b8e1279b7e534c886db53e49d60c14a5R495

In that, one specified fuzzy as true and another as false.

Ohh, I was just following the "pattern" that was there in the first place: contains is fuzzy because it isn't as specific as is (it can match many different outputs as long as they contain the input string). My matches is analogous to contains, and matches_ntimes is more like is; at least that was the interpretation I got from that.

Can you point me to the most "recent" branch, by which I mean the one you want this feature implemented in? Maybe I can make a PR for that branch instead and rework my code around it.

Also, what do you think about taking the pattern as a String instead of a regex::Regex param, so the calling code doesn't have to instantiate the regex itself?

@epage
Collaborator

epage commented Feb 2, 2018

Sorry for the delay; I got caught up in other projects.

Ohh, I was just following the "pattern" that was there in the first place: contains is fuzzy because it isn't as specific as is (it can match many different outputs as long as they contain the input string). My matches is analogous to contains, and matches_ntimes is more like is; at least that was the interpretation I got from that.

fuzzy is basically the predecessor to your ExpectType.

Can you point me to the most "recent" branch, by which I mean the one you want this feature implemented in? Maybe I can make a PR for that branch instead and rework my code around it.

#74 has both a refactor and an API change. I'm thinking of splitting these up so it's easier to get changes in while we worry about the API. I could probably have that done by end of day tomorrow. Would you want to adjust your work to be on top of that?

Also, what do you think about taking the pattern as a String instead of a regex::Regex param, so the calling code doesn't have to instantiate the regex itself?

That can be handy. On the other hand, I was playing with the idea of having the contains and is functions be smart and take both strings and regexes. In the string case, it behaves as it does today rather than interpreting the string as a regex.

I had this idea before I looked at the docs. I assumed the regex crate had something like Python's, with distinct search and match. So we could either emulate that behavior or have distinct names for regex matching (and implicitly convert strings to regexes).

Thoughts?
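
For readers unfamiliar with the distinction mentioned above: Python's `re.search` finds a pattern anywhere in the input, while `re.match` only matches at the start. A minimal sketch of the two behaviors, using literal strings as a stand-in for regexes so it needs no crates (the function names `search_like` and `match_like` are hypothetical):

```rust
/// Stand-in for Python's `re.search`: the literal may occur anywhere in `output`.
fn search_like(output: &str, literal: &str) -> bool {
    output.contains(literal)
}

/// Stand-in for Python's `re.match`: the literal must occur at the start of `output`.
fn match_like(output: &str, literal: &str) -> bool {
    output.starts_with(literal)
}
```

With the regex crate itself, `Regex::is_match` behaves like the search variant unless the pattern is anchored with `^`.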

@Mike-Neto
Author

#74 has both a refactor and an API change. I'm thinking of splitting these up so it's easier to get changes in while we worry about the API. I could probably have that done by end of day tomorrow. Would you want to adjust your work to be on top of that?

Yes, that way we can get these features in for current users and only introduce the breaking API changes later.

Regarding your last point, I think different functions are the best option, as they document intent; they will also take strings as params to allow for simple refactoring.
My reason for keeping them as separate functions is the () handling: native types like Option print as Some(VALUE), which will break .contains and .is in cases where we use regex internally (how could we even guess which one the user was trying to use?).
Regarding the search and match behaviors, I agree with implementing both as they are useful; however, .search will also be able to take an (optional) param for the number of matches expected, similar to my current .matches_ntimes.

Looking forward to implementing against the new code base :)

@epage epage mentioned this pull request Feb 3, 2018
@epage
Collaborator

epage commented Feb 4, 2018

Feel free to provide feedback on #87

Collaborator

@epage epage left a comment

FYI, we'd prefer commit histories to be cleaned up before we merge them. Don't worry about it now, but once this is fully ready and the review comments are addressed, we'd appreciate it if you could clean them up.

Keep in mind that GitHub doesn't always send notifications for force pushes, so you'll need to let us know when it's pushed so we can go in and merge.

fn verify(&self, got: &[u8]) -> Result<()> {
let conversion = String::from_utf8_lossy(got);
let got = conversion.as_ref();
if self.times == 0 {

Collaborator

What are your thoughts on using a sentinel value (0) compared to using Option?

Author

This is a good suggestion. It's just that some bad habits take a while to leave :)

Collaborator

I ask not to point out bad habits but because sometimes the line is blurry. In another project, I have one behavior when a Vec is empty and another when it has items. I'm not using an Option for it because it doesn't feel like it'd jive quite right with the API.
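
The sentinel-versus-Option question in this thread can be sketched as follows. This is a hypothetical type, not the code from the PR: `MatchCount` and its `verify` signature are made up for illustration, and `found` stands for the number of matches already counted in the output.

```rust
/// Hypothetical predicate: `times` is `None` for "at least one match",
/// or `Some(n)` for "exactly n matches".
struct MatchCount {
    times: Option<u32>,
}

impl MatchCount {
    /// `found` is the number of matches counted in the command output.
    fn verify(&self, found: u32) -> bool {
        match self.times {
            Some(n) => found == n, // exact count requested
            None => found >= 1,    // no count given: any match passes
        }
    }
}
```

One practical difference from a `0` sentinel: with `Option`, `Some(0)` stays available to mean "the pattern must not occur at all".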

@@ -386,6 +463,16 @@ mod errors {
description("Output predicate failed")
display("{}\noutput=```{}```", msg, got)
}
/* Adding a single error more makes this break, using the bottom one temporarily

Collaborator

In what way does this break?

Granted, at some point we should probably move beyond error-chain

Author

Out of stack space, this is a known bug in error-chain. I'm gonna look up some docs for this problem.

/// .stdout().matches("[0-9]{2}")
/// .unwrap();
/// ```
pub fn matches(mut self, output: String) -> Assert {

Collaborator

  1. Should we accept a regex?
  2. Should we accept a byte slice and a byte regex?

If we decide on yes, it doesn't mean you have to do them (I don't want the bar for contributions to be perfection), but we need to at least create issues for them and keep them in mind with the design of the API / implementation.

Author

I think yes, at least a regex, a byte regex not so much IMHO, however, I see no problem in setting us up for later by implementing an abstraction layer here.

Author

Also, the current usage code here is more like .matches(String::from("[0-9]")), so I will probably use str instead of String (just borrow).

Collaborator

For str vs String, you can always make it accept both
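
One common way to accept both, sketched here with a hypothetical `Pattern` type (not a type from assert_cli), is a generic parameter bounded by `Into<String>`:

```rust
/// Hypothetical holder for a pattern string; not a type from assert_cli.
struct Pattern {
    source: String,
}

impl Pattern {
    /// Accepts `&str`, `String`, or anything else convertible into a `String`.
    fn new<S: Into<String>>(source: S) -> Self {
        Pattern { source: source.into() }
    }

    /// Borrows the stored pattern text.
    fn as_str(&self) -> &str {
        &self.source
    }
}
```

Callers can then write `Pattern::new("[0-9]{2}")` without wrapping the literal in `String::from`, while code that already owns a `String` can pass it without an extra allocation.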

/// .stdout().matches_ntimes("[0-9]{1}", 2)
/// .unwrap();
/// ```
pub fn matches_ntimes(output: String, nmatches: u32) -> Self {

Collaborator

(longer-term brainstorming, feel free to ignore me)

Ideally, in the long run I'd like to do this through a more builder-like approach: .matches("pattern").times(10). contains could also implement this. It'd provide a nice way to keep the upfront API "small". We're already talking about doing this for other features.

The interesting challenge is deciding how to implement that.

The nasty "unsafe" option is for Output to have a .times() function. I say "unsafe" because the call is meaningless in some cases (like .is).

Another option is to move away from having people interact with Output and instead have people construct the predicates directly and we'll do an Into<ContentPredicate>. This will end up more like killercup's proposed API.
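
A rough sketch of the second option, where predicates are constructed directly and only the ones for which a count makes sense expose `.times()`. All names here (`Contains`, `Is`, `contains`, `is_exactly`, `eval`) are hypothetical, and literal substrings stand in for regexes to keep the example free of dependencies:

```rust
/// Hypothetical substring predicate with an optional exact count.
struct Contains {
    needle: String,
    times: Option<usize>,
}

/// Builds a predicate that passes when `needle` occurs at least once.
fn contains(needle: &str) -> Contains {
    Contains { needle: needle.to_string(), times: None }
}

impl Contains {
    /// Builder step: require exactly `n` non-overlapping occurrences.
    fn times(mut self, n: usize) -> Self {
        self.times = Some(n);
        self
    }

    fn eval(&self, output: &str) -> bool {
        let found = output.matches(self.needle.as_str()).count();
        match self.times {
            Some(n) => found == n,
            None => found >= 1,
        }
    }
}

/// Hypothetical exact-match predicate. It has no `times` method,
/// so the meaningless `.is(..).times(..)` call cannot be written.
struct Is {
    expected: String,
}

fn is_exactly(expected: &str) -> Is {
    Is { expected: expected.to_string() }
}

impl Is {
    fn eval(&self, output: &str) -> bool {
        output == self.expected
    }
}
```

Keeping `times` on the predicate type rather than on a shared `Output` type lets the compiler reject the combinations described above as "unsafe".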

Author

This could be very useful but is out of scope for this PR. For now, matches_ntimes(V, N) is a good enough solution; we might as well open an improvement issue to refactor and abstract it into times(N) later.

@epage
Collaborator

epage commented Apr 7, 2018

FYI with #98 we are switching to generic predicates. In assert-rs/predicates-rs#18 I'm adding regex to the generic predicates. It doesn't contain repetitions but I have noted that in assert-rs/predicates-rs#12 .

@epage
Collaborator

epage commented May 29, 2018

Addressed in https://github.com/assert-rs/assert_cmd

@epage epage closed this May 29, 2018
Development

Successfully merging this pull request may close these issues.

Regex matching in OutputAssertionBuilder
3 participants