Request support for reporting partial matches. #1014

hagbard · 2023-06-19T11:42:03Z

Some regex engines can report that, when the input did not match, it was nevertheless a valid prefix of something that could match the expression.

In Java this is exposed via the "hitEnd()" method on Matcher, which reports whether the previous match operation hit the end of the input. If the match failed and hitEnd() returns true, it failed only due to a lack of input.

This is useful when using regular expressions for things like incrementally validating user input, since it lets you differentiate between:

- This is invalid because it could never be matched (issue a warning in the UI).
- This isn't valid yet, but more input might make it valid.

This could be implemented in one of several ways but I think the easiest would be a new is_partial_match() function alongside is_match().

It would probably be best if it returned true for both complete and partial matches (exact name TBD).

In fact, your docs show a good example of a case where partial matching could be useful:

let re = Regex::new("[0-9]{3}-[0-9]{3}-[0-9]{4}").unwrap();
let mat = re.find("phone: 111-222-3333").unwrap();

Wouldn't it be nice to be able to report to the user that 111-222-33 was an incomplete number rather than just failing to match it at all?

BurntSushi · 2023-06-19T11:51:35Z

> Wouldn't it be nice to be able to report to the user that 111-222-33 was an incomplete number rather than just failing to match it at all?

For a pattern like [0-9]{3}-[0-9]{3}-[0-9]{4}, your is_partial_match routine would, I imagine, always return true. The example you quoted doesn't benefit at all from the partial matching you've conceived of here, because it's looking for a phone number in mixed data. Presumably what you'd actually want is ^[0-9]{3}-[0-9]{3}-[0-9]{4}$. That is, a partial match seemingly only makes sense when the pattern is anchored. An unanchored pattern behaves as if it starts with (?s-u:.)*?, which means that any partial match routine is always going to say, "yeah, it's possible there is a match somewhere else."

There's also likely some API design that would need to be worked out to do this.

Overall, I'd like to see someone prototype this out-of-crate once regex-automata 0.3 is released. See #656 for more details there. One problem in particular that is on my mind is that partial match support will require changing search signatures from Result<Option<Match>, MatchError> to something else, like Result<Result<Match, NoMatchError>, MatchError>, which is pretty annoying to deal with and is a very large change.
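To make the shape of that signature change concrete, here is a minimal std-only sketch. Everything in it is hypothetical: the `Match`, `NoMatchError` and `MatchError` types are stand-ins defined locally (they are not the crate's types), the variant names are invented, and `search_anchored` only handles a start-anchored literal rather than a real regex. It is meant to show what callers would have to write, not how the crate would implement it.

```rust
/// Hypothetical stand-in for a match span (byte offsets).
#[derive(Debug, PartialEq, Eq)]
struct Match {
    start: usize,
    end: usize,
}

/// Hypothetical: why a search produced no match. Variant names are assumptions.
#[derive(Debug, PartialEq, Eq)]
enum NoMatchError {
    /// The haystack is a proper prefix of something that could match.
    Partial,
    /// No additional input could ever produce a match.
    Impossible,
}

/// Stand-in for a failed search. This toy never produces it; it exists
/// only so the nested return type has the same shape as in the comment.
#[derive(Debug, PartialEq, Eq)]
struct MatchError;

/// Toy search for a literal anchored at the start of the haystack.
fn search_anchored(
    needle: &str,
    haystack: &str,
) -> Result<Result<Match, NoMatchError>, MatchError> {
    if haystack.starts_with(needle) {
        Ok(Ok(Match { start: 0, end: needle.len() }))
    } else if needle.starts_with(haystack) {
        // Everything seen so far agrees with the needle; more input may finish it.
        Ok(Err(NoMatchError::Partial))
    } else {
        Ok(Err(NoMatchError::Impossible))
    }
}

/// What a caller has to write: one arm per layer of the nested `Result`.
fn describe(needle: &str, haystack: &str) -> &'static str {
    match search_anchored(needle, haystack) {
        Ok(Ok(_)) => "match",
        Ok(Err(NoMatchError::Partial)) => "incomplete",
        Ok(Err(NoMatchError::Impossible)) => "invalid",
        Err(MatchError) => "search failed",
    }
}
```

Every existing caller that today matches on `Ok(Some(m))` / `Ok(None)` would need to be rewritten along the lines of `describe`, which is where the "very large change" comes from.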

riking · 2023-07-14T01:17:38Z

It's fairly obvious how to implement this with a regex-automata 0.3 DFA: drive the DFA over all the input you have, and if you haven't reached a halt state yet, more input is acceptable. Feed the End-Of-Input token once the stream is exhausted and do the final check for a match.
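As a rough illustration of that idea, the sketch below hand-codes a tiny DFA for the anchored pattern ^[0-9]{3}-[0-9]{3}-[0-9]{4}$ instead of using regex-automata, so it runs with the standard library alone. The `Partial` enum, the `classify` function and the state numbering are all assumptions made for this example. With a real regex-automata DFA the same loop would presumably go through the `Automaton` trait (`next_state`, `is_dead_state`, `next_eoi_state`, `is_match_state`), but that is not shown here.

```rust
/// Outcome of feeding a possibly incomplete input to an anchored pattern.
/// (Hypothetical type; not part of any crate.)
#[derive(Debug, PartialEq, Eq)]
enum Partial {
    /// The input matches the whole pattern.
    Complete,
    /// No match yet, but some continuation of the input could match.
    NeedsMoreInput,
    /// No continuation of the input can ever match.
    Impossible,
}

/// Dead ("halt") state: once entered, no further input can lead to a match.
const DEAD: usize = usize::MAX;
/// Accepting state: all 12 bytes of `ddd-ddd-dddd` have been consumed.
const ACCEPT: usize = 12;

/// Transition function of the toy DFA.
/// State `n` means "the first n bytes of the pattern have been matched".
fn next_state(state: usize, byte: u8) -> usize {
    match state {
        DEAD => DEAD,
        // Positions 3 and 7 must be the '-' separators.
        3 | 7 => if byte == b'-' { state + 1 } else { DEAD },
        // Every other position before the end must be an ASCII digit.
        s if s < ACCEPT => if byte.is_ascii_digit() { s + 1 } else { DEAD },
        // Any byte after a complete number violates the `$` anchor.
        _ => DEAD,
    }
}

/// Drive the DFA over everything we have, then decide at end of input.
fn classify(input: &[u8]) -> Partial {
    let mut state = 0;
    for &byte in input {
        state = next_state(state, byte);
        if state == DEAD {
            return Partial::Impossible;
        }
    }
    // End of input: either we are in the accepting state, or we are in a
    // live state that more input could still carry to acceptance.
    if state == ACCEPT {
        Partial::Complete
    } else {
        Partial::NeedsMoreInput
    }
}
```

With this, `classify(b"111-222-33")` yields `Partial::NeedsMoreInput`, while `classify(b"111-2x")` yields `Partial::Impossible`, which is the distinction the original request asks for. Note that it only works because the pattern is anchored; for an unanchored search the dead state would never be reached.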

BurntSushi closed this as not planned Jun 19, 2023

BurntSushi added the question label Jun 19, 2023
