Some regex engines can report, when an input fails to match, whether that input is nonetheless a valid prefix of something that could match the expression.
In Java this is implemented via the "hitEnd()" method which reports when the previous match operation failed only due to a lack of input.
This is useful when using regular expressions for things like incrementally validating user input, since it lets you differentiate between:
- This is invalid because it could never be matched (issue a warning in the UI).
- This isn't valid yet, but more input might make it valid.
This could be implemented in one of several ways but I think the easiest would be a new is_partial_match() function alongside is_match().
It would probably be best for it to return true for both complete and partial matches (exact name tbd).
In fact your docs show a good example of a case where partial matching could be useful:
let re = Regex::new("[0-9]{3}-[0-9]{3}-[0-9]{4}").unwrap();
let mat = re.find("phone: 111-222-3333").unwrap();
Wouldn't it be nice to be able to report to the user that 111-222-33 was an incomplete number rather than just failing to match it at all?
> Wouldn't it be nice to be able to report to the user that 111-222-33 was an incomplete number rather than just failing to match it at all?
For a pattern like [0-9]{3}-[0-9]{3}-[0-9]{4}, your is_partial_match routine would, I imagine, always return true. The example you quoted doesn't benefit at all from the partial matching you've conceived of here, because it's looking for a phone number in mixed data. Presumably what you'd actually want is ^[0-9]{3}-[0-9]{3}-[0-9]{4}$. That is, a partial match seemingly only makes sense when the pattern is anchored. An unanchored pattern behaves as if it starts with a (?s-u:.)*?, which means that any partial match routine is always going to say, "yeah, it's possible there is a match somewhere else."
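To make the anchoring point concrete, here is a minimal std-only sketch. It does not use the regex crate at all: `is_partial_match_anchored` and `is_partial_match_unanchored` are hypothetical helpers hard-coded to the phone number pattern, and `PHONE_TEMPLATE` is just an illustrative way to spell out ^[0-9]{3}-[0-9]{3}-[0-9]{4}$ position by position.

```rust
/// Position-by-position template for ^[0-9]{3}-[0-9]{3}-[0-9]{4}$
/// ('d' stands for one ASCII digit, '-' for a literal dash).
/// Hypothetical helper data, not part of any regex crate API.
const PHONE_TEMPLATE: &[u8] = b"ddd-ddd-dddd";

/// Anchored case: true if `input` is a complete match or a prefix of one.
fn is_partial_match_anchored(input: &str) -> bool {
    let bytes = input.as_bytes();
    bytes.len() <= PHONE_TEMPLATE.len()
        && bytes.iter().zip(PHONE_TEMPLATE).all(|(&b, &slot)| match slot {
            b'd' => b.is_ascii_digit(),
            literal => b == literal,
        })
}

/// Unanchored case: any haystack can be extended with a full phone
/// number, so a match somewhere later is always still possible.
fn is_partial_match_unanchored(_input: &str) -> bool {
    true
}
```

With these definitions, "111-222-33" and "111-222-3333" are both accepted by the anchored check, "111-2x" and "phone: 111" are rejected by it, and the unanchored check cannot reject anything, which is why it carries no information.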
There's also likely some API design that would need to be worked out to do this.
Overall, I'd like to see someone prototype this out-of-crate once regex-automata 0.3 is released. See #656 for more details there. One problem in particular that is on my mind is that partial match support will require changing search signatures from Result<Option<Match>, MatchError> to something else, like Result<Result<Match, NoMatchError>, MatchError>. Which is pretty annoying to deal with and is a very large change.
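As a rough illustration of why that signature change ripples through callers, here is a sketch with stand-in types. Everything in it is an assumption made for illustration: the `Match` and `MatchError` definitions are simplified placeholders, and the `NoMatchError` variants are invented here since the comment above does not say what the type would contain.

```rust
/// Simplified stand-in for a match span (illustrative only).
#[derive(Debug, PartialEq)]
struct Match {
    start: usize,
    end: usize,
}

/// Hypothetical: why a search found nothing. The variants are assumed.
#[derive(Debug, PartialEq)]
enum NoMatchError {
    /// No amount of additional input could produce a match.
    Impossible,
    /// The input so far is a valid prefix of a possible match.
    NeedsMoreInput,
}

/// Simplified stand-in for a search failure (illustrative only).
#[derive(Debug, PartialEq)]
enum MatchError {
    Quit { offset: usize },
}

type OldResult = Result<Option<Match>, MatchError>;
type NewResult = Result<Result<Match, NoMatchError>, MatchError>;

/// Callers now have four cases to handle instead of three.
fn describe(result: &NewResult) -> &'static str {
    match result {
        Ok(Ok(_)) => "match",
        Ok(Err(NoMatchError::NeedsMoreInput)) => "no match yet, more input could help",
        Ok(Err(NoMatchError::Impossible)) => "no match possible",
        Err(_) => "search failed",
    }
}

/// Collapsing back to the old shape discards the partial-match information.
fn to_old(result: NewResult) -> OldResult {
    result.map(|inner| inner.ok())
}
```

The conversion in `to_old` shows the compatibility story under these assumptions: existing callers could keep the old shape, but only by throwing away exactly the distinction this issue asks for.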
It's fairly obvious how to implement this with a regex-automata 0.3 DFA: drive the DFA for all the input you have, and if you haven't reached a halt state yet, more input is acceptable. Feed the End-Of-Input token once the stream is exhausted and do the final check for match.
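The shape of that loop can be sketched without any crate. The following is a hand-written automaton for the anchored phone number pattern only, not regex-automata's API: `PhoneDfa`, `Status`, `feed` and `finish` are hypothetical names, and `None` plays the role of the state from which no match is reachable.

```rust
/// Outcome of checking the input seen so far (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    /// The input consumed so far is a complete match.
    Match,
    /// No match yet, but more input could still produce one.
    Partial,
    /// No continuation of the input can match.
    Dead,
}

/// Hand-rolled automaton for ^[0-9]{3}-[0-9]{3}-[0-9]{4}$.
/// `state` is the number of template slots consumed; `None` is the dead state.
struct PhoneDfa {
    state: Option<usize>,
}

impl PhoneDfa {
    /// 'd' is one ASCII digit, '-' is a literal dash.
    const TEMPLATE: &'static [u8] = b"ddd-ddd-dddd";

    fn new() -> Self {
        PhoneDfa { state: Some(0) }
    }

    fn accepts(slot: u8, byte: u8) -> bool {
        if slot == b'd' {
            byte.is_ascii_digit()
        } else {
            byte == slot
        }
    }

    /// Drive the automaton over one chunk of input.
    /// Returns false once the dead state has been reached.
    fn feed(&mut self, chunk: &[u8]) -> bool {
        for &byte in chunk {
            self.state = match self.state {
                Some(pos)
                    if pos < Self::TEMPLATE.len()
                        && Self::accepts(Self::TEMPLATE[pos], byte) =>
                {
                    Some(pos + 1)
                }
                _ => None,
            };
            if self.state.is_none() {
                break;
            }
        }
        self.state.is_some()
    }

    /// The end-of-input check: classify what has been fed so far.
    fn finish(&self) -> Status {
        match self.state {
            None => Status::Dead,
            Some(pos) if pos == Self::TEMPLATE.len() => Status::Match,
            Some(_) => Status::Partial,
        }
    }
}
```

Input can be fed in arbitrary chunks (for example one keystroke at a time), and `finish` can be called at any point to decide whether to show nothing, an "incomplete" hint, or an error in the UI.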