permit some no-op escape sequences for compatibility purposes #501

KiChjang · 2018-07-28T00:33:12Z

Some of the regexes found in https://github.com/ua-parser/uap-core are throwing errors when parsed with the regex crate:

regex parse error:
    (?:\/[A-Za-z0-9\.]+)? *([A-Za-z0-9 \-_\!\[\]:]*(?:[Aa]rchiver|[Ii]ndexer|[Ss]craper|[Bb]ot|[Ss]pider|[Cc]rawl[a-z]*))/(\d+)(?:\.(\d+)(?:\.(\d+))?)?
       ^^
error: unrecognized escape sequence

This sounds like we're deviating from the regex spec here. Can someone confirm?

BurntSushi · 2018-07-28T00:41:39Z

Which regex spec are you referring to?

Otherwise, yes, this crate disallows unnecessary escapes. This is to permit the addition of new escapes in a backwards compatible way. It is plausible that we could allow certain no-op escapes (such as for /), but this may or may not solve the larger problem.

KiChjang · 2018-07-31T00:42:47Z

So I was referring to the ES6 spec for regexes. I believe that there are production-grade projects, such as the one I linked in my first post, that do contain unnecessary backslashes, and I think this crate should definitely provide a way to accept these regexes as-is, without modifying them to conform to the regex syntax introduced in this crate.

For the project I linked, here's a shortlist of what I did to make it compile with this crate:

\/ -> /
\! -> !
\ (backslash space) -> (space)
|) -> )? (empty alternations)
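
The first three rewrites in that list can be sketched as a small pre-processing pass. This is only an illustration using the standard library: `strip_noop_escapes` and its `noop` parameter are hypothetical names, not part of the regex crate, and the sketch does not handle the empty-alternation rewrite. Note that dropping the backslash from `\ ` is not safe for patterns that rely on verbose (`x`) mode.

```rust
/// Remove the backslash in front of any character listed in `noop`.
/// Every other escape (for example `\.`, `\d` or `\\`) is copied verbatim.
///
/// Hypothetical helper for illustration; not an API of the regex crate.
fn strip_noop_escapes(pattern: &str, noop: &[char]) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // The escaped character is in the no-op set: drop the backslash.
            Some(next) if noop.contains(&next) => out.push(next),
            // Any other escape keeps its backslash.
            Some(next) => {
                out.push('\\');
                out.push(next);
            }
            // A trailing lone backslash is kept so the regex parser reports it.
            None => out.push('\\'),
        }
    }
    out
}
```

Because the pass consumes escapes pairwise, an escaped backslash followed by `/` (as in `\\/`) is left alone rather than being mistaken for `\/`.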

BurntSushi · 2018-07-31T00:49:48Z

> So I was referring to the ES6 spec for regexes.

This crate definitely does not, and never will, conform to the ES6 specification for regexes. It is a complete non-goal.

I am much more sympathetic to your practical concerns. I'm generally strongly opposed to allowing escapes to always work even when they are no-ops, but I could get on board with selecting a set of commonly escaped characters in the wild.

It might also be smart to try to fix those projects such that they don't use unnecessary escapes.

Empty alternations are something I'd also like to support, but a bug in the compiler prevents it for now.

RReverser · 2018-08-16T12:39:40Z

We actually made our own Rust wrapper for uap-core, and yes, we had to fix unnecessary escapes. Perhaps we could open-source it to avoid duplication of effort?

BurntSushi · 2018-08-16T12:41:04Z

@RReverser Were escapes the only reason for that? Or were there other issues that needed to be papered over?

RReverser · 2018-08-16T15:18:16Z

> the only reason for that

Only reason for what? Implementing Rust version of uap-core?

BurntSushi · 2018-08-16T15:53:41Z

@RReverser In the process of doing that, you said you had to fix unnecessary escapes. Was there anything else you needed to do with the regexes specifically to make them work with Rust's regex engine?

RReverser · 2018-08-16T17:14:30Z

Ah. Well, another fix I had to do was to replace \d to match only ASCII digits, since it turned out to include Unicode ones as well, which should not be allowed from the point of view of a UA parser, although in all other regards strings should still be matched in Unicode-aware mode (I suppose you remember our discussion about this).
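
One way to picture that fix is a textual rewrite of `\d` into the explicit class `[0-9]` before the pattern reaches the regex crate. The sketch below is an assumption about how such a rewrite could look, not the code used in the wrapper mentioned above; `ascii_only_digits` is a hypothetical name. It relies on the regex crate accepting nested classes, so `[\d.]` becoming `[[0-9].]` remains valid.

```rust
/// Rewrite every `\d` in `pattern` to `[0-9]` so that only ASCII digits match.
/// All other escapes, including `\\`, are copied through unchanged.
///
/// Hypothetical helper for illustration only.
fn ascii_only_digits(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // `\d` becomes an explicit ASCII digit class.
            Some('d') => out.push_str("[0-9]"),
            // Any other escape is preserved as written.
            Some(next) => {
                out.push('\\');
                out.push(next);
            }
            // Keep a dangling backslash so the parser can report it.
            None => out.push('\\'),
        }
    }
    out
}
```

The rest of the pattern stays in Unicode-aware mode, which is the behaviour described in the comment above.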

Other than that, no other fixes were necessary, although I did write a bunch of extra analysis and rewrites using regex-syntax to optimise common cases.

BurntSushi · 2018-10-08T13:10:10Z

In an effort to keep conversation on this topic in one place, I'm going to respond to @zackw's issue #522 here:

To recap, my current high-level thinking on this topic:

> I'm generally strongly opposed to allowing escapes to always work even when they are no-ops, but I could get on board with selecting a set of commonly escaped characters in the wild.

The problem here is that I don't know how much effort we should expend to make the syntax compatible with other regex engines. Surely, we can all agree that 100% compatibility can never happen or be expected. So we have to choose some set of features that gets us part of the way there. I just don't know which things to choose.

The purpose for the current behavior is to permit backwards compatible additions to the syntax of regexes. Some regex engines were developed with very little foresight. Python's regex engine, for example, will permit just about anything to be escaped. Some escapes are significant, but the escapes that aren't significant behave as if they weren't escaped at all. This makes it impossible to add new escape sequences since existing escapes are valid and have specified match semantics.

This regex library chooses to return an error for non-significant escape sequences precisely because we consider turning invalid syntax into valid syntax to be a backwards compatible change, and we can legitimately get away with that by enforcing it.
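
The trade-off can be shown with a toy model. The two functions below are not the crate's parser; `Escape`, `parse_escape_strict` and `parse_escape_permissive` are hypothetical names, and the set of recognized escapes is deliberately tiny.

```rust
/// What an escape sequence means in this toy model.
#[derive(Debug, PartialEq)]
enum Escape {
    /// The escaped character matches itself literally.
    Literal(char),
    /// The escape names a character class such as `\d`, `\w` or `\s`.
    Class(char),
}

/// Strict policy: an escape with no assigned meaning is an error.
fn parse_escape_strict(c: char) -> Result<Escape, String> {
    match c {
        'd' | 'w' | 's' => Ok(Escape::Class(c)),
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$' => {
            Ok(Escape::Literal(c))
        }
        _ => Err(format!("unrecognized escape sequence: \\{}", c)),
    }
}

/// Permissive policy: an escape with no assigned meaning is a literal.
fn parse_escape_permissive(c: char) -> Escape {
    match c {
        'd' | 'w' | 's' => Escape::Class(c),
        _ => Escape::Literal(c),
    }
}
```

Under the strict policy `\h` is rejected today, so assigning it a meaning later cannot change the behaviour of any pattern that already compiles. Under the permissive policy `\h` already matches a literal `h`, so giving it a new meaning would silently change existing patterns.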

This framework does not specifically forbid insignificant escape sequences. We can simply choose the ones we want to explicitly allow. The space character, " and ' are all candidates, but they are far from the only ones. Moreover, even if we were to add new escape sequences in the future, it would probably be poor judgment to use an escape sequence that is commonly used in other regex engines as an insignificant escape sequence. For example, prescribing special meaning to \" while another regex engine just treats it as a literal " is probably poor form.

I think the most compelling use cases, from my perspective, are huge libraries of regexes. However, in practice, it seems quite difficult to just expect to be able to compile them without any other changes. As @RReverser notes above, the \d, \w and \s escape sequences are all Unicode aware by default, which is not common in other regex engines, which were likely built in a time before Unicode was as widespread as it is today. Therefore, even if we fix the cases of escape sequences, you still wind up with subtle match differences. If pressed, I am sure I could come up with a list of several more cases. What I'm trying to say here is that it may be unreasonable to expect a large existing library of regexes---written specifically for one particular regex engine---to just work out of the box on a different regex engine, even if it might in practice in some number of cases.

zackw · 2018-10-08T14:18:06Z

I understand and appreciate your position that no-op escapes should not be added just because they are no-op escapes in other regex engines.

I would like to offer an independent argument for no-op \", \' and \/ (didn't think of \/ before, but yes, that one too) based on the fact that ", ', and / are commonly used to delimit regex literals in many different languages and contexts, and \", \', \/ are commonly understood to escape the delimitation (i.e. extend the regex literal past a point where it would otherwise have ended). In many cases, backslashes that escape delimitation will be stripped by the "outer" parser before the regex engine sees them, but not all (e.g. the Python raw strings I mentioned in #522). Therefore, no-op \", \' and \/ is not just a matter of compatibility with other regex engines, but other surrounding contexts besides Rust source code.

BurntSushi · 2018-10-08T14:39:41Z

@zackw Ah I see. I don't think I appreciated that point about Python's raw strings when I first read it in #522. Thanks for mentioning that again.

OK, so how about we start off by making ", ', (space, 0x20), / and ! escapeable but no-op? This will also cement the syntax such that these characters can never be used as a valid escape sequence that does anything other than match the literal being escaped. I think for these characters, that would be reasonable.

We can add more no-op escapes moving forward if they creep up, but I think these are probably the ones I see escaped most often.

KiChjang · 2018-10-08T16:59:45Z

@BurntSushi That sounds reasonable to me. About the empty alternations -- is there a tracking issue in rustc for the bug you mentioned?

BurntSushi · 2018-10-08T17:04:54Z

@KiChjang Errm, the "compiler" in this context refers to the regex compiler, not rustc. Sorry about the mixup. But no, there is no issue for it because I don't yet understand the bug and it only manifests when empty alternations are allowed. There probably should be an issue for the feature of empty alternations though.

zackw · 2018-10-08T17:12:18Z

+1 from me on no-op treatment for ", ', /, and space. +0 on !. I don't know of any context where ! is used as a delimiter, except sed's alternative-delimiter notation, m!...! where ! can be any single ASCII character; since it can be any character, that notation shouldn't be an argument for anything.

I see that @KiChjang originally asked for \! to be a no-op because of an existing body of JavaScript regexes that use it. JavaScript regex syntax defines \h to be equivalent to h for all characters h where \h has not already been assigned a special meaning (if I'm reading https://tc39.github.io/ecma262/#sec-patterns-static-semantics-character-value correctly), which is exactly the thing @BurntSushi didn't like up above. Could we have some specific examples, with context, of regexes using \! instead of bare ! please? I want to understand why they were written that way.

zackw · 2018-10-08T17:21:30Z

@BurntSushi Regarding empty alternations, it might be a good idea to file an issue just so it's on record as a known problem and something you intend to support in the future.

BurntSushi · 2018-10-08T17:28:02Z

Aye. I opened #524.

KiChjang · 2018-10-08T18:58:43Z

@zackw To be honest, I don't know why the regexes in the project I linked escape !s, but here are the lines where they do: https://github.com/ua-parser/uap-core/blob/23bfabe34b86f29f4840c9dd1ef6129e685581e3/regexes.yaml#L98-L100

zackw · 2018-10-08T20:51:46Z

    [A-Za-z0-9 \-_\!\[\]:]*
    [A-Za-z0-9 _\!\[\]:]*

It's not at all clear to me what either of those character classes are supposed to do. And the context makes it sound like they're not supposed to be different, either, for added bafflement. I'm actually left wondering whether someone thought ! was a metacharacter within JS regex character classes (it isn't; it's a metacharacter within shell glob character classes, but that's a totally different ball of wax).

RReverser · 2018-10-10T17:21:55Z

Supporting " and / would solve all the cases where I had to manually (and carefully) unescape regexes before passing to Regex::new in several projects, so big 👍 here. I guess ' also makes sense, but I have never seen ! as being important.

More generally, I think it should be possible to allow no-op escapes for any non-ASCII-alphanumeric characters without breaking forward compatibility, but starting with a limited set is probably better for now.

okdana · 2020-07-27T19:58:08Z

I wanted to mention that one of the reasons you might see 'no-op' escapes in the wild is that some languages' regex-escape functions produce them.

For example, Perl's quotemeta() escapes all non-word ASCII characters, and PHP's preg_quote() escapes all 'special' punctuation characters (even ones like ! that are only special when combined with an always-special character like ? that would be escaped anyway). Python's re.escape() used to work like PHP too, but it's been made more selective recently. I don't think JavaScript has a built-in escape function, but it does support look-around, so maybe there's some library that escapes ! for the same reason.

As far as patterns written out by humans, i'm just guessing, but i can think of two reasons they'd do it: (1) the author understands how the escaping works but chooses to rely on the 'no-op' feature so they don't have to remember which characters are special and when (not a bad reason imo), or (2) the author doesn't understand how the escaping works and simply cargo-cults it from the output of those functions, or from the first type of person.

anweiss · 2020-10-28T15:26:42Z

Is there an easy way to remove unnecessary escapes from a given string using this crate? If not, is there a list of characters that don't require escapes? I'm attempting to convert an existing PCRE-compatible regex to an expression that this crate can parse. Thanks!

BurntSushi · 2020-10-28T21:30:30Z

The crate certainly does not provide any such operation. It wouldn't really make sense to IMO.

As for a list of all meta characters, I think the only stable way to do that is to use is_meta_character from the regex-syntax crate. is_meta_character returns true only for characters that must be escaped in order to use their literal form. Other characters, such as ASCII space, can be escaped but do not need to be escaped.

Now, is_meta_character doesn't give you a list, but you can generate one by just trying all inputs. And since is_meta_character promises that the list will never expand or contract in a semver compatible release, the generated list will be stable. Or you could just use the fact that all meta characters are ASCII, so you only need to check 128 possible inputs instead of the full range of Unicode scalar values.
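
The "try all 128 ASCII inputs" idea can be sketched as follows. To keep the example free of external crates, it uses a local stand-in named `is_meta_character`; the character set in that stand-in is an assumption made for illustration, and real code should call `regex_syntax::is_meta_character` instead. `ascii_meta_characters` is a hypothetical helper name.

```rust
/// Local stand-in for `regex_syntax::is_meta_character`.
/// ASSUMPTION: this character set is illustrative; call the real function
/// from the regex-syntax crate to get the authoritative answer.
fn is_meta_character(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$'
            | '#' | '&' | '-' | '~'
    )
}

/// Build the list of meta characters by probing every ASCII code point.
/// Only 128 inputs are needed because all meta characters are ASCII.
fn ascii_meta_characters() -> Vec<char> {
    (0u8..128)
        .map(char::from)
        .filter(|&c| is_meta_character(c))
        .collect()
}
```

Characters such as space and `/` never show up in the generated list, which matches the statements in this thread that they can be written without an escape.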

anweiss · 2020-10-29T14:25:29Z

Thanks @BurntSushi for the explanation! Super helpful! Will look at is_meta_character.

BurntSushi · 2022-12-02T02:55:53Z

That appears to be a regex101 thing, or something related to how they've configured PCRE2 (which has many many options). But / is not a meta character and it does not need to be escaped. You can also use ripgrep to try out PCRE2:

$ echo '/c' | rg -oP '(?:/c|b)'
/c

Or even pcre2grep itself:

$ echo '/c' | pcre2grep -o '(?:/c|b)'
/c

BurntSushi · 2023-03-03T14:37:28Z

This should actually be happening soon, where all ASCII characters except for [0-9A-Za-z<>] will be escapeable, even if the escape is superfluous. As part of this, I've introduced a new regex_syntax::is_escapeable_character function:

pub fn is_escapeable_character(c: char) -> bool {
    // Certainly escapeable if it's a meta character.
    if is_meta_character(c) {
        return true;
    }
    // Any character that isn't ASCII is definitely not escapeable. There's
    // no real need to allow things like \☃ right?
    if !c.is_ascii() {
        return false;
    }
    // Otherwise, we basically say that everything is escapeable unless it's a
    // letter or digit. Things like \3 are either octal (when enabled) or an
    // error, and we should keep it that way. Otherwise, letters are reserved
    // for adding new syntax in a backwards compatible way.
    match c {
        '0'..='9' | 'A'..='Z' | 'a'..='z' => false,
        // While not currently supported, we keep these as not escapeable to
        // give us some flexibility with respect to supporting the \< and
        // \> word boundary assertions in the future. By rejecting them as
        // escapeable, \< and \> will result in a parse error. Thus, we can
        // turn them into something else in the future without it being a
        // backwards incompatible change.
        '<' | '>' => false,
        _ => true,
    }
}
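
For quick spot checks, the rule can be condensed into a single expression. This is a restatement for illustration, not the function above: `is_escapeable_sketch` is a hypothetical name, and it assumes that every meta character is an ASCII character other than a letter, a digit, `<` or `>`, which makes the separate meta-character check redundant.

```rust
/// Condensed restatement of the escapeability rule.
/// ASSUMPTION: every meta character is ASCII, non-alphanumeric and neither
/// `<` nor `>`, so no separate meta-character check is needed here.
fn is_escapeable_sketch(c: char) -> bool {
    c.is_ascii() && !c.is_ascii_alphanumeric() && c != '<' && c != '>'
}
```

With this rule `\/`, `\"`, `\!` and backslash-space are accepted, while `\Q`, `\3`, `\<` and `\☃` remain errors, leaving room for future syntax.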

This resolves a long-standing (but somewhat minor) complaint that folks have with the regex crate: it does not permit escaping punctuation characters in cases where those characters do not need to be escaped. So things like \/, \" and \! would result in parse errors. Most other regex engines permit these, even in cases where they aren't needed. I had been against doing this for future evolution purposes, but it's incredibly unlikely that we're ever going to add a new meta character to the syntax. I literally cannot think of any conceivable future in which that might happen. However, we do continue to ban escapes for [0-9A-Za-z<>], because it is conceivable that we might add new escape sequences for those characters. (And 0-9 are already banned by virtue of them looking too much like backreferences, which aren't supported.) For example, we could add \Q...\E literal syntax. Or \< and \> as start and end word boundaries, as found in POSIX regex engines. Fixes #501

1.8.0 (2023-04-20)
==================
This is a sizeable release that will be soon followed by another sizeable release. Both of them combined will close over 40 existing issues and PRs.

This first release, despite its size, essentially represents preparatory work for the second release, which will be even bigger. Namely, this release:

* Increases the MSRV to Rust 1.60.0, which was released about 1 year ago.
* Upgrades its dependency on `aho-corasick` to the recently released 1.0 version.
* Upgrades its dependency on `regex-syntax` to the simultaneously released `0.7` version. The changes to `regex-syntax` principally revolve around a rewrite of its literal extraction code and a number of simplifications and optimizations to its high-level intermediate representation (HIR).

The second release, which will follow ~shortly after the release above, will contain a soup-to-nuts rewrite of every regex engine. This will be done by bringing [`regex-automata`](https://github.com/BurntSushi/regex-automata) into this repository, and then changing the `regex` crate to be nothing but an API shim layer on top of `regex-automata`'s API.

These tandem releases are the culmination of about 3 years of on-and-off work that [began in earnest in March 2020](#656).

Because of the scale of changes involved in these releases, I would love to hear about your experience. Especially if you notice undocumented changes in behavior or performance changes (positive *or* negative).

Most changes in the first release are listed below. For more details, please see the commit log, which reflects a linear and decently documented history of all changes.

New features:

* [FEATURE #501](#501): Permit many more characters to be escaped, even if they have no significance. More specifically, any ASCII character except for `[0-9A-Za-z<>]` can now be escaped. Also, a new routine, `is_escapeable_character`, has been added to `regex-syntax` to query whether a character is escapeable or not.
* [FEATURE #547](#547): Add `Regex::captures_at`. This fills a hole in the API, but doesn't otherwise introduce any new expressive power.
* [FEATURE #595](#595): Capture group names are now Unicode-aware. They can now begin with either a `_` or any "alphabetic" codepoint. After the first codepoint, subsequent codepoints can be any sequence of alpha-numeric codepoints, along with `_`, `.`, `[` and `]`. Note that replacement syntax has not changed.
* [FEATURE #810](#810): Add `Match::is_empty` and `Match::len` APIs.
* [FEATURE #905](#905): Add an `impl Default for RegexSet`, with the default being the empty set.
* [FEATURE #908](#908): A new method, `Regex::static_captures_len`, has been added which returns the number of capture groups in the pattern if and only if every possible match always contains the same number of matching groups.
* [FEATURE #955](#955): Named captures can now be written as `(?<name>re)` in addition to `(?P<name>re)`.
* FEATURE: `regex-syntax` now supports empty character classes.
* FEATURE: `regex-syntax` now has an optional `std` feature. (This will come to `regex` in the second release.)
* FEATURE: The `Hir` type in `regex-syntax` has had a number of simplifications made to it.
* FEATURE: `regex-syntax` has support for a new `R` flag for enabling CRLF mode. This will be supported in `regex` proper in the second release.
* FEATURE: `regex-syntax` now has proper support for "regex that never matches" via `Hir::fail()`.
* FEATURE: The `hir::literal` module of `regex-syntax` has been completely re-worked. It now has more documentation, examples and advice.
* FEATURE: The `allow_invalid_utf8` option in `regex-syntax` has been renamed to `utf8`, and the meaning of the boolean has been flipped.

Performance improvements:

* PERF: The upgrade to `aho-corasick 1.0` may improve performance in some cases. It's difficult to characterize exactly which patterns this might impact, but if there are a small number of longish (>= 4 bytes) prefix literals, then it might be faster than before.

Bug fixes:

* [BUG #514](#514): Improve `Debug` impl for `Match` so that it doesn't show the entire haystack.
* BUGS [#516](#516), [#731](#731): Fix a number of issues with printing `Hir` values as regex patterns.
* [BUG #610](#610): Add explicit example of `foo|bar` in the regex syntax docs.
* [BUG #625](#625): Clarify that `SetMatches::len` does not (regrettably) refer to the number of matches in the set.
* [BUG #660](#660): Clarify "verbose mode" in regex syntax documentation.
* BUG [#738](#738), [#950](#950): Fix `CaptureLocations::get` so that it never panics.
* [BUG #747](#747): Clarify documentation for `Regex::shortest_match`.
* [BUG #835](#835): Fix `\p{Sc}` so that it is equivalent to `\p{Currency_Symbol}`.
* [BUG #846](#846): Add more clarifying documentation to the `CompiledTooBig` error variant.
* [BUG #854](#854): Clarify that `regex::Regex` searches as if the haystack is a sequence of Unicode scalar values.
* [BUG #884](#884): Replace `__Nonexhaustive` variants with `#[non_exhaustive]` attribute.
* [BUG #893](#893): Optimize case folding since it can get quite slow in some pathological cases.
* [BUG #895](#895): Reject `(?-u:\W)` in `regex::Regex` APIs.
* [BUG #942](#942): Add a missing `void` keyword to indicate "no parameters" in C API.
* [BUG #965](#965): Fix `\p{Lc}` so that it is equivalent to `\p{Cased_Letter}`.
* [BUG #975](#975): Clarify documentation for `\pX` syntax.

BurntSushi added the enhancement label Jul 31, 2018

BurntSushi changed the title ~~Unable to parse some of the regexes in ua-parser/uap-core~~ permit some no-op escape sequences for compatibility purposes Oct 8, 2018

This was referenced Oct 8, 2018

please support \" and \' as equivalent to " and ' respectively #522

Closed

Support \ (backslash space) as equivalent to space, at least in /x mode #523

Closed

BurntSushi mentioned this issue Nov 29, 2020

how to make rust regex compatible with online regex or other languages regex? #727

Closed

BurntSushi mentioned this issue Jun 17, 2021

[Feature request] Allow forward slash (/) to be optionally escaped in patterns BurntSushi/ripgrep#1902

Closed

mre mentioned this issue Jun 26, 2022

Unrecognised escape sequence (\/) lycheeverse/lychee#663

Closed

YamatoSecurity mentioned this issue Dec 2, 2022

Change regex engine? Yamato-Security/hayabusa#191

Closed

fukusuket mentioned this issue Dec 2, 2022

refactor: remove unneeded escapes(in |re block) SigmaHQ/sigma#3744

Merged

BurntSushi added the fix-incoming label Mar 3, 2023

BurntSushi closed this as completed in fe8d667 Apr 17, 2023

BurntSushi mentioned this issue Apr 20, 2023

release: 1.8.0 #979

Merged

chronolaw mentioned this issue May 8, 2023

feat(router): lua validate an expression against a schema Kong/atc-router#39

Merged

robtfm mentioned this issue Sep 1, 2023

Invalid escape character bevyengine/naga_oil#50

Closed

This was referenced Oct 29, 2023

Upgrade to the latest regex crate lpil/rustexp#20

Merged

[Rust] Support for the latest version of Regex crate firasdib/Regex101#2167

Closed

Omega359 mentioned this issue Feb 1, 2024

Add regexp_like scalar function apache/datafusion#9102

Closed
