How to represent regular expressions #22
Comments
JSON Schema uses ECMAScript regexes (https://spacetelescope.github.io/understanding-json-schema/reference/regular_expressions.html), which is more or less what C++ uses. So, we should probably use that. Is there a Go lib for this?
It's definitely not supported by the built-in regexp package. ECMAScript supports backreferences and re2 doesn't. I don't know about third-party libraries, though.
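To make the backreference gap concrete, here is a minimal C++ sketch. The helper name is_doubled_word is hypothetical and not part of PGV. It relies on \1, which std::regex accepts under its default ECMAScript grammar; the same pattern has no equivalent in re2 syntax, so it could not be carried over to Go's regexp package.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, for illustration only.
// Returns true when `input` is a word, one space, and the same word again.
// The \1 backreference is valid in std::regex's default ECMAScript grammar,
// but re2 syntax does not support backreferences.
bool is_doubled_word(const std::string& input) {
    static const std::regex doubled(R"((\w+) \1)");
    return std::regex_match(input, doubled);
}
```

A pattern like this validates fine in C++ today but would be rejected at compile time by an re2-based engine, which is the portability problem described above.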
Currently, PGV is documented to support re2. Ideally, none of the generated code (in any language) will have dependencies outside of the stdlib. So... we could limit ourselves to the POSIX ERE syntax, if that's something we can support out-of-the-box in C++?
C++ can do something "similar" to ERE; see https://www.regular-expressions.info/stdregex.html for the caveats, which mostly relate to non-ASCII input and embedded line breaks. See also http://en.cppreference.com/w/cpp/regex/basic_regex.
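For reference, selecting the ERE-like grammar in C++ looks roughly like this. It is a sketch only: matches_ere is a hypothetical helper name, and the caveats from the links above still apply.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, for illustration only.
// Full-matches `input` against `pattern` using std::regex's POSIX-extended
// grammar instead of the default ECMAScript grammar.
bool matches_ere(const std::string& pattern, const std::string& input) {
    const std::regex re(pattern, std::regex::extended);
    return std::regex_match(input, re);
}
```

Called as matches_ere("(foo|bar)+", "foobar"), this uses only the standard library, which is the appeal of the ERE option.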
I suspect it won't be possible to avoid all dependencies outside of the standard libraries. UTF-8 handling, which is required by some string validations, is not provided by the C++ standard library. I don't think URL or IP validation is either (though I may be mistaken). Go just happens to have a standard library with substantially more breadth than C++. That being said, re2 wouldn't be the worst thing to depend on, since it seems to have bindings for a reasonable number of languages.
Now that we're adding more languages, I think it's time to revisit this. Both Java and Python support re2, and while I like not having dependencies outside the standard libraries, this seems like a good exception to make.
I think we could live with |
Envoy now uses RE2 as a safe regex engine instead of std::regex (envoyproxy/envoy#7878). Because PGV already requires patterns to use RE2 syntax, one option is to use RE2 for C++ patterns as well. This implements it for string, bytes, repeated-item, and map key/value pattern validation. Implements #22.
WIP: I ran into difficulty creating the regex because a pattern containing a null character gets cut off. For example, the ASCII character test uses the pattern ^[\x00-\x7f]+$, and consuming it as a string produced the null-terminated pattern ^[ instead of the actual pattern. I think this might be a problem across most of the C++ code? That's why there's a terrible string construction in the pattern creation.
Signed-off-by: Asra Ali <asraa@google.com>
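The truncation described here can be reproduced with the standard library alone. The sketch below is not the code from the PR; the function names are hypothetical. It assumes the pattern arrives with a real NUL byte in it, and shows that building a std::string from a bare C string stops at that byte, while passing an explicit length keeps the whole pattern.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names, for illustration only.

// The const char* constructor stops at the first '\0', so only "^[" survives.
std::string truncated_pattern() {
    return std::string("^[\x00-\x7f]+$");
}

// Passing the length explicitly preserves the embedded NUL and the rest of
// the pattern. sizeof counts the implicit terminator, hence the - 1.
std::string full_pattern() {
    static const char kPattern[] = "^[\x00-\x7f]+$";
    return std::string(kPattern, sizeof(kPattern) - 1);
}
```

Any generated code that emits patterns as plain string literals and lets them decay to const char* would hit the same problem, which may be why it looks widespread.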
The current de facto regular expression implementation is the one used by Go, which uses the re2 syntax. It isn't POSIX-compliant, nor is it immediately compatible with C++'s std::basic_regex and friends. This shows up most obviously when trying to use flags ("." matches newline, case-insensitive matching, etc.) to modify the matching behavior: Go encodes these as part of the expression string, while C++ uses a separate bitmask.
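A small sketch of the mismatch on the C++ side, using only std::regex. Both helper names are hypothetical. In re2 syntax, case-insensitivity is written inside the pattern as a (?i) prefix; std::regex instead takes it as a constructor flag, and its default grammar is expected to reject the inline form.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers, for illustration only.

// Case-insensitive full match the C++ way: the option is part of the
// flag bitmask passed to the constructor, not part of the pattern text.
bool matches_ignoring_case(const std::string& pattern, const std::string& input) {
    const std::regex re(pattern, std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(input, re);
}

// Reports whether std::regex accepts `pattern` under its default grammar.
bool compiles(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

With these, matches_ignoring_case("hello", "HeLLo") succeeds, while compiles("(?i)hello") is expected to return false, so an re2-style pattern with inline flags cannot be handed to std::regex unchanged.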