1.8.0 (2023-04-20)
This is a sizeable release that will soon be followed by another sizeable release. Combined, the two will close over 40 existing issues and PRs.
This first release, despite its size, essentially represents preparatory work for the second release, which will be even bigger. Namely, this release:
* Upgrades its dependency on `aho-corasick` to the recently released 1.0 version.
* Upgrades its dependency on `regex-syntax` to the simultaneously released `0.7` version. The changes to `regex-syntax` principally revolve around a rewrite of its literal extraction code and a number of simplifications and optimizations to its high-level intermediate representation (HIR).

The second release, which will follow shortly after this one, will contain a soup-to-nuts rewrite of every regex engine. This will be done by bringing `regex-automata` into this repository, and then changing the `regex` crate to be nothing but an API shim layer on top of `regex-automata`'s API.
These tandem releases are the culmination of about 3 years of on-and-off work that began in earnest in March 2020.
Because of the scale of the changes involved in these releases, I would love to hear about your experience, especially if you notice undocumented changes in behavior or performance changes (positive or negative).
Most changes in the first release are listed below. For more details, please see the commit log, which reflects a linear and decently documented history of all changes.
New features:
* Any ASCII character except for `[0-9A-Za-z<>]` can now be escaped, even if it has no special significance. Also, a new routine, `is_escapeable_character`, has been added to `regex-syntax` to query whether a character is escapeable or not.
* Add `Regex::captures_at`. This fills a hole in the API, but doesn't otherwise introduce any new expressive power.
* Capture group names are now Unicode-aware. They can now begin with either a `_` or any "alphabetic" codepoint. After the first codepoint, subsequent codepoints can be any sequence of alpha-numeric codepoints, along with `_`, `.`, `[` and `]`. Note that replacement syntax has not changed.
* Add `Match::is_empty` and `Match::len` APIs.
* Add an `impl Default for RegexSet`, with the default being the empty set.
* A new method, `Regex::static_captures_len`, has been added which returns the number of capture groups in the pattern if and only if every possible match always contains the same number of matching groups.
* Named captures can now be written as `(?<name>re)` in addition to `(?P<name>re)`.
* `regex-syntax` now supports empty character classes.
* `regex-syntax` now has an optional `std` feature. (This will come to `regex` in the second release.)
* The `Hir` type in `regex-syntax` has had a number of simplifications made to it.
* `regex-syntax` has support for a new `R` flag for enabling CRLF mode. This will be supported in `regex` proper in the second release.
* `regex-syntax` now has proper support for "regex that never matches" via `Hir::fail()`.
* The `hir::literal` module of `regex-syntax` has been completely re-worked. It now has more documentation, examples and advice.
* The `allow_invalid_utf8` option in `regex-syntax` has been renamed to `utf8`, and the meaning of the boolean has been flipped.

Performance improvements:
* The upgrade to `aho-corasick 1.0` may improve performance in some cases. It's difficult to characterize exactly which patterns this might impact, but if there are a small number of longish (>= 4 bytes) prefix literals, then it might be faster than before.

Bug fixes:
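* Improve the `Debug` impl for `Match` so that it doesn't show the entire haystack.
* Fix a number of issues with printing `Hir` values as regex patterns.
* Add an explicit example of `foo|bar` in the regex syntax docs.
* Clarify that `SetMatches::len` does not (regrettably) refer to the number of matches in the set.
* Fix `CaptureLocations::get` so that it never panics.
* Clarify documentation for `Regex::shortest_match`.
* Fix `\p{Sc}` so that it is equivalent to `\p{Currency_Symbol}`.
* Add more clarifying documentation to the `CompiledTooBig` error variant.
* Clarify that `regex::Regex` searches as if the haystack is a sequence of Unicode scalar values.
* Replace `__Nonexhaustive` variants with the `#[non_exhaustive]` attribute.
* Reject `(?-u:\W)` in `regex::Regex` APIs.
* Add a missing `void` keyword to indicate "no parameters" in the C API.
* Fix `\p{Lc}` so that it is equivalent to `\p{Cased_Letter}`.
* Clarify documentation for `\pX` syntax.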