regex 0.2 #310
Merged
This uses the new Replacer trait essentially as defined in the `bytes` sub-module and described in rust-lang#151. Fixes rust-lang#151
It is useless because it will always return false (since every regex has at least one capture group corresponding to the full match). Fixes rust-lang#179
It is misleading to suggest that Regex implements equality, since equality is a well defined operation on regular expressions and this particular implementation doesn't correspond to that definition at all. Moreover, I suspect the actual use cases for such an impl are rather niche. A simple newtype+deref should resolve any such use cases. Fixes rust-lang#178
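The newtype+deref workaround mentioned above can be sketched roughly as follows. To keep the sketch self-contained, `Pattern` is a hypothetical stand-in for `Regex` (it is not part of the crate), and `EqPattern` is an illustrative name. With the real crate, the wrapper would hold a `Regex` and could compare `Regex::as_str` values in the same way.

```rust
use std::ops::Deref;

/// Hypothetical stand-in for a compiled regex. Like `Regex` in 0.2, it
/// deliberately does not implement `PartialEq`.
pub struct Pattern {
    source: String,
}

impl Pattern {
    pub fn new(source: &str) -> Pattern {
        Pattern { source: source.to_string() }
    }

    /// The pattern string this value was built from.
    pub fn as_str(&self) -> &str {
        &self.source
    }
}

/// Newtype that opts in to equality on the pattern *string*. This is an
/// assumption about what the caller wants; it is not equality of the
/// languages the patterns match.
pub struct EqPattern(pub Pattern);

impl Deref for EqPattern {
    type Target = Pattern;

    fn deref(&self) -> &Pattern {
        &self.0
    }
}

impl PartialEq for EqPattern {
    fn eq(&self, other: &EqPattern) -> bool {
        // `as_str` is reached through `Deref`.
        self.as_str() == other.as_str()
    }
}

impl Eq for EqPattern {}
```

Under this definition, wrappers built from `a+` and `aa*` compare as unequal even though the two patterns match the same strings, which is the mismatch with "real" regex equality that motivated dropping the impl.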
This corrects a gaffe of mine. In particular, both types contain references to a `Captures` *and* the text that was searched, but only name one lifetime. In practice, this means that the shortest lifetime is used, which can be problematic when one is trying to extract submatch text. This also fixes the lifetime annotation on `iter_pos`, which should be tied to the Captures and not the text. It was always possible to work around this by using indices. Fixes rust-lang#168
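A minimal sketch of the two-lifetime idea, assuming hypothetical stand-in types (`Groups` for `Captures`, `GroupTexts` for its iterator) rather than the crate's actual definitions: `'c` names the borrow of the captures value and `'t` names the searched text, and the yielded items are tied to `'t` only.

```rust
/// Hypothetical stand-in for `Captures`: the searched text plus the byte
/// offsets of each group (`None` for a group that did not participate).
pub struct Groups<'t> {
    text: &'t str,
    spans: Vec<Option<(usize, usize)>>,
}

/// Iterator over group text. `'c` is the borrow of the `Groups` value and
/// `'t` is the lifetime of the searched text.
pub struct GroupTexts<'c, 't> {
    groups: &'c Groups<'t>,
    next: usize,
}

impl<'t> Groups<'t> {
    pub fn new(text: &'t str, spans: Vec<Option<(usize, usize)>>) -> Groups<'t> {
        Groups { text, spans }
    }

    pub fn iter<'c>(&'c self) -> GroupTexts<'c, 't> {
        GroupTexts { groups: self, next: 0 }
    }
}

impl<'c, 't> Iterator for GroupTexts<'c, 't> {
    // Items borrow from the text (`'t`), not from the `Groups` value (`'c`).
    type Item = Option<&'t str>;

    fn next(&mut self) -> Option<Option<&'t str>> {
        let span = *self.groups.spans.get(self.next)?;
        self.next += 1;
        let text: &'t str = self.groups.text;
        Some(span.map(|(start, end)| &text[start..end]))
    }
}

/// Returns the text of the first participating group after group 0.
/// `groups` is dropped when this function returns, yet the result remains
/// valid because it borrows from `text` only.
pub fn first_group<'t>(
    text: &'t str,
    spans: Vec<Option<(usize, usize)>>,
) -> Option<&'t str> {
    let groups = Groups::new(text, spans);
    let found = groups.iter().skip(1).flatten().next();
    found
}
```

If the iterator named a single lifetime for both borrows, `first_group` would not compile, since the returned slice would be limited to the lifetime of the local `groups` value.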
This is replaced by using RegexBuilder. Fixes rust-lang#166
It encourages compiling a regex for every use, which can be convenient in some circumstances but deadly for performance. Fixes rust-lang#165
Similarly, rename RegexSplitsN to SplitsN. This follows the convention of all other iterator types. In general, we shouldn't namespace our type names.
Mostly, this adds an `Iter` suffix to all of the names.
If `replace` doesn't find any matches, then it can return the original string unchanged.
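The no-match fast path can be illustrated with `std::borrow::Cow`. This is only a sketch: `replace_all` below is a hypothetical helper that uses literal substring search as a stand-in for regex matching, so that it needs nothing beyond the standard library.

```rust
use std::borrow::Cow;

/// Replaces every occurrence of the literal `needle` in `text` with `with`.
/// Literal search stands in for regex matching in this sketch.
pub fn replace_all<'t>(text: &'t str, needle: &str, with: &str) -> Cow<'t, str> {
    // Nothing to replace: hand back the input without allocating.
    if needle.is_empty() || !text.contains(needle) {
        return Cow::Borrowed(text);
    }
    // At least one replacement is made, so a new string is required.
    Cow::Owned(text.replace(needle, with))
}
```

Callers that only read the result can treat both variants as `&str` through deref, and callers that need a `String` can call `into_owned`, which allocates only in the borrowed case.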
This removes the InvalidSet variant, which is no longer used, and no longer exposes the `regex_syntax::Error` type, instead exposing it as a string.
This also removes Captures.{at,pos} and replaces them with Captures.get, which now returns a Match. Similarly, Captures.name returns a Match as well. Fixes rust-lang#276
All use cases can be replaced with Regex::capture_names.
Specifically, use mutable references instead of passing ownership.
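To show what the switch to mutable references means for callers, here is a sketch of a builder in that style. `Options` and `OptionsBuilder` are hypothetical types invented for illustration, and the default size limit is an arbitrary value; only the receiver shapes (`&mut self` setters returning `&mut Self`, and a `build` that takes `&self`) are meant to mirror the design described above.

```rust
/// Hypothetical options type; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Options {
    pub pattern: String,
    pub case_insensitive: bool,
    pub size_limit: usize,
}

/// Builder whose setters borrow mutably and return `&mut Self`.
pub struct OptionsBuilder {
    opts: Options,
}

impl OptionsBuilder {
    pub fn new(pattern: &str) -> OptionsBuilder {
        OptionsBuilder {
            opts: Options {
                pattern: pattern.to_string(),
                case_insensitive: false,
                size_limit: 1 << 20, // arbitrary default for this sketch
            },
        }
    }

    pub fn case_insensitive(&mut self, yes: bool) -> &mut OptionsBuilder {
        self.opts.case_insensitive = yes;
        self
    }

    pub fn size_limit(&mut self, bytes: usize) -> &mut OptionsBuilder {
        self.opts.size_limit = bytes;
        self
    }

    pub fn build(&self) -> Options {
        self.opts.clone()
    }
}

/// A one-shot chain still works: the temporary builder lives until the end
/// of the statement, and `build` produces an owned value before then.
pub fn one_shot(pattern: &str) -> Options {
    OptionsBuilder::new(pattern).case_insensitive(true).build()
}

/// Keeping the builder around across statements requires naming it first.
/// Storing the result of a chain such as `OptionsBuilder::new(..).size_limit(..)`
/// in a `let` would instead hold a reference to a dropped temporary.
pub fn conditional(pattern: &str, limit: Option<usize>) -> Options {
    let mut builder = OptionsBuilder::new(pattern);
    builder.case_insensitive(true);
    if let Some(bytes) = limit {
        builder.size_limit(bytes);
    }
    builder.build()
}
```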
For example, the regex `[:upper:]` used to correspond to the `upper` ASCII character class, but it now corresponds to the character class containing the characters `:upper:`. Forms like `[[:upper:][:blank:]]` are still accepted. Fixes rust-lang#175
The escaping of &, - and ~ is only required when the characters are repeated adjacently, which should be quite rare. Escaping of [ is always required, unless it appears in the second position of a range. These rules enable us to add character class sets as described in UTS#18 RL1.3 in a backward compatible way.
This was added because regex 0.1 supports Rust 1.3+. But we can now assume Rust 1.12+, which has Vec::extend_from_slice. Yay for less unsafe!
BurntSushi force-pushed the rfc branch 2 times, most recently from ca60bf9 to f8903d9 (December 30, 2016 21:46)
When building a Match, we should avoid storing a subslice and instead store the full string. We can punt subslicing to access. This seems to get LLVM to optimize tight loops better when the subslice isn't needed.
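The layout described above might look like the following sketch. The field names are assumptions, and `find` uses a literal substring search as a stand-in for a regex search. The relevant part is that the value stores the full searched text plus offsets, and slices only inside `as_str`.

```rust
/// Sketch of a match value that keeps the whole searched text plus offsets.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Match<'t> {
    text: &'t str,
    start: usize,
    end: usize,
}

impl<'t> Match<'t> {
    /// Byte offset of the start of the match.
    pub fn start(&self) -> usize {
        self.start
    }

    /// Byte offset one past the end of the match.
    pub fn end(&self) -> usize {
        self.end
    }

    /// Subslicing happens only here, when the caller asks for the text.
    pub fn as_str(&self) -> &'t str {
        &self.text[self.start..self.end]
    }
}

/// Literal search standing in for a regex search.
pub fn find<'t>(text: &'t str, needle: &str) -> Option<Match<'t>> {
    text.find(needle).map(|start| Match {
        text,
        start,
        end: start + needle.len(),
    })
}
```

A loop that only reads `start()` and `end()` never touches the slicing code, which is consistent with the optimization rationale given in the commit message.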
BurntSushi force-pushed the rfc branch 2 times, most recently from e818f7e to cc56d60 (December 31, 2016 17:57)
This API mirrors RegexBuilder, but for multiple patterns. Also, modify regex-capi to use RegexSetBuilder internally.
@bors r+
📌 Commit ac3ab6d has been approved.
bors added a commit that referenced this pull request on Dec 31, 2016:
regex 0.2

0.2.0
=====
This is a new major release of the regex crate, and is an implementation of the
[regex 1.0 RFC](https://github.com/rust-lang/rfcs/blob/master/text/1620-regex-1.0.md).
We are releasing a `0.2` first, and if there are no major problems, we will
release a `1.0` shortly. For `0.2`, the minimum *supported* Rust version is
1.12.

There are a number of **breaking changes** in `0.2`. They are split into two
types. The first type corresponds to breaking changes in regular expression
syntax. The second type corresponds to breaking changes in the API.

Breaking changes for regex syntax:

* POSIX character classes now require double bracketing. Previously, the regex
  `[:upper:]` would parse as the `upper` POSIX character class. Now it parses
  as the character class containing the characters `:upper:`. The fix to this
  change is to use `[[:upper:]]` instead. Note that variants like
  `[[:upper:][:blank:]]` continue to work.
* The character `[` must always be escaped inside a character class.
* The characters `&`, `-` and `~` must be escaped if any one of them is
  repeated consecutively. For example, `[&]`, `[\&]`, `[\&\&]`, `[&-&]` are all
  equivalent while `[&&]` is illegal. (The motivation for this and the prior
  change is to provide a backwards compatible path for adding character class
  set notation.)
* A `bytes::Regex` now has Unicode mode enabled by default (like the main
  `Regex` type). This means regexes compiled with `bytes::Regex::new` that
  don't have the Unicode flag set should add `(?-u)` to recover the original
  behavior.

Breaking changes for the regex API:

* `find` and `find_iter` now **return `Match` values instead of
  `(usize, usize)`.** `Match` values have `start` and `end` methods, which
  return the match offsets. `Match` values also have an `as_str` method,
  which returns the text of the match itself.
* The `Captures` type now only provides a single iterator over all capturing
  matches, which should replace uses of `iter` and `iter_pos`. Uses of
  `iter_named` should use the `capture_names` method on `Regex`.
* The `replace` methods now return `Cow` values. The `Cow::Borrowed` variant
  is returned when no replacements are made.
* The `Replacer` trait has been completely overhauled. This should only
  impact clients that implement this trait explicitly. Standard uses of
  the `replace` methods should continue to work unchanged.
* The `quote` free function has been renamed to `escape`.
* The `Regex::with_size_limit` method has been removed. It is replaced by
  `RegexBuilder::size_limit`.
* The `RegexBuilder` type has switched from owned `self` method receivers to
  `&mut self` method receivers. Most uses will continue to work unchanged, but
  some code may require naming an intermediate variable to hold the builder.
* The free `is_match` function has been removed. It is replaced by compiling
  a `Regex` and calling its `is_match` method.
* The `PartialEq` and `Eq` impls on `Regex` have been dropped. If you relied
  on these impls, the fix is to define a wrapper type around `Regex`, impl
  `Deref` on it and provide the necessary impls.
* The `is_empty` method on `Captures` has been removed. This always returns
  `false`, so its use is superfluous.
* The `Syntax` variant of the `Error` type now contains a string instead of
  a `regex_syntax::Error`. If you were examining syntax errors more closely,
  you'll need to explicitly use the `regex_syntax` crate to re-parse the regex.
* The `InvalidSet` variant of the `Error` type has been removed since it is
  no longer used.
* Most of the iterator types have been renamed to match conventions. If you
  were using these iterator types explicitly, please consult the documentation
  for their new names. For example, `RegexSplits` has been renamed to `Split`.

A number of bugs have been fixed:

* [BUG #151](#151): The `Replacer` trait has been changed to permit the caller
  to control allocation.
* [BUG #165](#165): Remove the free `is_match` function.
* [BUG #166](#166): Expose more knobs (available in `0.1`) and remove
  `with_size_limit`.
* [BUG #168](#168): Iterators produced by `Captures` now have the correct
  lifetime parameters.
* [BUG #175](#175): Fix a corner case in the parsing of POSIX character
  classes.
* [BUG #178](#178): Drop the `PartialEq` and `Eq` impls on `Regex`.
* [BUG #179](#179): Remove `is_empty` from `Captures` since it always returns
  false.
* [BUG #276](#276): Position of named capture can now be retrieved from a
  `Captures`.
* [BUG #296](#296): Remove winapi/kernel32-sys dependency on UNIX.
* [BUG #307](#307): Fix error on emscripten.
☀️ Test successful - status-appveyor, status-travis