This PR introduces a ground-up rewrite of the entire crate. Most or
all use cases served by `aho-corasick 0.6` should be served by this
rewrite as well. Pretty much everything has been improved. The API is
simpler and much more flexible, with many new configuration knobs for
controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In
particular, there are several tunable optimizations for controlling
space usage such as state ID representation and byte classes.
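To illustrate why byte classes save space, here is a rough sketch that assumes a dense transition table with one state ID per (state, byte class) pair. The `byte_classes` and `table_bytes` helpers are hypothetical and are not the crate's API or its exact algorithm. The sketch gives every byte that appears in some pattern its own class and lumps all remaining bytes into class 0.

```rust
/// Hypothetical sketch, not the crate's implementation: map each byte to an
/// equivalence class. Bytes that occur in no pattern are indistinguishable
/// to an Aho-Corasick automaton (from any given state, every such byte leads
/// to the same next state), so they can share class 0. Every byte that does
/// occur gets its own class. This partition is valid but not necessarily minimal.
fn byte_classes(patterns: &[&[u8]]) -> ([u16; 256], usize) {
    let mut used = [false; 256];
    for &pat in patterns {
        for &b in pat {
            used[b as usize] = true;
        }
    }
    let mut classes = [0u16; 256];
    // Class 0 is always allocated for bytes that appear in no pattern.
    let mut next: u16 = 1;
    for b in 0..256 {
        if used[b] {
            classes[b] = next;
            next += 1;
        }
    }
    (classes, next as usize)
}

/// Size in bytes of a dense transition table with one entry per
/// (state, class) pair, where each entry is a state ID of `id_size` bytes.
fn table_bytes(num_states: usize, num_classes: usize, id_size: usize) -> usize {
    num_states * num_classes * id_size
}
```

For `foo`, `bar` and `baz`, six distinct bytes plus the catch-all class give 7 classes. Their trie has 8 states, so a table indexed by raw bytes with 4-byte state IDs needs `table_bytes(8, 256, 4)` = 8192 bytes, while byte classes plus 2-byte state IDs need `table_bytes(8, 7, 2)` = 112 bytes.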
The API is simpler in that there is now just one type that encapsulates
everything: `AhoCorasick`.
Support for streams has been improved quite a bit, with new APIs for
stream search & replace.
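The key to stream search is not resetting the automaton between reads. As a rough sketch (the `Automaton` type and its `stream_find_all` method are hypothetical, not the crate's API), here is a tiny standard Aho-Corasick automaton with a dense transition table. It reads from any `std::io::Read` in fixed-size chunks and carries its current state across chunk boundaries, so matches that straddle two reads are still reported with absolute offsets. It reports all overlapping matches (standard semantics), not leftmost ones.

```rust
use std::collections::VecDeque;
use std::io::{ErrorKind, Read};

/// Hypothetical minimal Aho-Corasick automaton with standard (overlapping)
/// match semantics and a dense 256-entry transition table per state.
struct Automaton {
    trans: Vec<[usize; 256]>,
    /// Patterns that end at each state, including those inherited via failure links.
    out: Vec<Vec<usize>>,
    /// Length of each pattern, used to recover match start offsets.
    lens: Vec<usize>,
}

impl Automaton {
    fn new(patterns: &[&[u8]]) -> Automaton {
        const NONE: usize = usize::MAX; // marks a missing trie edge
        let mut trans = vec![[NONE; 256]];
        let mut out: Vec<Vec<usize>> = vec![Vec::new()];
        // 1. Build the trie of all patterns.
        for (pid, &pat) in patterns.iter().enumerate() {
            let mut s = 0;
            for &b in pat {
                if trans[s][b as usize] == NONE {
                    trans.push([NONE; 256]);
                    out.push(Vec::new());
                    trans[s][b as usize] = trans.len() - 1;
                }
                s = trans[s][b as usize];
            }
            out[s].push(pid);
        }
        // 2. Breadth-first pass: compute failure links and fill in every
        //    missing edge, turning the trie into a complete DFA.
        let mut fail = vec![0usize; trans.len()];
        let mut queue = VecDeque::new();
        for b in 0..256 {
            if trans[0][b] == NONE {
                trans[0][b] = 0;
            } else {
                queue.push_back(trans[0][b]);
            }
        }
        while let Some(s) = queue.pop_front() {
            let inherited = out[fail[s]].clone();
            out[s].extend(inherited);
            for b in 0..256 {
                let next = trans[s][b];
                if next == NONE {
                    trans[s][b] = trans[fail[s]][b];
                } else {
                    fail[next] = trans[fail[s]][b];
                    queue.push_back(next);
                }
            }
        }
        let lens = patterns.iter().map(|p| p.len()).collect();
        Automaton { trans, out, lens }
    }

    /// Search `rdr` in chunks of `chunk_size` bytes, carrying the automaton
    /// state across chunk boundaries. Returns (pattern index, start, end)
    /// triples with absolute offsets.
    fn stream_find_all<R: Read>(
        &self,
        mut rdr: R,
        chunk_size: usize,
    ) -> std::io::Result<Vec<(usize, usize, usize)>> {
        assert!(chunk_size > 0, "chunk_size must be positive");
        let mut buf = vec![0u8; chunk_size];
        let mut state = 0;
        let mut offset = 0; // number of bytes consumed so far
        // Empty patterns match at offset 0, before any byte is read.
        let mut matches: Vec<(usize, usize, usize)> =
            self.out[0].iter().map(|&pid| (pid, 0, 0)).collect();
        loop {
            let n = match rdr.read(&mut buf) {
                Ok(0) => break, // end of stream
                Ok(n) => n,
                Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                Err(e) => return Err(e),
            };
            for &b in &buf[..n] {
                state = self.trans[state][b as usize];
                offset += 1;
                for &pid in &self.out[state] {
                    matches.push((pid, offset - self.lens[pid], offset));
                }
            }
        }
        Ok(matches)
    }
}
```

Because only the current state is carried between reads, the sketch never needs to buffer an overlap region to find matches. With patterns `abc` and `bcd`, reading `xabcdx` two bytes at a time still finds both, even though each one spans a chunk boundary.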
Test and benchmark coverage has increased quite a bit.
This also fixes a subtle but important bug: empty patterns are now
handled correctly. Previously, they could never match, but now they can
match at any position.
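Concretely, "match at any position" means an empty pattern matches once at each of the n + 1 offsets of a haystack of length n, including the very end. The brute-force sketch of overlapping matching below makes this visible. The `naive_find_overlapping` helper is hypothetical and is only meant as a reference for the expected output.

```rust
/// Hypothetical brute-force reference, not the crate's API: report every
/// overlapping occurrence of every pattern as (pattern index, start, end).
/// Start offsets run over 0..=haystack.len(), so an empty pattern is
/// reported haystack.len() + 1 times, once at every position.
fn naive_find_overlapping(patterns: &[&[u8]], haystack: &[u8]) -> Vec<(usize, usize, usize)> {
    let mut matches = Vec::new();
    for start in 0..=haystack.len() {
        for (pid, &pat) in patterns.iter().enumerate() {
            // haystack[len..] is an empty slice, which an empty pattern still matches.
            if haystack[start..].starts_with(pat) {
                matches.push((pid, start, start + pat.len()));
            }
        }
    }
    matches
}
```

With patterns `""` and `a` and haystack `aa`, the empty pattern is reported at offsets 0, 1 and 2.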
Finally, I believe this is now the only Aho-Corasick implementation to
support leftmost-first and leftmost-longest semantics by using what I
think is a novel alteration to the Aho-Corasick construction algorithm.
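To pin down what the two leftmost semantics mean, here is a naive reference matcher. It is only a hypothetical sketch of the expected results (the `Semantics` enum and `naive_find_leftmost` are not the crate's API). It deliberately works at search time and does not implement the automaton construction described above. At the leftmost position where anything matches, leftmost-first picks the pattern listed first, while leftmost-longest picks the longest one; the search then resumes after that match.

```rust
/// Which pattern wins when several match at the same leftmost position.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Semantics {
    /// Prefer the pattern listed first, like `foo|bar` alternation in a
    /// backtracking regex engine.
    LeftmostFirst,
    /// Prefer the longest matching pattern (ties go to the earlier one).
    LeftmostLongest,
}

/// Hypothetical brute-force reference: non-overlapping matches as
/// (pattern index, start, end).
fn naive_find_leftmost(
    patterns: &[&[u8]],
    haystack: &[u8],
    sem: Semantics,
) -> Vec<(usize, usize, usize)> {
    let mut matches = Vec::new();
    let mut pos = 0;
    while pos <= haystack.len() {
        // Choose among all patterns that match starting exactly at `pos`.
        let mut best: Option<(usize, usize)> = None; // (pattern index, length)
        for (pid, &pat) in patterns.iter().enumerate() {
            if !haystack[pos..].starts_with(pat) {
                continue;
            }
            best = match (best, sem) {
                (None, _) => Some((pid, pat.len())),
                // Leftmost-first: keep the earliest-listed pattern.
                (Some(b), Semantics::LeftmostFirst) => Some(b),
                // Leftmost-longest: replace only with a strictly longer match.
                (Some((_, len)), Semantics::LeftmostLongest) if pat.len() > len => {
                    Some((pid, pat.len()))
                }
                (Some(b), Semantics::LeftmostLongest) => Some(b),
            };
        }
        match best {
            Some((pid, len)) => {
                matches.push((pid, pos, pos + len));
                // Resume after the match; this sketch steps past empty
                // matches by one byte so the loop always makes progress.
                pos += len.max(1);
            }
            None => pos += 1,
        }
    }
    matches
}
```

With patterns `Sam` and `Samwise` (in that order) and haystack `Samwise`, leftmost-first reports `Sam` at 0..3, while leftmost-longest reports `Samwise` at 0..7.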
I surveyed some other implementations, and there are a few Java
libraries that support leftmost-longest match semantics, but they
implement it by adding a sliding queue at search time. I also looked
into Perl's regex implementation, which has an Aho-Corasick optimization
for `foo|bar|baz|...|quux` style regexes and therefore must somehow
implement leftmost-first semantics. The code is a bit hard to grok, but
it looks like this is being handled at search time as opposed to being
baked into the automaton.
Fixes #18, Fixes #19, Fixes #26, Closes #34