-
Notifications
You must be signed in to change notification settings - Fork 445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an implimentation of Tuned Boyer-Moore. #419
Conversation
@@ -5,7 +5,7 @@ version = "0.1.0" | |||
authors = ["The Rust Project Developers"] | |||
license = "MIT/Apache-2.0" | |||
repository = "https://github.com/rust-lang/regex" | |||
documentation = "http://doc.rust-lang.org/regex/regex_syntax/index.html" | |||
documentation = "http://doc.rust-lang.org/regex/regex/index.html" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a random nit I had when reading the cargo file. It seems odd for the benchmark suite to point to the syntax crate for documentation. I'm not at all attached to this change.
@@ -514,7 +514,7 @@ another matching engine with fixed memory requirements. | |||
extern crate aho_corasick; | |||
extern crate memchr; | |||
extern crate thread_local; | |||
#[cfg(test)] extern crate quickcheck; | |||
#[macro_use] #[cfg(test)] extern crate quickcheck; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Required to use the quickcheck!
macro. If there is a reason to avoid pulling quickcheck macros I can refactor to use the quickcheck
function.
@@ -392,7 +402,7 @@ impl SingleByteSet { | |||
/// | |||
/// TODO(burntsushi): Add some amount of shifting to this. | |||
#[derive(Clone, Debug)] | |||
pub struct SingleSearch { | |||
pub struct MemchrSearch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed this guy because I thought the old name would be confusing in the presense of multiple search algorithms focused on finding a single literal.
e7c07d8
to
89d6df6
Compare
@BurntSushi, do you have time to give this a quick "seems worthwhile" or "not right now"? It doesn't have to be a full review. |
@ethanpailes Thanks for the ping! Looking over this, I think this actually looks pretty dang good. I don't have any specific comments so I think we should just bring it in. Thank you so much for doing this. :-) @bors r+ |
📌 Commit 89d6df6 has been approved by |
Add an implimentation of Tuned Boyer-Moore. While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: #408 See: [ripgrep-617](BurntSushi/ripgrep#617)
💔 Test failed - status-appveyor |
89d6df6
to
b6e5353
Compare
362e6b4
to
37f6976
Compare
37f6976
to
286cccf
Compare
@BurntSushi, assuming tests pass, I'm now happy with this. The turnover length for patterns of uncommon chars seems to be around 7 or 8. I want to take another crack at finding an absolute length turnover where TBM always wins even if the pattern chars are not that common, but that can go in another PR. |
@ethanpailes Looks like quickcheck caught a bug? |
While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: rust-lang#408 See: BurntSushi/ripgrep#617
So it does, I'll take another look. |
286cccf
to
597029c
Compare
I managed to blow away all my bug fixes when I updated the heuristic. I'm so good at git. Sorry about that. |
468ab20
to
2a3cc31
Compare
2a3cc31
to
d5be839
Compare
@BurntSushi, I've had quickcheck running in a loop locally for a few hours, so it should be good to go now. |
Thanks! Awesome work! @bors r+ |
📌 Commit d5be839 has been approved by |
Add an implimentation of Tuned Boyer-Moore. While the existing literal string searching algorithm leveraging memchr is quite fast, in some case more traditional approaches still make sense. This patch provides an implimentation of Tuned Boyer-Moore as laid out in Fast String Searching by Hume & Sunday. Some refinements to their work were gleened from the grep source. See: #408 See: [ripgrep-617](BurntSushi/ripgrep#617)
☀️ Test successful - status-appveyor, status-travis |
While the existing literal string searching algorithm
leveraging memchr is quite fast, in some case more
traditional approaches still make sense. This patch
provides an implimentation of Tuned Boyer-Moore as
laid out in Fast String Searching by Hume & Sunday.
Some refinements to their work were gleened from the
grep source.
See: #408
See: ripgrep-617