Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use std::fs::read #79

Merged
merged 1 commit into from
Sep 18, 2018
Merged

Use std::fs::read #79

merged 1 commit into from
Sep 18, 2018

Conversation

mbrubeck
Copy link
Contributor

The optimizations from the regex_redux benchmark were added to libstd in rust-lang/rust#47324, so we can now use the standard library function instead of custom code to read a file.

Copy link
Contributor

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Owner

@TeXitoi TeXitoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@TeXitoi TeXitoi merged commit ac8f044 into TeXitoi:master Sep 18, 2018
@TeXitoi
Copy link
Owner

TeXitoi commented Sep 18, 2018

Do you want I submit this benchmark?

@mbrubeck
Copy link
Contributor Author

Do you want I submit this benchmark?

No, it can wait until there are other changes to submit.

@TeXitoi
Copy link
Owner

TeXitoi commented Sep 18, 2018

Do you have any changes you want to do? Because the last update of this benchmark is more than a year ago, and I suspect it will not change in the near future if you or @BurntSushi didn't plan to improve it.

@BurntSushi
Copy link
Contributor

@TeXitoi I don't think this particular benchmark has much room for improvement in terms of the code we submit to the game. There are a few other entries that shift around the use of parallelism, but I don't think you're ever going to get a substantial win out of this. If this benchmark is going to improve, it's going to have to be in the regex engine I think, and the path forward even there isn't crystal clear to me.

regex-redux is a better benchmark than regex-dna, because with regex-dna, the regex engine could use simple literal searches for most of the regexes being used. In Rust's case, there is quite aggressive literal optimizations happening, which is why we did so well on regex-dna. regex-redux requires more out of the regex engine itself.

The last time I looked closely at this, my interpretation of the bottleneck was that in Rust's case, it needs to scan most matches both forwards (to find the end) and backwards (to find the start), where as PCRE only needs to go forwards. This typically doesn't matter too much, but the benchmark actually involves finding a lot of matches, so the overhead per match really matters in regex-redux.

So the fix for regex-redux, from what I can tell, is figuring out how to reduce the match overhead, possibly by avoiding the second scan. But I'm not even going to begin thinking about how to do that until regex internals have been refactored to be more amenable to optimizations.

@mbrubeck
Copy link
Contributor Author

There are some other stylistic changes I might make at some point, but nothing substantial. If there aren't any more changes after a while, it'd be fine to submit this version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants