Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix RegexCompiler regression on 32-bit for some set matching #68655

Merged
merged 1 commit into from
May 1, 2022

Conversation

stephentoub
Copy link
Member

We added an optimization to regex where for sets containing values all within 64 characters of each other (e.g. all hex digits), we use a ulong to represent a bitmap and can implement the check in an entirely branchless manner. This results in a measurable win on 64-bit, e.g. upwards of 20% for some patterns. Unfortunately, it also results in a measurable regression on 32-bit. This PR fixes that for RegexCompiler by special-casing the optimization to only apply when IntPtr.Size == 8. For the source generator, we don't have the same luxury of knowing that the code is emitted and used on the same bitness, so since it would result in very complicated code to emit multiple implementations and since we generally prefer optimizing for 64-bit, I've left it in for the source generator.

Fixes #68565

@stephentoub stephentoub added this to the 7.0.0 milestone Apr 28, 2022
@stephentoub stephentoub requested a review from joperezr April 28, 2022 14:29
@ghost
Copy link

ghost commented Apr 28, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

We added an optimization to regex where for sets containing values all within 64 characters of each other (e.g. all hex digits), we use a ulong to represent a bitmap and can implement the check in an entirely branchless manner. This results in a measurable win on 64-bit, e.g. upwards of 20% for some patterns. Unfortunately, it also results in a measurable regression on 32-bit. This PR fixes that for RegexCompiler by special-casing the optimization to only apply when IntPtr.Size == 8. For the source generator, we don't have the same luxury of knowing that the code is emitted and used on the same bitness, so since it would result in very complicated code to emit multiple implementations and since we generally prefer optimizing for 64-bit, I've left it in for the source generator.

Fixes #68565

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions, tenet-performance

Milestone: 7.0.0

@ghost ghost assigned stephentoub Apr 28, 2022
@stephentoub stephentoub requested review from GrabYourPitchforks and removed request for GrabYourPitchforks April 28, 2022 14:29
@stephentoub
Copy link
Member Author

@GrabYourPitchforks, please let me know if you have any better ideas for the source generator :) One thing I contemplated was adding a 32-bit path for values within 32 of each other, so that we'd only have the 32-bit regression for sets between 33 and 64, but the most impactful ones (e.g. hex digits) fall in that range.

We added an optimization to regex where for sets containing values all within 64 characters of each other (e.g. all hex digits), we use a ulong to represent a bitmap and can implement the check in an entirely branchless manner.  This results in a measurable win on 64-bit, e.g. upwards of 20% for some patterns.  Unfortunately, it also results in a measurable regression on 32-bit.  This PR fixes that for RegexCompiler by special-casing the optimization to only apply when IntPtr.Size == 8.  For the source generator, we don't have the same luxury of knowing that the code is emitted and used on the same bitness, so since it would result in very complicated code to emit multiple implementations and since we generally prefer optimizing for 64-bit, I've left it in for the source generator.
if (analysis.OnlyRanges && (analysis.UpperBoundExclusiveIfOnlyRanges - analysis.LowerBoundInclusiveIfOnlyRanges) <= 64)
// We skip this on 32-bit, as otherwise using 64-bit numbers in this manner is a deoptimization
// when compared to the subsequent fallbacks.
if (IntPtr.Size == 8 && analysis.OnlyRanges && (analysis.UpperBoundExclusiveIfOnlyRanges - analysis.LowerBoundInclusiveIfOnlyRanges) <= 64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Environment.Is64BitProcess is a little more descriptive.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can address in a follow-up as necessary

@stephentoub stephentoub merged commit ba4c35d into dotnet:main May 1, 2022
@stephentoub stephentoub deleted the fix32bitregression branch May 1, 2022 23:09
@ghost ghost locked as resolved and limited conversation to collaborators Jun 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Regressions in System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig
2 participants