Enable regex to use IndexOf(..., OrdinalIgnoreCase) for prefix searching #85438
As one of its many ways of finding the next possible match starting location, Regex recognizes a string known to start the expression and uses IndexOf to find it. With this change, it can also do so for OrdinalIgnoreCase. With improvements to IndexOf(..., OrdinalIgnoreCase), this now yields significantly faster searches through longer inputs, in addition to leading to simpler code in source generated regexes.
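To illustrate the idea, here is a minimal sketch of a prefix search using `IndexOf(..., OrdinalIgnoreCase)`. The class and method names are hypothetical and this is not the code Regex actually emits; it only shows how a known case-insensitive leading string can be located with a single call.

```csharp
using System;

static class PrefixSearchSketch
{
    // Hypothetical helper (not part of System.Text.RegularExpressions):
    // returns the next index >= start at which a match could begin, given that
    // every match must start with `prefix` compared case-insensitively,
    // or -1 if no such position exists.
    public static int FindNextPossibleStart(ReadOnlySpan<char> input, int start, string prefix)
    {
        int i = input.Slice(start).IndexOf(prefix.AsSpan(), StringComparison.OrdinalIgnoreCase);
        return i < 0 ? -1 : start + i;
    }
}
```

For example, `PrefixSearchSketch.FindNextPossibleStart("The quick brown Fox", 0, "fox")` returns 16, the position of `Fox`.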
With #85437, here's the benchmark https://github.com/dotnet/performance/blob/6dccc9979e9a99ebabee2a9b8b9e657c08c3f4a0/src/benchmarks/micro/libraries/System.Text.RegularExpressions/Perf.Regex.Industry.cs#L86 on my machine:
LGTM, sorry for the delay.
Note that without #85437, this PR will make some usages slower. The compiler / source generator already takes the same approach that IndexOf(..., OrdinalIgnoreCase) uses today, searching for a set of characters with IndexOfAny, but it frequently picks a better set of characters to search for based on frequency analysis. So we shouldn't merge this without the other PR (though this does have other benefits, like simpler codegen).
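For comparison, the set-based approach described above can be sketched roughly as follows. This is an assumption-laden illustration, not the generated code: the names are hypothetical, `anchorOffset` stands in for whatever position the frequency analysis would pick, and the two casings are derived with `char.ToLowerInvariant` / `char.ToUpperInvariant`, which is only adequate for ASCII letters.

```csharp
using System;

static class SetSearchSketch
{
    // Hypothetical helper: searches for either casing of one chosen character of
    // the prefix with IndexOfAny, then verifies the whole prefix at that candidate.
    public static int Find(ReadOnlySpan<char> input, string prefix, int anchorOffset)
    {
        char lower = char.ToLowerInvariant(prefix[anchorOffset]);
        char upper = char.ToUpperInvariant(prefix[anchorOffset]);
        int pos = 0;
        while (pos <= input.Length - prefix.Length)
        {
            // Look for the anchor character where it would sit in a match starting at >= pos.
            int i = input.Slice(pos + anchorOffset).IndexOfAny(lower, upper);
            if (i < 0) return -1;

            int candidate = pos + i;
            if (candidate + prefix.Length > input.Length) return -1; // no room left for the prefix

            // Confirm the full prefix, ignoring case, at the candidate position.
            if (input.Slice(candidate, prefix.Length).Equals(prefix.AsSpan(), StringComparison.OrdinalIgnoreCase))
                return candidate;

            pos = candidate + 1;
        }
        return -1;
    }
}
```

Anchoring on a rarer character means fewer false candidates to verify, which is why a well-chosen set can beat a search that always anchors on the first characters of the prefix.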