Regex parser regression #102643

Closed
jacobzed opened this issue May 24, 2024 · 4 comments · Fixed by #103591

Comments

jacobzed commented May 24, 2024

Description

The following regex doesn't match in .NET 8. The same regex matches in .NET Framework 4.7.2.

Reproduction Steps

var re = new Regex(@"^(.+?) (says?),\s'(.+)'$", RegexOptions.RightToLeft);
var sample = "User says, 'adventure'";
var match = re.Match(sample);
Console.WriteLine(match.Success); // should print True, as on .NET Framework 4.7.2

Expected behavior

Regex match success

Actual behavior

Regex match fails

Regression?

Yes, this works in .NET Framework 4.7.2.

Known Workarounds

Remove RegexOptions.RightToLeft, or move the "s?" outside the capturing group:
var re = new Regex(@"^(.+?) (say)s?,\s'(.+)'$", RegexOptions.RightToLeft);
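To compare the original pattern with the workarounds side by side, a minimal sketch along these lines may help. The class name `RtlRepro` and the helper `Matches` are hypothetical names introduced here for illustration. No expected output is listed because, per this report, the first line depends on the runtime version.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RtlRepro
{
    // Sample input taken from the reproduction steps.
    public const string Sample = "User says, 'adventure'";

    // Hypothetical helper: reports whether `pattern` matches the sample
    // under the given options.
    public static bool Matches(string pattern, RegexOptions options) =>
        new Regex(pattern, options).Match(Sample).Success;

    public static void Main()
    {
        const string original = @"^(.+?) (says?),\s'(.+)'$";
        const string workaround = @"^(.+?) (say)s?,\s'(.+)'$";

        // The case reported as failing on .NET 8.
        Console.WriteLine($"original, RightToLeft:   {Matches(original, RegexOptions.RightToLeft)}");
        // Workaround 1: drop RegexOptions.RightToLeft.
        Console.WriteLine($"original, no options:    {Matches(original, RegexOptions.None)}");
        // Workaround 2: keep RightToLeft, move "s?" out of the capturing group.
        Console.WriteLine($"workaround, RightToLeft: {Matches(workaround, RegexOptions.RightToLeft)}");
    }
}
```

Note that the second workaround changes what group 2 captures: "say" rather than "says".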

Configuration

No response

Other information

No response

dotnet-policy-service bot added the untriaged label May 24, 2024

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

steveharter (Member) commented

I verified the repro: .NET 6 returns true while .NET 7 returns false, so the behavior changed in .NET 7.

The RegexOptions.RightToLeft option is what causes the match to fail.

steveharter removed the untriaged label May 29, 2024
steveharter added this to the 9.0.0 milestone May 29, 2024
steveharter (Member) commented

cc @stephentoub

stephentoub self-assigned this Jun 16, 2024
stephentoub (Member) commented

Looks like this was introduced as part of an optimization that shouldn't apply to RTL. The optimization checks whether a loop can be combined with a multi that comes after it (a multi follows a loop in the example pattern because RTL ends up reversing concatenations). Because of the RTL, though, we're looking at the wrong end of the multi, so it erroneously combines the s? with the s that begins "say".
