Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regex Capture gives IndexOutOfRangeException on (?:){93} #62094

Closed
danmoseley opened this issue Nov 27, 2021 · 16 comments · Fixed by #97916
Closed

Regex Capture gives IndexOutOfRangeException on (?:){93} #62094

danmoseley opened this issue Nov 27, 2021 · 16 comments · Fixed by #97916
Labels
area-System.Text.RegularExpressions bug disabled-test The test is disabled in source code against the issue help wanted [up-for-grabs] Good issue for external contributors in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@danmoseley
Copy link
Member

danmoseley commented Nov 27, 2021

Regex regex = new("@(a*)+?");
MatchCollection mc = regex.Matches("@");

foreach (Match m in mc)
{
    Console.WriteLine($"{m.Index} .. {m.Length + m.Index}");
    Console.WriteLine(m.ToString());
}

On other regex engines, like PCRE, Nim, Python, Javascript, and Golang you would get 0 .. 1

On .NET Framework this displays 1 .. 1

On .NET 6 it gives

1 .. 3
Unhandled exception. System.ArgumentOutOfRangeException: Index and length must refer to a location within the string. (Parameter 'length')
   at System.String.Substring(Int32 startIndex, Int32 length)
   at System.Text.RegularExpressions.Capture.get_Value()
   at System.Text.RegularExpressions.Capture.ToString()
   at Program.<Main>$(String[] args) in C:\Users\danmose\source\repos\ConsoleApp31\ConsoleApp31\Program.cs:line 191

@stephentoub first bug from running Nim tests.

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.RegularExpressions untriaged New issue has not been triaged by the area owner labels Nov 27, 2021
@ghost
Copy link

ghost commented Nov 27, 2021

Tagging subscribers to this area: @eerhardt, @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details
Regex regex = new("@(a*)+?");
MatchCollection mc = regex.Matches("@");

foreach (Match m in mc)
{
    Console.WriteLine($"{m.Index} .. {m.Length + m.Index - 1}");
    Console.WriteLine(m.ToString());
}

On other regex engines, like PCRE, Nim, Python, Javascript, and Golang you would get 0 .. 1

On .NET Framework this displays 1 .. 0

On .NET 6 it gives

1 .. 2
Unhandled exception. System.ArgumentOutOfRangeException: Index and length must refer to a location within the string. (Parameter 'length')
   at System.String.Substring(Int32 startIndex, Int32 length)
   at System.Text.RegularExpressions.Capture.get_Value()
   at System.Text.RegularExpressions.Capture.ToString()
   at Program.<Main>$(String[] args) in C:\Users\danmose\source\repos\ConsoleApp31\ConsoleApp31\Program.cs:line 191

@stephentoub first bug from running Nim tests.

Author: danmoseley
Assignees: -
Labels:

area-System.Text.RegularExpressions, untriaged

Milestone: -

@danmoseley danmoseley changed the title Regex Capture gives ArgumentOutOfRangeException for zero length capture Regex Capture gives ArgumentOutOfRangeException Nov 27, 2021
@danmoseley
Copy link
Member Author

danmoseley commented Nov 27, 2021

@(a?)+? with text of @will not crash, but gives 1 .. 1 whereas other engines will give 0 .. 1

So there seems to be distinct bugs

  • incorrect starting index of the match in all .NET versions
  • incorrect length in .NET Core, causing a crash.

@danmoseley danmoseley changed the title Regex Capture gives ArgumentOutOfRangeException Regex Capture gives ArgumentOutOfRangeException (regression) and incorrect index (not regression) in lazy quantifier around zero lower bound quantifier that does not match Nov 27, 2021
@stephentoub
Copy link
Member

incorrect length in .NET Core, causing a crash.

Looks like a regression in .NET 5, which also exhibits this. .NET Core 3.1 outputs 1..1 like .NET Framework does.

@stephentoub
Copy link
Member

This is actually an issue in .NET Framework 4.8 and .NET Core 3.1 as well.

The change in .NET 5 is that it recognizes that the outer loop is actually atomic (because there's nothing after it to backtrack into it). So it effectively changes the expression to instead be:

"@(?>(a*)+?)"

Try that expression with .NET Framework, and you similarly get 1..3 and an out of range exception.

@stephentoub
Copy link
Member

(Locally I also have the final changes that switches over all compiler / source generated expressions to using the new simpler code generation (except for anything that uses RightToLeft), and with that, this correctly outputs 0..1 and no exception, so it'll end up really only being an issue for the interpreter.)

@danmoseley
Copy link
Member Author

When I fuzzed I believe I only used IsMatch. Next time we should enumerate the matches also.

@joperezr joperezr added the bug label Jan 19, 2022
@joperezr joperezr added this to the 7.0.0 milestone Jan 19, 2022
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Jan 19, 2022
@joperezr
Copy link
Member

joperezr commented Jul 20, 2022

I just tested with Preview 7 bits and seems like all 4 engines (including Interpreted) are correctly printing:

0 .. 1
@

Not sure exactly which change fixed this as it was logged some time ago, but closing as completed.

@danmoseley
Copy link
Member Author

@joperezr might be worth adding as a test case if we don't have one.

@joperezr joperezr reopened this Jul 21, 2022
@joperezr
Copy link
Member

Good point, I'll reopen this to track that.

@joperezr
Copy link
Member

lol, that explains why after adding it I was seeing it running elsewhere in testResults.xml. I'll fix that condition instead then.

@joperezr
Copy link
Member

joperezr commented Jul 21, 2022

so I tried re-enabling all those 3, but looks like one is still problematic:

engine, @"(?:){93}", "x", RegexOptions.None, new[]
{
new CaptureData("", 0, 0),
new CaptureData("", 1, 0)
}

This is still throwing IndexOutOfRangeException when trying to call Trackpush(). Looks like with such a large number of loops (93 in that testcase) we are failing when trying to walk back through the runtrack stack, the exception is being thrown on this line:

since at some point runtrackpos is 0 so this is trying to access to index -1

@stephentoub
Copy link
Member

@joperezr, can this be closed again now?

Or if it's tracking a disabled test, should it be moved out of 7.0 and labeled as disabled test?

@joperezr
Copy link
Member

joperezr commented Aug 1, 2022

it is tracking a disabled test, so I'll label and move as suggested.

@joperezr joperezr added the disabled-test The test is disabled in source code against the issue label Aug 1, 2022
@joperezr joperezr modified the milestones: 7.0.0, Future Aug 1, 2022
@danmoseley danmoseley changed the title Regex Capture gives ArgumentOutOfRangeException (regression) and incorrect index (not regression) in lazy quantifier around zero lower bound quantifier that does not match Regex Capture gives IndexOutOfRangeException on (?:){93} Aug 21, 2022
@danmoseley danmoseley added the help wanted [up-for-grabs] Good issue for external contributors label Aug 21, 2022
@rhirano0715
Copy link
Contributor

I would like to provide a comment for reference.
When I tried changing the number of repetitions and testing it, the following results were obtained.

I insert the following into Regex.MultipleMatches.Tests.Matches_TestData() for testing.

                for (int i = 0; i < 100; i++)
                {
                    yield return new object[]
                    {
                        engine, @"(?:){" + i + @"}", "x", RegexOptions.None, new[]
                        {
                            new CaptureData("", 0, 0),
                            new CaptureData("", 1, 0)
                        }
                    };
                }

rhirano0715 added a commit to rhirano0715/dotnet_runtime that referenced this issue Feb 3, 2024
This update fixes the IndexOutOfRangeException in RegexInterpreter by enhancing the `TrackPush` and `TrackPush2` methods. The adjustment involves checking the runtrack position before decrementing it, ensuring that it doesn't become negative, which was the root cause of the exception. This prevents potential out-of-range errors when handling large numbers of repetitions in regular expressions.

Fix dotnet#62094
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 3, 2024
@rhirano0715
Copy link
Contributor

The size of RegexRunner.runtrack is set to runtrackcount * 8 (runtrackcount == _code.TrackCount).
The value of runtrackcount was as follows:

  • For @"(?:){0}" ~ @"(?:){1}": 3
  • For @"(?:){2}" ~ @"(?:){99}": 7

It appears that the reason the index (runtrackpos) becomes negative and causes an exception from @"(?:){26}" ~ @"(?:){99}" is because the runtrack size of 7 * 8 (== 56) is not sufficient.

stephentoub pushed a commit that referenced this issue Apr 13, 2024
* Prevent IndexOutOfRangeException in RegexInterpreter

This update fixes the IndexOutOfRangeException in RegexInterpreter by enhancing the `TrackPush` and `TrackPush2` methods. The adjustment involves checking the runtrack position before decrementing it, ensuring that it doesn't become negative, which was the root cause of the exception. This prevents potential out-of-range errors when handling large numbers of repetitions in regular expressions.

Fix #62094

* Changed to call EnsureStorage() unconditionally.

If EnsureStorage() is called unconditionally, the array will be expanded, so the position will never become negative. When the conditions inside EnsureStorage() are true, it might be necessary to expand the array, regardless of the comparison between newpos and codepos.

https://github.com/dotnet/runtime/blob/6ebc8bd86dbc780b2a2a7daf3ab6020f9104f09e/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs#L461-L469

Before the change, in this test case, EnsureStorage() is not called because newpos == codepos == 6 from the first time until an exception occurs.

Fix #62049
matouskozak pushed a commit to matouskozak/runtime that referenced this issue Apr 30, 2024
* Prevent IndexOutOfRangeException in RegexInterpreter

This update fixes the IndexOutOfRangeException in RegexInterpreter by enhancing the `TrackPush` and `TrackPush2` methods. The adjustment involves checking the runtrack position before decrementing it, ensuring that it doesn't become negative, which was the root cause of the exception. This prevents potential out-of-range errors when handling large numbers of repetitions in regular expressions.

Fix dotnet#62094

* Changed to call EnsureStorage() unconditionally.

If EnsureStorage() is called unconditionally, the array will be expanded, so the position will never become negative. When the conditions inside EnsureStorage() are true, it might be necessary to expand the array, regardless of the comparison between newpos and codepos.

https://github.com/dotnet/runtime/blob/6ebc8bd86dbc780b2a2a7daf3ab6020f9104f09e/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs#L461-L469

Before the change, in this test case, EnsureStorage() is not called because newpos == codepos == 6 from the first time until an exception occurs.

Fix dotnet#62049
@github-actions github-actions bot locked and limited conversation to collaborators May 13, 2024
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this issue May 30, 2024
* Prevent IndexOutOfRangeException in RegexInterpreter

This update fixes the IndexOutOfRangeException in RegexInterpreter by enhancing the `TrackPush` and `TrackPush2` methods. The adjustment involves checking the runtrack position before decrementing it, ensuring that it doesn't become negative, which was the root cause of the exception. This prevents potential out-of-range errors when handling large numbers of repetitions in regular expressions.

Fix dotnet#62094

* Changed to call EnsureStorage() unconditionally.

If EnsureStorage() is called unconditionally, the array will be expanded, so the position will never become negative. When the conditions inside EnsureStorage() are true, it might be necessary to expand the array, regardless of the comparison between newpos and codepos.

https://github.com/dotnet/runtime/blob/6ebc8bd86dbc780b2a2a7daf3ab6020f9104f09e/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs#L461-L469

Before the change, in this test case, EnsureStorage() is not called because newpos == codepos == 6 from the first time until an exception occurs.

Fix dotnet#62049
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.RegularExpressions bug disabled-test The test is disabled in source code against the issue help wanted [up-for-grabs] Good issue for external contributors in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants