Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand RegexNode atomicity test to loops at the end of alternation branches #1769

Merged
merged 1 commit into from
Jan 17, 2020

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Jan 15, 2020

Given an expression like (abcd*|efgh*)ijk, we will now see that if the i after the alternation doesn't match and we backtrack into the alternation, we won't need to backtrack into the d* or h*, as it can't give back anything that would allow the i to match.

Best reviewed with whitespace disabled: https://github.com/dotnet/runtime/pull/1769/files?w=1

Contributes to #1349
cc: @danmosemsft, @pgovind, @eerhardt, @ViktorHofer, @lpereira

…anches

Given an expression like `(abcd*|efgh*)ijk`, we will now see that if the `i` after the alternation doesn't match and we backtrack into the alternation, we won't need to backtrack into the `d*`, as it can't give back anything that would allow the `i` to match.
@stephentoub stephentoub added this to the 5.0 milestone Jan 15, 2020
}

return false;
}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside, I noticed on line 832, this

                    // If this node is a one/notone/setloop, see if it overlaps with its successor in the concatenation.
                    // If it doesn't, then we can upgrade it to being a one/notone/setloopatomic.

maybe should be clarified to say

                    // If this node is a oneloop/notoneloop/setloop, see if it overlaps with its successor in the concatenation.
                    // If it doesn't, then we can upgrade it to being a oneloopatomic/notoneloopatomic/setloopatomic.

since one and notone are node types that are distinct to oneloop/notoneloop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, it's everywhere eg

        /// Simple optimization. If an atomic subexpression contains only a one/notone/set loop,
        /// change it to be an atomic one/notone/set loop and remove the atomic node.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I can see how it might not be clear; in my head I see the grouping, e.g. as if it were {one/notone/set}loop, but I can clean it up. I'll do so in my next PR.

// skip Groups, as they should have already been reduced away.
// If there's a concatenation, we can jump to the last element of it.
while (node.Type == Capture || node.Type == Concatenate)
static void ProcessNode(RegexNode node, RegexNode subsequent, int maxDepth = 20)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, might be better to pass in 20 above rather than default it here, so any future calls don't forget to decide what to pass in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for CanBeMadeAtomic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll address that subsequently.

@stephentoub stephentoub merged commit 0a755c1 into dotnet:master Jan 17, 2020
@stephentoub stephentoub deleted the alternationloopendings branch January 17, 2020 02:20
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants