availability of \G in syntax regex #57831

Closed
msftrncs opened this issue Sep 4, 2018 · 4 comments
Labels
*dev-question VS Code Extension Development Question grammar Syntax highlighting grammar

msftrncs commented Sep 4, 2018

Version: 1.26.1
Commit: 493869e
Date: 2018-08-16T18:38:57.434Z
Electron: 2.0.5
Chrome: 61.0.3163.100
Node.js: 8.9.3
V8: 6.1.534.41
Architecture: x64

While trying to fix a grammar issue for PowerShell, I tried a regex solution that looks like it should work but doesn't. The regex works in .NET (i.e. PowerShell itself) but not in VS Code. I know the two engines are not the same, though I have had success with the same regex pattern before in a slightly different context.

I need to be able to differentiate between:

$variable-split "`n"
some-file-named-split

I am trying to catch the -split operator directly after the variable, with no space between them, without also matching the -split portion of a command name or file name.

The following regex is an example: in PowerShell it matches both a variable and the -split operator, or neither of them. This is just a demonstration; the actual syntax processing is much more complicated. Note the alternation between a negative lookbehind for \w and an anchor at the end of the last match (\G).

PS C:\> $match = "(\$[a-zA-Z][a-zA-Z?_]*)|((?i:(?<!\w)|\G)-split\b)"
PS C:\> [regex]::matches("`$hello-split", $match)
Groups   : {0, 1, 2}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 6
Value    : $hello

Groups   : {0, 1, 2}
Success  : True
Name     : 0
Captures : {0}
Index    : 6
Length   : 6
Value    : -split

PS C:\> [regex]::matches("hello-split", $match)
(no matches returned)

I know VS Code supports \G, as I have used it in a repository item that was included in another pattern's content. There it handled the comment-based help keywords in PowerShell. Once you start a line comment, you may use additional #s and then additional whitespace, but after that you must use a period to start a comment-based help keyword. The same keyword can also appear in a multiline comment, either right after the start (but with no extra #s) or on a new line, as long as only whitespace appears before the period. Because it is a single repository item, a combination of (\G|^)\s*\. was able to work for all three conditions.

### .synopsis
<# .description
     .notes
#>

Even GitHub's parser doesn't get them all.
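As a rough illustration of the arrangement described above, a single repository item shared by both comment rules might look like the sketch below. The rule names, scope names, and the three-keyword list are assumptions made for this example; they are not copied from PowerShell.tmLanguage.

```json
{
  "repository": {
    "commentHelpKeyword": {
      "match": "(?:\\G|^)\\s*(\\.(?i:synopsis|description|notes))\\b",
      "captures": {
        "1": { "name": "keyword.other.documentation.powershell" }
      }
    },
    "lineComment": {
      "begin": "#+",
      "end": "$",
      "name": "comment.line.powershell",
      "patterns": [{ "include": "#commentHelpKeyword" }]
    },
    "blockComment": {
      "begin": "<#",
      "end": "#>",
      "name": "comment.block.powershell",
      "patterns": [{ "include": "#commentHelpKeyword" }]
    }
  }
}
```

In this sketch \G covers the two cases where the keyword follows the comment opener on the same line (after `#+` or after `<#`), and ^ covers a keyword on a later line of a block comment.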

Here the difference is that the $variable is matched by its own repository item and the -split operator by another, both at the root level of the syntax. In this arrangement \G appears to do nothing and produces no matches. In my testing, the only place it matches is at the very beginning of the very first line of a file.

I can understand the difference between the two scenarios above (the comment-based help keyword is an included item in both the line comment and the block comment, while the variable and the operator are separate items included in the root of the syntax), but being able to tie a match to the end of a previous match shouldn't be restricted to items included inside the previous match's scope.

My actual match string (just an edit of the original):

"match": "(?:(?<!\\w|!)|\\G)-(?i:join|split)(?!\\p{L})"

Unless I have overlooked some obvious solution to this problem, I expect to be told that this is a limitation of the regex engine used by VS Code's TextMate syntax system. Hopefully that's not the consensus, because TextMate is already limited enough.

My current PowerShell.tmLanguage.json file can be found in the wip_goal branch of msftrncs/PowerShell.tmLanguage. I've been working through the issues reported on the PowerShell/EditorSyntax repository, but have not yet started to generate the edits and pull requests against that repository.

@vscodebot vscodebot bot added editor editor-find Editor find operations labels Sep 4, 2018

dbaeumer commented Sep 4, 2018

Not sure who best answers this.

@dbaeumer dbaeumer added the grammar Syntax highlighting grammar label Sep 4, 2018
msftrncs (Author) commented

So I thought maybe I had a way to accomplish this. By switching the 'match' for variables to a 'begin', setting the 'end' to (?=.|$), and then 'include'ing the operators as a pattern, I could catch these, and in fact it does work, but not entirely. The operators are spread across several patterns in the operators repository item, and when several are chained together on a single line, it only works for operators that come from the same pattern. It's pretty hard to describe:

$a-or $b-and $c-xor $e  #works
$a-or $b-ne $c-xor $e  #doesn't work
$a -or $b-ne $c -xor $e  #doesn't work either, -ne is from different group than -or/-xor
$a -or $b -ne $c -xor $e #works as normal
$a-eq $b-ne $c #also works (whether it makes sense)

I'm guessing it has something to do with the 'end' using an empty match, but that doesn't entirely make sense either.
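For readers following along, the workaround described in this comment has roughly the shape sketched below. This is a reconstruction from the description, not the author's actual rule: the variable pattern is borrowed from the demonstration regex earlier in the thread, the scope name is a placeholder, and `#operators` stands for the operators repository item mentioned above.

```json
{
  "begin": "\\$[a-zA-Z][a-zA-Z?_]*",
  "beginCaptures": {
    "0": { "name": "variable.other.powershell" }
  },
  "end": "(?=.|$)",
  "patterns": [{ "include": "#operators" }]
}
```

The 'end' here is a zero-width lookahead, which is the empty match the comment refers to.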

alexdima (Member) commented

There are two special things:

  • \G matches the anchor position. The anchor position is set when entering a begin-end or a begin-while rule, right after the begin part has been consumed. You can follow the code here. Most of the time, \G is useful when using something like a lookahead to enter a begin-end rule and then targeting the beginning of the entered rule via \G or leaving the rule as soon as \G does not match anymore. Here is an example where \G is tested to not match anymore and then leave. You can search for \G usages in https://github.com/Microsoft/vscode-textmate for more examples.

  • \A is like ^ for the entire file (matches only at the beginning of the first line).

I'm sorry I can't really advise you in greater detail. I've written the interpreter for grammars but haven't actually written a complex grammar myself :).
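One way to read the explanation above against the original question is the following untested sketch. The variable rule becomes a begin-end rule, so the anchor position sits right after the variable; the operator is then matched only at \G, and the rule is left as soon as \G no longer matches. The variable pattern comes from the demonstration regex and the operator pattern from the issue's match string; the scope names are placeholders, not the ones used in PowerShell.tmLanguage.

```json
{
  "begin": "\\$[a-zA-Z][a-zA-Z?_]*",
  "beginCaptures": {
    "0": { "name": "variable.other.powershell" }
  },
  "end": "(?!\\G)",
  "patterns": [
    {
      "match": "\\G-(?i:join|split)(?!\\p{L})",
      "name": "keyword.operator.powershell"
    }
  ]
}
```

Because \G only matches at the anchor position, the -split in some-file-named-split is never reached by this pattern, as no variable precedes it. Note that (?!\G) first succeeds one character past the anchor, so when no operator follows, the character after the variable may be consumed inside this rule without the root patterns being tried; a real grammar would need to account for that.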

@alexdima alexdima added *dev-question VS Code Extension Development Question and removed editor editor-find Editor find operations labels Sep 12, 2018

vscodebot bot commented Sep 12, 2018

We have a great developer community over on slack where extension authors help each other. This is a great place for you to ask questions and find support.

Happy Coding!
