Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipe character issues on Windows. #752

Open
KelSolaar opened this issue Sep 30, 2019 · 8 comments
Open

Pipe character issues on Windows. #752

KelSolaar opened this issue Sep 30, 2019 · 8 comments
Labels
bug os:windows Windows-specific shell Shell related issues

Comments

@KelSolaar
Copy link
Contributor

Hi,

I'm testing Rez 2.47.2 and encountering issues with pipe character handling, this time and for a change, a picture:

image

This is quite a problematic one, we are passing regex patterns as arguments often and it just breaks now.

I can escape them like this to some degree:

C:\Users\thomas>rez-env cmd -- echo "Foo^|Bar"
Foo|Bar

But I could not get it to work with more complex invocations unfortunately. Assuming a my-command.bat file which does echo its arguments, i.e. echo %*, in the current working directory:

C:\Users\thomas>rez-env cmd -- my-command --argument-a --argument-b "C:\Users\thomas\My Directory" --argument-c "\bFoo\b|Bar-Plop|John-Plop|Doe-Plop"
'Bar-Plop' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\thomas>rez-env cmd -- my-command --argument-a --argument-b "C:\Users\thomas\My Directory" --argument-c "\bFoo\b^|Bar-Plop^|John-Plop^|Doe-Plop"
'Bar-Plop' is not recognized as an internal or external command,
operable program or batch file.

I could change echo %* to echo "%*" and it would work but if echo is instead a Python program which expects arguments it will break because of the double quotes as it will receive the whole line as a single argument instead of the granular arguments.

@KelSolaar
Copy link
Contributor Author

So it seems like we are plagued by yet again another quotes swallowing issue, here is the content of the rez-shell.bat file:

set REZ_ENV_PROMPT=%REZ_ENV_PROMPT%$G
echo %PATHEXT%|C:\Windows\System32\findstr.exe /i /c:".PY">nul || set PATHEXT=%PATHEXT%;.PY
(call )
call c:\users\thomas\appdata\local\temp\rez_context_twp0oy\context.bat
set PROMPT=%REZ_ENV_PROMPT% $P$G
my-command --argument-a --argument-b "C:\Users\thomas\My Directory" --argument-c \bFoo\b|Bar-Plop|John-Plop|Doe-Plop
exit %errorlevel%

Note how the quotes have disappeared around the regex pattern?

Changing to my-command --argument-a --argument-b "C:\Users\thomas\My Directory" --argument-c "\bFoo\b|Bar-Plop|John-Plop|Doe-Plop" fixes the issue.

@KelSolaar
Copy link
Contributor Author

KelSolaar commented Sep 30, 2019

A bit of low-key debugging for a few definitions:

C:\Users\thomas>rez-env cmd -- my-command --argument-a --argument-b "C:\Users\thomas\My Directory" --argument-c "\bFoo\b|Bar-Plop|John-Plop|Doe-Plop"
  • _main.run, sys.argv: ['C:\\Program Files\\rez\\Scripts\\rez\\rez-env', 'cmd', '--', 'my-command', '--argument-a', '--argument-b', 'C:\\Users\\thomas\\My Directory', '--argument-c', '\\bFoo\\b|Bar-Plop|John-Plop|Doe-Plop']
  • env.command: ['my-command', '--argument-a', '--argument-b', 'C:\\Users\\thomas\\My Directory', '--argument-c', '\\bFoo\\b|Bar-Plop|John-Plop|Doe-Plop']
  • CMD.join: ['my-command', '--argument-a', '--argument-b', 'C:\\Users\\thomas\\My Directory', '--argument-c', '\\bFoo\\b|Bar-Plop|John-Plop|Doe-Plop'])
  • ResolvedContext.execute_shell: my-command --argument-a --argument-b "C:\Users\thomas\My Directory" --argument-c \bFoo\b|Bar-Plop|John-Plop|Doe-Plop
  • Shell.spawn_shell: 'my-command --argument-a --argument-b "C:\\Users\\thomas\\My Directory" --argument-c \\bFoo\\b|Bar-Plop|John-Plop|Doe-Plop')

A few notes:

  • sys.argv does not preserve the double-quotes, so they are lost immediately
  • The sys.argv list is joined with subprocess.list2cmdline and we get our quotes back on the argument with path because there was a space in it as per this line.
  • I don't know why quotes were preserved in Rez 2.16.0.
  • A terrible and shit way to fix the issue in my case is to introduce a space in the regex, for example: "\bFoo\b|Bar-Plop|John-Plop|Doe-Plop| " (or "\bFoo\b|Bar-Plop|John-Plop|Doe-Plop| \s " to make it obvious) and it will force subprocess.list2cmdline to put double-quotes around the argument.

@KelSolaar
Copy link
Contributor Author

Actually just escaping the double-quotes with \ is enough!

\"\bFoo\b|Bar-Plop|John-Plop|Doe-Plop\"

> C:\Users\thomas>python -c "import sys;print(sys.argv)" "hello"
['-c', 'hello']

> C:\Users\thomas>python -c "import sys;print(sys.argv)" \"hello\"
['-c', '"hello"']

I guess the issue is considered fixed but before closing I'm assuming the behaviour in 2.16.0 was incorrect then?

@bfloch
Copy link
Contributor

bfloch commented Nov 20, 2019

Sorry you had to go through this.
Thanks for you in-depth look into the issue. In my tests this problem also exists for PowerShell.

I did change shlex_join to list2cmdline because, if I remember correctly, one of the test would fail since shlex_join assumes unix shells and did not do a good job for Windows.
I looked up what they use internally in windows and list2cmdline seemed to be a good candidate.

However, as you point out correctly, it does not have enough context to do a good job.
Ideally we would want to retain the quotes but the shell does not pass this information down sys.argv.

So instead, like you discovered, we should roll our own function that does take care about special, shell dependent characters.

needs_quotes needs to trigger if any shell character is present. In your case it is |, but I wonder if we can get a set of all of them somewhere.
Best I can think of is <>|& based on https://ss64.com/nt/syntax-redirection.html

I hope this does not introduce other problems but my suggestion is we include list2cmdline internally (good since it is deprecated = internal API anyway) and extend it to our needs.

@bfloch
Copy link
Contributor

bfloch commented Nov 20, 2019

I just tested: At least the shell is smart enough to identify real redirection characters immediately. So my suggestion should be safe: Anything that has redirection characters at the point of our join parsing must actually be a string parameter and should be therefore quoted.

Fingers crossed.

@nerdvegas
Copy link
Contributor

nerdvegas commented Nov 20, 2019 via email

@bfloch
Copy link
Contributor

bfloch commented Nov 20, 2019

It already is in Shell.join :)
https://github.com/nerdvegas/rez/blob/master/src/rez/shells.py#L281

It seems that this kind of stuff is a mixture of os (process creation) and shell. So keeping them in utils and picking the right one based on the shell seems to be all the flexibility we need.

Haven't read this thoroughly but seems to be a good reference for windows:
https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/

@nerdvegas
Copy link
Contributor

nerdvegas commented Nov 21, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug os:windows Windows-specific shell Shell related issues
Projects
None yet
Development

No branches or pull requests

4 participants