Deadlock when running commands that spawn background processes #310

Closed
flupke opened this issue Apr 7, 2016 · 6 comments


flupke commented Apr 7, 2016

The issue is described in detail here: http://bugs.python.org/issue4216#msg77582

The issue is visible, for example, with git over ssh when ssh's ControlMaster option is enabled. sh hangs waiting for an EOF on git's output streams, which never arrives because the ssh master process inherited git's streams and outlives git.

Here's how to reproduce with sh:

import sh
import sys

# Remove control socket so we're sure to create a new ssh process; if you don't
# do this the script will hang the first time it is run, but not the second
# time
sh.rm('-f', '/tmp/git@github.com:22')
# Remove any previous clone, then clone over SSH; the clone will hang
sh.rm('-rf', 'sh')
sh.git.clone('git@github.com:amoffat/sh.git', _out=sys.stdout, _err=sys.stderr)

You also need to enable persistent ssh connections in your ~/.ssh/config:

Host *
ControlMaster auto
ControlPath /tmp/%r@%h:%p
ControlPersist 72h

I think there should be an option to completely disable sh's reading of the streams (_no_out and _no_err don't help, and neither does redirecting the streams to files).
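The underlying deadlock can be reproduced without ssh at all, using only the standard library. This is a minimal sketch, not sh itself; the backgrounded `sleep` stands in for the persistent ssh master process:

```python
import subprocess
import threading

# The shell backgrounds `sleep`, which inherits the write end of the pipe,
# then exits. Reading the pipe only hits EOF once *every* copy of the write
# end is closed -- i.e. once the backgrounded `sleep` also exits.
proc = subprocess.Popen(["sh", "-c", "sleep 2 & echo started"],
                        stdout=subprocess.PIPE)

got_eof = threading.Event()

def reader():
    proc.stdout.read()  # blocks until the backgrounded sleep exits
    got_eof.set()

threading.Thread(target=reader, daemon=True).start()
proc.wait()               # the shell itself exits almost immediately...
print(got_eof.wait(0.5))  # ...but the reader is still blocked: prints False
```

This is exactly the situation sh's output threads end up in: the command it spawned has exited, but the streams it is reading never reach EOF.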


amoffat commented Apr 7, 2016

Thanks for reporting @flupke, I will look into it


jhrmnn commented May 13, 2016

I hit this issue as well. A dirty workaround is to call something like subprocess.call(['ssh', host, 'true']) at the beginning of your script, which initiates the ssh connection and doesn't block since it doesn't handle streams at all. Subsequent calls to sh then work as expected. This of course assumes that the connection won't close while your script is running.
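The workaround avoids the hang because `subprocess.call` without pipes has no streams to drain, so it returns as soon as its immediate child exits, even if a long-lived grandchild survives. A minimal sketch of that mechanism (the backgrounded `sleep` again stands in for the ssh master):

```python
import subprocess
import time

start = time.monotonic()
# No stdout/stderr pipes are created, so there is nothing to read to EOF:
# the call returns when the shell exits, not when its background child does.
subprocess.call(["sh", "-c", "sleep 2 & true"])
elapsed = time.monotonic() - start
print(elapsed < 1)  # prints True: we did not wait for the grandchild
```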

amoffat pushed a commit that referenced this issue Oct 7, 2016
provide a "force done" event that causes the output threads to stop attempting
to read from stdout/stderr in the cases where they would never finish.
closes #310

amoffat commented Oct 7, 2016

So after some investigation, the way I see it is:

  • any process launched through sh has to wait on its stdout/stderr
  • any process may launch a subprocess, dup its stdout/stderr, causing the parent process to never finish reading

We could add an option saying "hey, I know this process will launch a subprocess and cause stdout/stderr to never finish, so disable reading on them." But that requires too much knowledge of what the process does under the hood, and would probably result in a lot of people filing issues, wondering why their commands hang, and then trying to debug it.

I've opted for a hacky approach (here and here) that requires no knowledge of the process. What it does is, when the process ends, we say "ok, threads that are spinning and trying to read from stdout and stderr, you have X seconds to finish reading from those streams, then you're done." Most processes' stdout/err will receive their EOFs immediately, so the timer behavior won't apply to them. The timer behavior will only kick in on processes that have dup'd their fds in another child process, and will therefore never EOF. In that case, the process will hang for half a second on completion, but then terminate correctly.

The fix is on the release-1.2 branch and will go out with it.


flupke commented Oct 7, 2016

Cool! I tested the branch against my script above, and it doesn't hang anymore. There are error messages though:

$ python test-sh-bug.py
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/flupke/.virtualenvs/sh-bug/lib/python2.7/site-packages/sh.py", line 1684, in output_thread
    readers.remove(stream)
ValueError: list.remove(x): x not in list

Cloning into 'sh'...
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/flupke/.virtualenvs/sh-bug/lib/python2.7/site-packages/sh.py", line 1684, in output_thread
    readers.remove(stream)
ValueError: list.remove(x): x not in list


amoffat commented Oct 7, 2016

@flupke Weird, works fine for me, but I went ahead and reverted the lines that are causing the errors. Want to give it another try?


flupke commented Oct 7, 2016

Yup, all good! (This was on Mac, btw.)


0-wiz-0 added a commit to NetBSD/pkgsrc-wip that referenced this issue Dec 12, 2016
*   added `_out` and `_out_bufsize` validator [#346](amoffat/sh#346)
*   bugfix for internal stdout thread running when it shouldn't [#346](amoffat/sh#346)

*   regression bugfix on timeout [#344](amoffat/sh#344)
*   regression bugfix on `_ok_code=None`

*   further improvements on cpu usage

*   regression in cpu usage [#339](amoffat/sh#339)

*   fd leak regression and fix for flawed fd leak detection test [#337](amoffat/sh#337)

*   support for `io.StringIO` in python2

*   added support for using raw file descriptors for `_in`, `_out`, and `_err`
*   removed `.close()`ing `_out` handler if FIFO detected

*   composed commands no longer propagate `_bg`
*   better support for using `sys.stdin` and `sys.stdout` for `_in` and `_out`
*   bugfix where `which()` would not stop searching at the first valid executable found in PATH
*   added `_long_prefix` for programs whose long arguments start with something other than `--` [#278](amoffat/sh#278)
*   added `_log_msg` for advanced configuration of log message [#311](amoffat/sh#311)
*   added `sh.contrib.sudo`
*   added `_arg_preprocess` for advanced command wrapping
*   alter callable `_in` arguments to signify completion with falsy chunk
*   bugfix where pipes passed into `_out` or `_err` were not flushed on process end [#252](amoffat/sh#252)
*   deprecated `with sh.args(**kwargs)` in favor of `sh2 = sh(**kwargs)`
*   made `sh.pushd` thread safe
*   added `.kill_group()` and `.signal_group()` methods for better process control [#237](amoffat/sh#237)
*   added `new_session` special keyword argument for controlling spawned process session [#266](amoffat/sh#266)
*   bugfix better handling for EINTR on system calls [#292](amoffat/sh#292)
*   bugfix where with-contexts were not threadsafe [#247](amoffat/sh#195)
*   `_uid` new special keyword param for specifying the user id of the process [#133](amoffat/sh#133)
*   bugfix where exceptions were swallowed by processes that weren't waited on [#309](amoffat/sh#309)
*   bugfix where processes that dupd their stdout/stderr to a long running child process would cause sh to hang [#310](amoffat/sh#310)
*   improved logging output [#323](amoffat/sh#323)
*   bugfix for python3+ where binary data was passed into a process's stdin [#325](amoffat/sh#325)
*   Introduced execution contexts which allow baking of common special keyword arguments into all commands [#269](amoffat/sh#269)
*   `Command` and `which` now can take an optional `paths` parameter which specifies the search paths [#226](amoffat/sh#226)
*   `_preexec_fn` option for executing a function after the child process forks but before it execs [#260](amoffat/sh#260)
*   `_fg` reintroduced, with limited functionality.  hurrah! [#92](amoffat/sh#92)
*   bugfix where a command would block if passed a fd for stdin that wasn't yet ready to read [#253](amoffat/sh#253)
*   `_long_sep` can now take `None` which splits the long form arguments into individual arguments [#258](amoffat/sh#258)
*   making `_piped` perform "direct" piping by default (linking fds together).  this fixes memory problems [#270](amoffat/sh#270)
*   bugfix where calling `next()` on an iterable process that has raised `StopIteration`, hangs [#273](amoffat/sh#273)
*   `sh.cd` called with no arguments now changes into the user's home directory, like native `cd` [#275](amoffat/sh#275)
*   `sh.glob` removed entirely.  the rationale is correctness over hand-holding. [#279](amoffat/sh#279)
*   added `_truncate_exc`, defaulting to `True`, which tells our exceptions to truncate output.
*   bugfix for exceptions whose messages contained unicode
*   `_done` callback no longer assumes you want your command put in the background.
*   `_done` callback is now called asynchronously in a separate thread.
*   `_done` callback is called regardless of exception, which is necessary in order to release held resources, for example a process pool