Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a lock around running pip #21672

Closed
jdemeyer opened this issue Oct 9, 2016 · 73 comments
Closed

Add a lock around running pip #21672

jdemeyer opened this issue Oct 9, 2016 · 73 comments

Comments

@jdemeyer
Copy link

jdemeyer commented Oct 9, 2016

Just like setuptools (#13201), pkg_resources (vendored by pip or stand-alone) has race conditions when building in parallel.

First race condition:

[backports_abc-0.4.p0]   Successfully uninstalled backports-abc-0.4
[pytz-2016.4.p0] Traceback (most recent call last):
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/bin/pip", line 9, in <module>
[pytz-2016.4.p0]     load_entry_point('pip==8.1.2', 'console_scripts', 'pip')()
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 542, in load_entry_point
[pytz-2016.4.p0]     return get_distribution(dist).load_entry_point(group, name)
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
[pytz-2016.4.p0]     return ep.load()
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2229, in load
[pytz-2016.4.p0]     return self.resolve()
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2235, in resolve
[pytz-2016.4.p0]     module = __import__(self.module_name, fromlist=['__name__'], level=0)
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/__init__.py", line 14, in <module>
[pytz-2016.4.p0]     from pip.utils import get_installed_distributions, get_prog
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/utils/__init__.py", line 27, in <module>
[pytz-2016.4.p0]     from pip._vendor import pkg_resources
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 2927, in <module>
[pytz-2016.4.p0]     @_call_aside
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 2913, in _call_aside
[pytz-2016.4.p0]     f(*args, **kwargs)
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 2940, in _initialize_master_working_set
[pytz-2016.4.p0]     working_set = WorkingSet._build_master()
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 626, in _build_master
[pytz-2016.4.p0]     ws = cls()
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 619, in __init__
[pytz-2016.4.p0]     self.add_entry(entry)
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 675, in add_entry
[pytz-2016.4.p0]     for dist in find_distributions(entry, True):
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 1942, in find_eggs_in_zip
[pytz-2016.4.p0]     if metadata.has_metadata('PKG-INFO'):
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 1463, in has_metadata
[pytz-2016.4.p0]     return self.egg_info and self._has(self._fn(self.egg_info, name))
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 1824, in _has
[pytz-2016.4.p0]     return zip_path in self.zipinfo or zip_path in self._index()
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 1704, in zipinfo
[pytz-2016.4.p0]     return self._zip_manifests.load(self.loader.archive)
[pytz-2016.4.p0]   File "/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/pip-8.1.2-py2.7.egg/pip/_vendor/pkg_resources/__init__.py", line 1644, in load
[pytz-2016.4.p0]     mtime = os.stat(path).st_mtime
[pytz-2016.4.p0] OSError: [Errno 2] No such file or directory: '/home/patchbot/sage-patchbot/local/lib/python2.7/site-packages/backports_abc-0.4-py2.7.egg'

The same error can also happen in the non-vendored pkg_resources package, which is part of setuptools.

Second race condition:

Some vendored packages inside pip conditionally import certain packages. One example from local/lib/python/site-packages/pip/_vendor/requests/certs.py:

try:
    from certifi import where
except ImportError:
    def where():
        """Return the preferred certificate bundle."""
        # vendored bundle inside Requests
        return os.path.join(os.path.dirname(__file__), 'cacert.pem')

This is bad because it can happen that we are (un)installing certify while running a different instance of pip.

Third race condition:
The file local/bin/pip contains the line

from pkg_resources import load_entry_point

It turns out that pkg_resources actually imports the packages zope and mpl_toolkit (from matplotlib) due to these packages having .pth files. This is again a race condition.

Note: it seems that these race conditions only happen while uninstalling a package.

CC: @embray

Component: build

Author: Jeroen Demeyer

Branch: 75c1fff

Reviewer: Erik Bray, Volker Braun

Issue created by migration from https://trac.sagemath.org/ticket/21672

@jdemeyer jdemeyer added this to the sage-7.4 milestone Oct 9, 2016
@tscrim
Copy link
Collaborator

tscrim commented Oct 9, 2016

comment:1

I might have run into something like this when doing an incremental build from 7.4.beta6 to 7.4.rc0. I was getting failures trying to build a bunch of pip packages (e.g. alabaster). Although I did a make distclean and the Sage build went without problems, so it might not be related.

@jdemeyer
Copy link
Author

comment:2

Replying to @tscrim:

I might have run into something like this when doing an incremental build from 7.4.beta6 to 7.4.rc0. I was getting failures trying to build a bunch of pip packages (e.g. alabaster).

The building of Python packages now involves uninstalling the packages first. See #21441.

Although I did a make distclean and the Sage build went without problems, so it might not be related.

Or it further proves my hypothesis that the problems happen only while uninstalling a package. If you do make distclean first, there is nothing to uninstall.

@jdemeyer

This comment has been minimized.

@embray
Copy link
Contributor

embray commented Oct 10, 2016

comment:4

I'm inclined to think we need some kind of lock around running pip, or at least when uninstalling. It was never really designed to be run in parallel like this. Previously this was achieved in a patch sage has for setuptools (which impacted locking around easy_install). With easy_install the situation was much more severe, and could cause broken installation. I don't think pip has many problems surrounding installation, but I can see how it could break like this during uninstallation.

Jeroen is right that this specific issue was egg-related. I don't know if there are other similar issues that could occur though even without eggs.

@jdemeyer
Copy link
Author

comment:5

Replying to @embray:

I'm inclined to think we need some kind of lock around running pip, or at least when uninstalling.

Or a

try:
    ...
except OSError:
    pass

in the right place. But then you have to understand the code better.

@embray
Copy link
Contributor

embray commented Oct 10, 2016

comment:6

I specifically meant without patching anything (i.e. something at the level of sage-pip-install). Though a patch that can submitted upstream would be good too.

@jdemeyer
Copy link
Author

@strogdon
Copy link

comment:8

Good, I'm glad this was caught. On an incremental build from 7.4.beta6 to 7.4.rc0 I guess I had to type make close to 10 times before Sage and the docs were built. I assumed perhaps a slow download connection interfered with parallel building. But this seems much more reasonable.


New commits:

f9e8acbFix race condition in pip while uninstalling an egg

@strogdon
Copy link

Commit: f9e8acb

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 11, 2016

Changed commit from f9e8acb to 20f48e5

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 11, 2016

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

20f48e5Fix race condition in pip while uninstalling an egg

@jdemeyer
Copy link
Author

comment:10

This patch seems to fix the issues within Sage. Erik, what do you think?

@jdemeyer
Copy link
Author

Author: Jeroen Demeyer

@jdemeyer

This comment has been minimized.

@jdemeyer
Copy link
Author

comment:13

Any idea how to interpret this traceback?

[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 2913, in _call_aside
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 2940, in _initialize_master_working_set
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 626, in _build_master
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 619, in __init__
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 675, in add_entry
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1942, in find_eggs_in_zip
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1463, in has_metadata
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1824, in _has
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1704, in zipinfo
[python_openid-2.2.5.p0]   File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 1644, in load
[python_openid-2.2.5.p0] OSError: [Errno 2] No such file or directory: '/usr/local/src/sage-config/local/lib/python2.7/site-packages/pathlib2-2.1.0-py2.7.egg'

Why does it refer to files in build/bdist.linux-x86_64/egg/pkg_resources? Where is that place?

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 11, 2016

Changed commit from 20f48e5 to 3f1f64a

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 11, 2016

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

3f1f64aFix race condition in pip while uninstalling an egg

@jdemeyer
Copy link
Author

comment:15

Answering my own question:

>>> import pkg_resources
>>> pkg_resources.__file__
'/usr/local/src/sage-config/local/lib/python2.7/site-packages/setuptools-24.0.2-py2.7.egg/pkg_resources/__init__.pyc'

So setuptools also has a copy of pkg_resources (inside an egg, so find doesn't find it and tracebacks are hard to interpret)

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 11, 2016

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

3512026Fix race condition in pip while uninstalling an egg

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 11, 2016

Changed commit from 3f1f64a to 3512026

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 11, 2016

Changed commit from 3512026 to fcbcf7e

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Oct 11, 2016

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

fcbcf7eFix race condition in pip while uninstalling an egg

@jdemeyer
Copy link
Author

comment:18

OK, I needed a few iterations to get this right. But now it seems to work.

The way I tested this:

  1. Build sage-7.4.beta2 (the last version before Use pip to install Python dependencies #20218).

  2. Build sage-7.4.beta6 (the last version before Old installed version of Cython is used #21441).

  3. Build this branch with lots of processes.

@vbraun
Copy link
Member

vbraun commented Oct 12, 2016

Changed reviewer from Erik Bray to Erik Bray, Volker Braun

@embray
Copy link
Contributor

embray commented Oct 13, 2016

comment:46

Three things:

  1. If we want to implement the locking mechanism in Python (makes sense to me) might as
    well reimplement sage-pip-install in its entirety in Python, rather than add yet another script. This would further the (ideal, to me) goal of having fewer shell scripts, and would be less confusing.

  2. As implemented this is not at all portable, and of course portable file-locking is hard to get right. I propose that we vendor the fasteners module which already provides a more portable and well-tested file lock.

  3. I would suggest implementing a reader/writer-like lock to allow multiple pip installs to run simultaneously, and only prevent other pips from running during a pip uninstall.

Since I have experience with the fasteners module I can take that over if you want.

@vbraun
Copy link
Member

vbraun commented Oct 13, 2016

comment:47

Sounds good to me but would be nice to do on another ticket so that we can release 7.4

@embray
Copy link
Contributor

embray commented Oct 13, 2016

comment:48

Is there some rush? I can work on this now. And I imagine you're going to do another RC right?

@jdemeyer
Copy link
Author

comment:49

Replying to @embray:

  1. I would suggest implementing a reader/writer-like lock to allow multiple pip installs to run simultaneously, and only prevent other pips from running during a pip uninstall.

That's exactly what I did.

@embray
Copy link
Contributor

embray commented Oct 13, 2016

comment:50

Replying to @jdemeyer:

Replying to @embray:

  1. I would suggest implementing a reader/writer-like lock to allow multiple pip installs to run simultaneously, and only prevent other pips from running during a pip uninstall.

That's exactly what I did.

Yeah, I see that now. I think what I meant is if using fasteners there isn't one built in so you have to hand-roll it with two locks, but that's simple enough.

@jdemeyer
Copy link
Author

comment:51

Replying to @embray:

Three things:

  1. If we want to implement the locking mechanism in Python (makes sense to me) might as
    well reimplement sage-pip-install in its entirety in Python, rather than add yet another script.
  1. I don't see the problem with having an additional script. Even if you would implement sage-pip-install in Python, I would be in favour of having two scripts because you may want to run pip without sage-pip-install.

  2. Perhaps we might be able to push the locking of pip upstream, but that's not going to happen for sage-pip-install.

  3. Didn't you mention that you wanted to get rid of sage-pip-install anyway in the future?

This would further the (ideal, to me) goal of having fewer shell scripts

I don't think that having few scripts should be a goal by itself.

@jdemeyer
Copy link
Author

comment:52

Replying to @embray:

  1. As implemented this is not at all portable

Well, it is portable to most UNIX-like operating systems. I don't think that there is a system on which Sage runs which does not have flock(2). We use it for our patch to setuptools and nobody ever complained about that.

That being said, I am not against using fasteners, but I agree with Volker that it's something which can be done in the future.

@embray
Copy link
Contributor

embray commented Oct 13, 2016

comment:53

Since this has come up before with setuptools, what if instead of making this lock script pip-specific we just make it general? After all, there's virtually nothing in it specific to pip.

@jdemeyer
Copy link
Author

comment:54

Replying to @embray:

After all, there's virtually nothing in it specific to pip.

Well, there are still various mentions of pip in that file (including the lock filename). And what will you do with the call

load_entry_point('pip', 'console_scripts', 'pip')()

@embray
Copy link
Contributor

embray commented Oct 14, 2016

comment:55

Yeah but clearly none of that has to be hard-coded for pip. It could wrap any command.

@jdemeyer
Copy link
Author

comment:56

Replying to @embray:

Yeah but clearly none of that has to be hard-coded for pip. It could wrap any command.

Of course it could, at the expense of making the script and the calling of the script more complicated. I don't care much about this. If you feel like generalizing the script, go for it. Just not on this blocker ticket.

@jdemeyer
Copy link
Author

comment:57

Another relevant question: is this locking something that would have a chance to get accepted upstream (of course, not with this implementation and we would need to support Windows too)?

Since the race conditions seem to be mostly in pkg_resources (part of setuptools), I guess that would be the most natural place to implement the locking.

@embray
Copy link
Contributor

embray commented Oct 14, 2016

comment:58

I don't know. I leave it your hands.

@embray
Copy link
Contributor

embray commented Oct 14, 2016

comment:59

Replying to @jdemeyer:

Replying to @embray:

Yeah but clearly none of that has to be hard-coded for pip. It could wrap any command.

Of course it could, at the expense of making the script and the calling of the script more complicated. I don't care much about this. If you feel like generalizing the script, go for it. Just not on this blocker ticket.

I'm barely talking about anything more than making the first argument the script (where the "SHARED" thing really ought to be a simple flag like -s) the string that you hard-code as "pip".

@vbraun
Copy link
Member

vbraun commented Oct 15, 2016

Changed branch from u/jdemeyer/race_condition_in_pkg_resources to 75c1fff

@jhpalmieri
Copy link
Member

comment:61

It seems like this issue puts using pip to install packages at odds with building in parallel. So should we not use pip so much, or do we not care about the speed of building that much? (I'm not complaining about the fix here at all, but I'm wondering about its consequences.)

@jhpalmieri
Copy link
Member

Changed commit from 75c1fff to none

@jdemeyer
Copy link
Author

comment:62

Replying to @jhpalmieri:

or do we not care about the speed of building that much?

I do care but this problem really needed fixing. I honestly tried hard to fix it by different means (look at the comments on this ticket) but I did not manage. But yes, it is certainly a regression.

@jhpalmieri
Copy link
Member

comment:63

So should we consider changing the installation methods for certain packages which take a relatively long time to build, like scipy? What are the advantages to installing these via pip, and do they outweigh the added time? (I don't care as much about the many packages that take under 30 seconds to build; building with pip is fine with those.)

@jdemeyer
Copy link
Author

comment:64

Replying to @jhpalmieri:

So should we consider changing the installation methods for certain packages which take a relatively long time to build, like scipy? What are the advantages to installing these via pip, and do they outweigh the added time? (I don't care as much about the many packages that take under 30 seconds to build; building with pip is fine with those.)

I would rather focus on fixing the problem at the level of pip. I still think that option 1. of [comment:28] should be possible to implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants