Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove DESTDIR staging for Python packages to eliminate race conditions during Python package installations #29585

Closed
mkoeppe opened this issue Apr 26, 2020 · 45 comments

Comments

@mkoeppe
Copy link
Contributor

mkoeppe commented Apr 26, 2020

Follow-up from #26018:

Declaring some Python packages as PYTHON_TOOLCHAIN helped somewhat, but there are more race conditions with the same failure mechanism: Some Python package byte-compiles modules from a half-installed package, and then finishing the installation of that fails.

Of course, the high parallelization in my automatic runs on GitHub provoke these kinds of errors. But this should be fixed nevertheless.

See https://github.com/mkoeppe/sage/runs/618645992

  [backports_shutil_get_terminal_size-1.0.0.p1]   cp: cannot create regular file '/sage/local/./lib/python2.7/site-packages/backports/__init__.py': File exists
  [backports_shutil_get_terminal_size-1.0.0.p1]   ************************************************************************
  [backports_shutil_get_terminal_size-1.0.0.p1]   Error copying files for backports_shutil_get_terminal_size-1.0.0.p1.
  [backports_shutil_get_terminal_size-1.0.0.p1]   ************************************************************************
  [backports_ssl_match_hostname-3.5.0.1.p0] error installing, exit status 1. End of log file:
  [backports_shutil_get_terminal_size-1.0.0.p1] Full log file: /sage/logs/pkgs/backports_shutil_get_terminal_size-1.0.0.p1.log


  [backports_ssl_match_hostname-3.5.0.1.p0]   Copying package files from temporary location /sage/local/var/tmp/sage/build/backports_ssl_match_hostname-3.5.0.1.p0/inst to /sage/local
  [backports_ssl_match_hostname-3.5.0.1.p0]   cp: cannot create regular file '/sage/local/./lib/python2.7/site-packages/backports/__init__.pyc': File exists
  [backports_ssl_match_hostname-3.5.0.1.p0]   ************************************************************************
  [backports_ssl_match_hostname-3.5.0.1.p0]   Error copying files for backports_ssl_match_hostname-3.5.0.1.p0.
  [backports_ssl_match_hostname-3.5.0.1.p0]   ************************************************************************
  [backports_ssl_match_hostname-3.5.0.1.p0] Full log file: /sage/logs/pkgs/backports_ssl_match_hostname-3.5.0.1.p0.log

Another occurrence on Cygwin: https://github.com/mkoeppe/sage/runs/638920131?check_suite_focus=true

Another occurrence: #30831

A new one thanks to recent pip's opportunistic use of tornado:

  [tornado-6.0.4]   real	0m11.342s
  [tornado-6.0.4]   user	0m2.041s
  [tornado-6.0.4]   sys	0m0.380s
  [tornado-6.0.4]   Copying package files from temporary location /sage/local/var/tmp/sage/build/tornado-6.0.4/inst to /sage/local
  [tornado-6.0.4]   cp: cannot create directory '/sage/local/./lib/python3.9/site-packages/tornado/__pycache__': File exists
  [tornado-6.0.4]   ************************************************************************

https://github.com/sagemath/sage/runs/2879682001

In this ticket, we remove the DESTDIR staging for Python packages. It has been redundant since we moved to installing packages via wheels.

After this change, the only file that is tracked in $SAGE_LOCAL/var/lib/sage/installed/pynormaliz-2.14 is the wheel file:

    "files": [
        "var/lib/sage/wheels/PyNormaliz-2.14-cp39-cp39-macosx_11_0_x86_64.whl"
    ]
}

Uninstalling Python packages is now done via pip, by means of a new removal script called spkg-piprm installed in $SAGE_LOCAL/var/lib/sage/scripts/SPKG/.

CC: @jhpalmieri @dimpase @embray @kliem

Component: build

Keywords: sd111

Author: Matthias Koeppe

Branch: c3eeb69

Reviewer: John Palmieri

Issue created by migration from https://trac.sagemath.org/ticket/29585

@mkoeppe mkoeppe added this to the sage-9.1 milestone Apr 26, 2020
@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 26, 2020

comment:1

Various fixes are possible here:

  • More dependencies need to be added
  • Copying from the staging dir needs to make Python package directory creation atomic (by copying to a temp name in the destination directory, followed by an (atomic) mv of the directory)
  • It should be checked if current versions of pip are safe for concurrent use so we don't have to reinvent the wheel

@jhpalmieri
Copy link
Member

comment:2

In general, I think that package installation can't be atomic, since we need to move things to SAGE_LOCAL/lib and SAGE_LOCAL/include, and we can't just create a directory temp/lib and move it to SAGE_LOCAL without overwriting the existing one. But maybe this could work with Python packages. Rewrite sdh_pip_install?

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 26, 2020

comment:3

Yes, I mean specifically for Python packages.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 26, 2020

comment:4

Replying to @jhpalmieri:

Rewrite sdh_pip_install?

I think the copy-from-staging-dir code could just deal with the Python directories specially. We are working around another problem (lib64) with that code already anyway.

This would solve it even for Python packages that are not installed using pip currently (#29500).

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 26, 2020

comment:5

In fact (when SAGE_SUDO is unset), the whole code should actually recursively move, instead of copy, directories. This will be much faster too (the staged directory is already on the same filesystem)

@jhpalmieri
Copy link
Member

comment:6

mv vs cp was discussed at #26011.

Something like the following (rough draft, untested)?

diff --git a/build/bin/sage-spkg b/build/bin/sage-spkg
index 5cdd55db3f..0f8acf9b97 100755
--- a/build/bin/sage-spkg
+++ b/build/bin/sage-spkg
@@ -946,6 +946,12 @@ if [ -d "$SAGE_DESTDIR" ]; then
     # Generate installed file manifest
     FILE_LIST="$(cd "$PREFIX" && find . -type f -o -type l | sed 's|^\./||' | sort)"
 
+    if [ -d "$PREFIX"/lib/python*/site-packages/ ]; then
+        echo "This is a Python package: moving site-package directories atomically."
+        SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
+        $SAGE_SUDO mv "$PREFIX"/lib/python*/site-packages/* "$SITE_PACKAGES"
+    fi
+
     # Copy files into $SAGE_LOCAL
     $SAGE_SUDO cp -Rp "$PREFIX/." "$SAGE_LOCAL"
     if [ $? -ne 0 ]; then

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 26, 2020

comment:7

Replying to @jhpalmieri:

mv vs cp was discussed at #26011.

Thanks for the pointer. The mv --recursive that Erik is asking about in that thread, is exactly what would need to be implemented - best in Python (as also discussed in that thread).

Something like the following (rough draft, untested)?

diff --git a/build/bin/sage-spkg b/build/bin/sage-spkg
index 5cdd55db3f..0f8acf9b97 100755
--- a/build/bin/sage-spkg
+++ b/build/bin/sage-spkg
@@ -946,6 +946,12 @@ if [ -d "$SAGE_DESTDIR" ]; then
     # Generate installed file manifest
     FILE_LIST="$(cd "$PREFIX" && find . -type f -o -type l | sed 's|^\./||' | sort)"
 
+    if [ -d "$PREFIX"/lib/python*/site-packages/ ]; then
+        echo "This is a Python package: moving site-package directories atomically."
+        SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
+        $SAGE_SUDO mv "$PREFIX"/lib/python*/site-packages/* "$SITE_PACKAGES"
+    fi
+
     # Copy files into $SAGE_LOCAL
     $SAGE_SUDO cp -Rp "$PREFIX/." "$SAGE_LOCAL"
     if [ $? -ne 0 ]; then

That could work, but I think I would prefer the more general solution.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 26, 2020

comment:8

This could be done in some 20 lines of code I guess that would take care of getting the intended concurrency semantics right. Basically try to move a source directory to the target, handling EEXIST and in that case recursing instead.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 26, 2020

comment:9

The new script would also

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Apr 27, 2020

@jhpalmieri
Copy link
Member

comment:11

Ha, yes, I stumbled on that website, too.

@mkoeppe

This comment has been minimized.

@mkoeppe mkoeppe modified the milestones: sage-9.1, sage-9.2 May 5, 2020
@embray
Copy link
Contributor

embray commented Aug 31, 2020

comment:14

I think this sort of issue was solved a long time ago with sage-pip-install, and its use of sage-flock to prevent pip installing more than one package simultaneously.

Unfortunately, with the switch to the DESTDIR system, while this is probably still helpful, it doesn't help during copying of files into SAGE_LOCAL.

Perhaps the pip-lock should also be held when copying packages into $SAGE_LOCAL (don't even bother checking if it's an actual Python package; just prevent copying anything into $SAGE_LOCAL without holding the pip-lock).

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 31, 2020

comment:15

In 9.3 I hope to switch installation of all Python packages from DESTDIR to use pip wheel - see #29500.

@mkoeppe mkoeppe modified the milestones: sage-9.2, sage-9.3 Oct 8, 2020
@mkoeppe

This comment has been minimized.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 6, 2020

comment:18

Hoping we can make progress on this ticket this week - https://wiki.sagemath.org/days111

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Dec 6, 2020

Changed keywords from none to sd111

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Feb 13, 2021

comment:19

Setting new milestone based on a cursory review of ticket status, priority, and last modification date.

@mkoeppe mkoeppe modified the milestones: sage-9.3, sage-9.4 Feb 13, 2021
@mkoeppe

This comment has been minimized.

@mkoeppe

This comment has been minimized.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 23, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

3b435d9build/bin/sage-dist-helpers (sdh_store_and_pip_install_wheel): Record name of installed distribution name
34bd445sdh_pip_uninstall (sdh_pip_uninstall): New helper functtion; use same flags in 'make SPKG-clean' for pip packages
49af8aabuild/bin/sage-spkg, build/sage_bootstrap/uninstall.py: Prepare/install/use the spkg-piprm script

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jun 23, 2021

Commit: 49af8aa

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jun 23, 2021

Author: Matthias Koeppe

@mkoeppe

This comment has been minimized.

@mkoeppe mkoeppe changed the title More race conditions with Python package installations because of DESTDIR Remove DESTDIR staging for Python packages to eliminate race conditions during Python package installations Jun 23, 2021
@jhpalmieri
Copy link
Member

comment:27
  • It would be good to document sdh_pip_uninstall, both at the top of the file and in the developer's guide.

  • Unindent the block in sage-spkg starting on line 449? The block right after the comment "Extract the package" — you deleted the enclosing if and fi but left the block indented.

  • Typo: "pacakges" on line 538 of sage-spkg.

  • Was the variable USE_LOCAL_SCRIPTS essentially used to distinguish between old-stye and new-style packages?

The changes make sense, but it's hard for me to test them because I can't reproduce the race conditions. I've tried make -j20 on a pretty fast computer, but no luck so far.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

ecd49fbbuild/bin/sage-dist-helpers, src/doc/en/developer/packaging.rst: Document sdh_pip_uninstall

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2021

Changed commit from 49af8aa to ecd49fb

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2021

Changed commit from ecd49fb to c3eeb69

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Jul 8, 2021

Branch pushed to git repo; I updated commit sha1. New commits:

c3eeb69build/bin/sage-spkg: Fix typo in comment, unindent a block

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 8, 2021

comment:30

Replying to @jhpalmieri:

  • Was the variable USE_LOCAL_SCRIPTS essentially used to distinguish between old-stye and new-style packages?

Yes, exactly

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 8, 2021

comment:31

Replying to @jhpalmieri:

it's hard for me to test them because I can't reproduce the race conditions. I've tried make -j20 on a pretty fast computer, but no luck so far.

Based on the failing runs that I have seen on GH Actions, I would estimate 1 in 30 to 1 in 200 runs fail...

In any case, there's another reason why we should make this change: Whenever a user uses sage -pip install to install any package, this can update various other Python packages as a side effect. So the installation records that the DESTDIR staging uses easily get out of sync with what's actually installed, making uninstallation by this method basically a gamble.

@jhpalmieri
Copy link
Member

comment:32

Not for this ticket, but I really don't like the warning message here (when running make numpy-clean):

Uninstalling existing 'numpy'
Running pip-uninstall script for 'numpy'
WARNING: Skipping numpy as it is not installed.
Removing stamp file '/Users/palmieri/Desktop/Sage/sage_builds/TESTING/sage-9.4.beta4/local/var/lib/sage/installed/numpy-1.20.3'

For this ticket: if I do make numpy-clean, should it also uninstall the packages that depend on numpy?

I also tried make jupyterlab_widgets, but it didn't write anything in local/var/lib/sage/scripts. This installed the packages jupyterlab, nodejs, and jupyterlab_widgets, at least according to the files in logs/pkgs, but there is no corresponding jupyterlab file in local/var/lib/sage/installed. The files for nodejs and jupyterlab_widgets are empty. make jupyterlab-clean did not remove jupyterlab_widgets. Is this the expected behavior? Maybe I don't understand your last post about how uninstallation is supposed to work.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 8, 2021

comment:33

Replying to @jhpalmieri:

I also tried make jupyterlab_widgets, but it didn't write anything in local/var/lib/sage/scripts. This installed the packages jupyterlab, nodejs, and jupyterlab_widgets, at least according to the files in logs/pkgs, but there is no corresponding jupyterlab file in local/var/lib/sage/installed. The files for nodejs and jupyterlab_widgets are empty. make jupyterlab-clean did not remove jupyterlab_widgets. Is this the expected behavior?

jupyterlab_widgets and nodejs are script packages; we keep track of their installation by creating an empty stamp file in ...installed. The present ticket makes no changes to script packages. There is no uninstallation procedure for script packages at all - only the stamp file is removed. (Only sagelib-clean and sage_docbuild-clean are defined directly in build/make/Makefile.in.)

jupyterlab is a pip package; we do not keep installation records for them. The present ticket makes no changes to pip packages either. Uninstallation is done using sage --pip uninstall (invoked by the makefile).

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 8, 2021

comment:34

Replying to @jhpalmieri:

if I do make numpy-clean, should it also uninstall the packages that depend on numpy?

No, I don't think so. Neither our current build system (before the present ticket) nor pip do this.

(Reinstallation of packages depending on numpy is instead triggered by a reinstallation of numpy.)

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 8, 2021

comment:35

Replying to @jhpalmieri:

Not for this ticket, but I really don't like the warning message here (when running make numpy-clean):

Uninstalling existing 'numpy'
Running pip-uninstall script for 'numpy'
WARNING: Skipping numpy as it is not installed.
Removing stamp file '/Users/palmieri/Desktop/Sage/sage_builds/TESTING/sage-9.4.beta4/local/var/lib/sage/installed/numpy-1.20.3'

I think this warning will only show up if you switch between branches with and without this ticket.

@jhpalmieri
Copy link
Member

comment:36

Is pip better at atomic actions?

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 13, 2021

comment:37

Replying to @jhpalmieri:

Is pip better at atomic actions?

I don't know, but we do not depend on it to be because we are holding a lock while installing a wheel, in build/bin/sage-pip-install. This was added in #21672.

@jhpalmieri
Copy link
Member

Reviewer: John Palmieri

@jhpalmieri
Copy link
Member

comment:38

Okay, let's merge it so it gets more testing.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Jul 13, 2021

comment:39

Thanks!

@vbraun
Copy link
Member

vbraun commented Jul 23, 2021

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 10, 2021

comment:41

A crucial commit was missing on this branch. Follow-up in #32361.

@mkoeppe
Copy link
Contributor Author

mkoeppe commented Aug 10, 2021

Changed commit from c3eeb69 to none

@embray
Copy link
Contributor

embray commented Aug 18, 2021

comment:42

I'm at peace with this solution :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants