Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Unexplained sytest flakiness after poetry changes #12419

Closed
DMRobertson opened this issue Apr 8, 2022 · 12 comments
Closed

Unexplained sytest flakiness after poetry changes #12419

DMRobertson opened this issue Apr 8, 2022 · 12 comments
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@DMRobertson
Copy link
Contributor

I've now seen two sytest runs fail whe nrunning poetry install --extras all.

#12416 had https://github.com/matrix-org/synapse/runs/5881698486?check_suite_focus=true fail with


  EnvCommandError

  Command ['/synapse/.venv/bin/pip', 'install', '--no-deps', '/github/home/.cache/pypoetry/artifacts/7c/b7/77/d5fe572b0165704945977ef97aeea1ba572a64299d08cb121f46b6c925/types_urllib3-1.26.10-py3-none-any.whl'] errored with the following return code 1, and output: 
  Processing /github/home/.cache/pypoetry/artifacts/7c/b7/77/d5fe572b0165704945977ef97aeea1ba572a64299d08cb121f46b6c925/types_urllib3-1.26.10-py3-none-any.whl
  Installing collected packages: types-urllib3
  ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: '/venv/lib/python3.7/site-packages/thrift-0.16.0.egg-info/PKG-INFO'

#12415 had https://github.com/matrix-org/synapse/runs/5884140294?check_suite_focus=true fail similarly.

Logs
 EnvCommandError

  Command ['/synapse/.venv/bin/pip', 'install', '--no-deps', '-U', '/github/home/.cache/pypoetry/artifacts/a4/da/35/051c6fc59cf4617a1394cd3ea3be384682fd4cd8490de9036f1bfe59d2/signedjson-1.1.4-py3-none-any.whl'] errored with the following return code 2, and output: 
  ERROR: Exception:
  Traceback (most recent call last):
    File "/venv/lib/python3.8/site-packages/pip/_internal/cli/base_command.py", line 167, in exc_logging_wrapper
      status = run_func(*args)
    File "/venv/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 205, in wrapper
      return func(self, options, args)
    File "/venv/lib/python3.8/site-packages/pip/_internal/commands/install.py", line 285, in run
      session = self.get_default_session(options)
    File "/venv/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 75, in get_default_session
      self._session = self.enter_context(self._build_session(options))
    File "/venv/lib/python3.8/site-packages/pip/_internal/cli/req_command.py", line 89, in _build_session
      session = PipSession(
    File "/venv/lib/python3.8/site-packages/pip/_internal/network/session.py", line 282, in __init__
      self.headers["User-Agent"] = user_agent()
    File "/venv/lib/python3.8/site-packages/pip/_internal/network/session.py", line 157, in user_agent
      setuptools_dist = get_default_environment().get_distribution("setuptools")
    File "/venv/lib/python3.8/site-packages/pip/_internal/metadata/__init__.py", line 24, in get_default_environment
      from .pkg_resources import Environment
    File "/venv/lib/python3.8/site-packages/pip/_internal/metadata/pkg_resources.py", line 9, in <module>
      from pip._vendor import pkg_resources
    File "/venv/lib/python3.8/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3252, in <module>
      def _initialize_master_working_set():
    File "/venv/lib/python3.8/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3235, in _call_aside
      f(*args, **kwargs)
    File "/venv/lib/python3.8/site-packages/pip/_vendor/pkg_resources/__init__.py", line 3264, in _initialize_master_working_set
      working_set = WorkingSet._build_master()
    File "/venv/lib/python3.8/site-packages/pip/_vendor/pkg_resources/__init__.py", line 574, in _build_master
      ws = cls()
    File "/venv/lib/python3.8/site-packages/pip/_vendor/pkg_resources/__init__.py", line 567, in __init__
      self.add_entry(entry)
    File "/venv/lib/python3.8/site-packages/pip/_vendor/pkg_resources/__init__.py", line 623, in add_entry
      for dist in find_distributions(entry, True):
    File "/venv/lib/python3.8/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2065, in find_on_path
      for dist in factory(fullpath):
    File "/venv/lib/python3.8/site-packages/pip/_vendor/pkg_resources/__init__.py", line 2127, in distributions_from_metadata
      if len(os.listdir(path)) == 0:
  FileNotFoundError: [Errno 2] No such file or directory: '/venv/lib/python3.8/site-packages/sentry_sdk-1.5.8.dist-info'
  WARNING: Ignoring invalid distribution -honenumbers (/venv/lib/python3.8/site-packages)
  WARNING: Ignoring invalid distribution -entry-sdk (/venv/lib/python3.8/site-packages)
  WARNING: Ignoring invalid distribution -honenumbers (/venv/lib/python3.8/site-packages)
  WARNING: Ignoring invalid distribution -entry-sdk (/venv/lib/python3.8/site-packages)

In both situations it seems like

  • poetry is invoking pip to install a specific wheel---possibly upgrading it from an existing installation
  • pip tries to look up package metadata
  • that fails because a file isn't found on disk

This isn't happening consistently; I suspect some kind of filesystem access race.

python-poetry/poetry#2658 and python-poetry/poetry#4143 seem to have enconutered this problem.

@DMRobertson DMRobertson added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Apr 8, 2022
@squahtx
Copy link
Contributor

squahtx commented Apr 8, 2022

In both instances, the file that's not found is from a package that poetry has decided to downgrade. You're almost certainly right about it being a race condition!

The suggested workarounds seem to be setting installer.max-workers=1 or upgrading pip. I'm not convinced by the latter.

I'm also not convinced that it's fixed on poetry 1.2.0, since it still uses pip and python-poetry/poetry@08ab6ca only deals with uninstalls, not up/downgrades.

@squahtx
Copy link
Contributor

squahtx commented Apr 8, 2022

...installer.max-workers is only available in poetry 1.2.0
so we'll have to set experimental.new-installer instead, which is unfortunate.

@DMRobertson
Copy link
Contributor Author

Hmm. The sytest-for-synapse docker image is currently in some kind of hybrid mode which supports setuptools and poetry, right? I wonder if ditching the setuptools support might help (so that the docker image is more likely to have the right dependencies installed)?

(Or, if the flake is sufficiently rare, retry it up to five times and hope for the best?)

@squahtx
Copy link
Contributor

squahtx commented Apr 8, 2022

> I wonder if ditching the setuptools support might help (so that the docker image is more likely to have the right dependencies installed)?

Uncommenting https://github.com/matrix-org/sytest/blob/develop/docker/synapse.Dockerfile#L35 would definitely help, up until we next update our pinned dependencies. I'd guess that the more dependencies that poetry needs to upgrade/downupgrade, the more likely it is that we'll have issues.

Given the choice between slower-but-unflaky and faster-but-flaky, I'd choose the former. It's frustrating to come back to merge a PR, only to find that CI's failed for some silly reason.

squahtx pushed a commit to matrix-org/sytest that referenced this issue Apr 8, 2022
Poetry runs multiple pip operations in parallel. Unfortunately this
results in race conditions when a dependency is installed while another
dependency is being up/downgraded in `scripts/synapse_sytest.sh`.
Configure poetry to serialize `pip` operations.

See matrix-org/synapse#12419

Signed-off-by: Sean Quah <seanq@element.io>
@squahtx
Copy link
Contributor

squahtx commented Apr 8, 2022

PR'd the workaround here: matrix-org/sytest#1240

squahtx added a commit to matrix-org/sytest that referenced this issue Apr 8, 2022
Poetry runs multiple pip operations in parallel. Unfortunately this
results in race conditions when a dependency is installed while another
dependency is being up/downgraded in `scripts/synapse_sytest.sh`.
Configure poetry to serialize `pip` operations.

See matrix-org/synapse#12419

Signed-off-by: Sean Quah <seanq@element.io>
@DMRobertson
Copy link
Contributor Author

Many thanks @squahtx! Let's keep an eye on this; we can reopen the issue if there are further problems.

@DMRobertson
Copy link
Contributor Author

I think https://github.com/matrix-org/synapse/runs/5890104666?check_suite_focus=true hit this too---but I'm not sure if that's to be expected? Maybe the sytest fix hasn't percolated yet somehow?

@squahtx
Copy link
Contributor

squahtx commented Apr 8, 2022

I think https://github.com/matrix-org/synapse/runs/5890104666?check_suite_focus=true hit this too---but I'm not sure if that's to be expected? Maybe the sytest fix hasn't percolated yet somehow?

Looks like it's already using the new image. The hash in https://github.com/matrix-org/synapse/runs/5890104666?check_suite_focus=true#step:2:119 matches the recently pushed image:

  3a0a091b32ef: Pull complete
  Digest: sha256:23e097ec2506250390f267ef78d6035e7708ab579132700054a06e2cc9be5267
  Status: Downloaded newer image for matrixdotorg/sytest-synapse:focal

I'm going to have to revisit this on Monday.

@squahtx squahtx reopened this Apr 8, 2022
@squahtx
Copy link
Contributor

squahtx commented Apr 11, 2022

None of the poetry config specified in the dockerfile appears to be holding.
During the docker build, poetry's config path is /root/.config/pypoetry/config.toml.
But inside GitHub Actions, poetry's config path becomes /github/home/.config/pypoetry/config.toml.

@DMRobertson
Copy link
Contributor Author

That's the user-level poetry config. I think you can also set project-level config too with poetry config --local, which writes poetry.toml to the root of synapse's source checkout.

@squahtx
Copy link
Contributor

squahtx commented Apr 11, 2022

I'm having trouble seeing how we'd make the project-level config work, since Synapse is only checked out at runtime. We could delay setting experimental.new-installer until runtime of course, but then we'd still have the virtualenvs.in-project setting that needs to be set at both build time and runtime.

@squahtx squahtx closed this as completed Apr 12, 2022
@squahtx
Copy link
Contributor

squahtx commented Apr 12, 2022

Resolved by matrix-org/sytest#1241 for real this time. Hopefully.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

No branches or pull requests

2 participants