Redo vendoring based on pypi.org/project/vendoring/ #49752

Open
wants to merge 5 commits into base: master

gsnedders
Member

@gsnedders gsnedders commented Dec 18, 2024

See individual commits here.

In short, this uses vendoring to re-do all of our Python vendoring, directly into third_party (crudely: pip install -t tools/third_party ...), and then moves to a generated requirements_vendor.txt which finds several recursive dependencies we'd missed and makes it obvious we can drop others.

The only version change here is we go from vendoring html5lib as of html5lib/html5lib-python@f4646e6 to the html5lib==1.1 release; the only change between that commit and the 1.1 release is going from __version__ = "1.2-dev" to __version__ = "1.1".

This is a huge negative line diff, primarily because we no longer vendor the documentation, unit tests, CI workflows, and so on from everything we vendor; instead we vendor only the actual distribution packages, with their licenses.

I'll argue this doesn't require an RFC because we're not actually vendoring anything new here — we're merely changing how we do our pre-existing vendoring, and this should have no impact on any downstream consumers. (This is pedantically untrue: we've changed how we define sys.path which might break if others aren't relying on tools.localpaths for this, and we've removed vendored packages we no longer use which others could rely on elsewhere.)

Why? Because having all the documentation and everything is just a nuisance, and having a huge number of items added to sys.path is painful (as it makes it long and hard to understand what's going on, and makes us more dissimilar to how these packages expect to be installed). Having this automated also makes it much easier to update our vendored packages, or add further new vendored packages in future (as #49598 does).

This allows us to move the third_party directory to being managed by
Python tooling.

This moves us to relying on the vendoring tool to do our vendoring,
rather than doing this on an ad-hoc basis. See: tools/pyproject.toml,
tools/requirements_vendor.txt, tools/third_party_patches.

This doesn't change what we have vendored, it merely moves it over to
the new vendoring mechanism.

Notably, this leaves us _without_ all the non-installed bits of these
various third-party libraries: we no longer have docs or tests or
GitHub Actions Workflows or anything else. This also means we only
need to have third_party on the sys.path, and nothing else.
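
As a rough illustration of the single-entry sys.path setup described above, the sketch below adds one vendor directory to a path list. The helper name `add_vendor_dir` and the example paths are hypothetical; this is not the contents of tools.localpaths.

```python
import sys
from pathlib import Path


def add_vendor_dir(vendor_dir: Path, path: list[str]) -> list[str]:
    """Put a single vendor directory at the front of a sys.path-style list.

    Hypothetical helper: with everything installed flat into one directory
    (as `pip install -t` would do), one entry is enough, instead of one
    entry per vendored project.
    """
    entry = str(vendor_dir)
    if entry not in path:
        path.insert(0, entry)
    return path


if __name__ == "__main__":
    # Assumed layout: this file lives in tools/, next to third_party/.
    add_vendor_dir(Path(__file__).resolve().parent / "third_party", sys.path)
```

Calling the helper twice with the same directory leaves the list unchanged the second time, so it is safe to run from several entry points.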

This requires two changes to our actual code:

 * tools/wpt/android.py, because we've changed the path of tooltool
   (because that's how it is packaged), and

 * webdriver/tests/conftest.py, because we no longer have the
   *.egg-info (or *.dist-info) from pytest-asyncio, and thus must
   manually register the plugin entrypoint.
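
To illustrate why the missing *.dist-info matters: pytest discovers installed plugins through the `pytest11` entry point group, which is recorded in `entry_points.txt` inside a distribution's metadata directory. The sketch below builds two throwaway directories with a made-up package called `fake_plugin` (not a real project), one with metadata and one without, and lists the `pytest11` entry points visible in each. It only demonstrates the lookup mechanism, not the actual change made to webdriver/tests/conftest.py.

```python
import importlib.metadata
import tempfile
from pathlib import Path


def make_fake_package(root: Path, with_dist_info: bool) -> None:
    """Create a dummy importable package, optionally with dist-info metadata."""
    pkg = root / "fake_plugin"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    if with_dist_info:
        info = root / "fake_plugin-1.0.dist-info"
        info.mkdir()
        (info / "METADATA").write_text(
            "Metadata-Version: 2.1\nName: fake-plugin\nVersion: 1.0\n"
        )
        # This file is what advertises the plugin to pytest.
        (info / "entry_points.txt").write_text("[pytest11]\nfake = fake_plugin\n")


def pytest11_names(root: Path) -> list[str]:
    """Return the names of pytest11 entry points found under `root`."""
    names = []
    for dist in importlib.metadata.distributions(path=[str(root)]):
        for entry_point in dist.entry_points:
            if entry_point.group == "pytest11":
                names.append(entry_point.name)
    return sorted(names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        make_fake_package(Path(a), with_dist_info=True)
        make_fake_package(Path(b), with_dist_info=False)
        print(pytest11_names(Path(a)))  # metadata present
        print(pytest11_names(Path(b)))  # metadata absent
```

Without the metadata directory the package is still importable, but nothing announces it as a plugin, so it has to be named explicitly (for example through a `pytest_plugins` declaration or pytest's `-p` option).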

For anyone wishing to audit this commit, the following may be helpful:

    git diff --raw HEAD^- | cut -d' ' -f5- | sort
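
The command above drops the mode and hash columns of `git diff --raw`, leaving the status letter and the path(s). As an alternative for auditing, here is a minimal Python sketch that groups the same raw output by status letter. The function name `group_raw_diff` is hypothetical, and it assumes the default (non `-z`) raw format with tab-separated paths.

```python
from collections import defaultdict


def group_raw_diff(raw_output: str) -> dict[str, list[tuple[str, ...]]]:
    """Group `git diff --raw` lines by status letter (M, D, R, ...).

    Each raw line looks like:
        :<old mode> <new mode> <old hash> <new hash> <status>\t<path>[\t<new path>]
    Renames and copies carry a similarity score (e.g. R100) and two paths.
    """
    groups: dict[str, list[tuple[str, ...]]] = defaultdict(list)
    for line in raw_output.splitlines():
        if not line.startswith(":"):
            continue  # skip anything that is not a raw diff record
        meta, *paths = line.split("\t")
        status = meta.split(" ")[4]
        groups[status[0]].append(tuple(paths))
    return {status: sorted(entries) for status, entries in groups.items()}
```

Feeding it the output of `git diff --raw HEAD^-` for the vendoring commit should make it easy to check that the renames (`R`) and deletions (`D`) are the expected ones.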

Per e08f471 it appears it was intended to vendor html5lib==1.1, but
instead the commit after 1.1 seems to have been vendored.

Rather than vendoring a random revision from git, downgrade to the
actual release, which merely changes the version number.

The removal of html5lib/LICENSE.md here is purely because vendoring is
now looking purely at the wheel for the license, rather than the
entire repository, thus it no longer picks up the license in
benchmarks/data/wpt/LICENSE.md, as no part of that directory is in the
distribution package.

Then, using uv, we can build requirements_vendor.txt and remove
various unused dependencies. This isn't perfect, because we want to
vendor everything regardless of markers, and thus this still requires
manual fixup to remove all markers from the output.
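
The marker fixup is described as manual. Purely as an illustration, it could be scripted along these lines. The function name `strip_markers` is hypothetical, the version numbers in the usage example are placeholders, and the sketch assumes simple pinned requirements with no `--hash` continuation lines or URL requirements.

```python
def strip_markers(requirements_text: str) -> str:
    """Drop PEP 508 environment markers (the `; ...` suffix) from each line.

    Comment lines and blank lines are passed through untouched.
    """
    cleaned = []
    for line in requirements_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            cleaned.append(line)
            continue
        requirement, _, _marker = line.partition(";")
        cleaned.append(requirement.rstrip())
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    example = (
        'colorama==0.0.0 ; sys_platform == "win32"\n'
        "    # via pytest\n"
        'tomli==0.0.0 ; python_version < "3.11"\n'
    )
    print(strip_markers(example), end="")
```

With the markers removed, every listed package is vendored regardless of the platform or Python version the resolver ran on.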

This allows us to remove a number of packages (attrs, certifi,
importlib-metadata, more-itertools, pathlib2, and zipp), but does add
a few extra (colorama, which gives pytest better colourized output on
Windows; tomli, which pytest relies on to parse *.toml files on Python
< 3.11; and pyparsing, which packaging relies on to parse markers and
requirements).
@Ms2ger
Contributor

Ms2ger commented Dec 18, 2024

I might err towards an RFC for visibility. The change seems reasonable on the face of it; not going to review in detail obviously.


5 participants