Include linehaul information in user agent #1958

Closed
pradyunsg opened this issue Feb 25, 2024 · 10 comments · Fixed by #2493
Assignees
Labels
good first issue Good for newcomers internal A refactor or improvement that is not user-facing

Comments

@pradyunsg

This is effectively a request to include the download-related information that PyPI can see when uv interacts with the index server, so that this information can be used to make ecosystem-wide decisions (by querying it via https://warehouse.pypa.io/api-reference/bigquery-datasets.html#download-statistics-table).

https://github.com/pypi/linehaul-cloud-function is the PyPI side implementation. https://github.com/pypa/pip/blob/24.0/src/pip/_internal/network/session.py#L109 is the pip side implementation.

This data powers decision making such as https://pypistats.org/packages/__all__ (and similar sites), https://mayeut.github.io/manylinux-timeline/ and a few ad-hoc queries to determine usage patterns across the ecosystem.

@charliermarsh
Member

Perfect, thanks @pradyunsg!

@charliermarsh charliermarsh added the internal A refactor or improvement that is not user-facing label Feb 25, 2024
@charliermarsh
Member

Everything here looks pretty straightforward (and we should already have it all available), perhaps with the exception of setuptools_version.

@charliermarsh charliermarsh added the good first issue Good for newcomers label Feb 25, 2024
@pradyunsg
Author

> perhaps with the exception of setuptools_version

Yea, skipping values you don't have readily available seems fine.

@hauntsaninja
Contributor

Last I benchmarked, construction of user agent was surprisingly expensive in pip, took like 200ms.

@hauntsaninja
Contributor

hauntsaninja commented Feb 25, 2024

Hm, looks like it might cost up to 600ms in a representative environment:

λ hyperfine -w 3 'python -c "import pkg_resources; from pip._internal.network.session import *; user_agent()"' 'python -c "import pkg_resources; from pip._internal.network.session import *"'
Benchmark 1: python -c "import pkg_resources; from pip._internal.network.session import *; user_agent()"
  Time (mean ± σ):      1.423 s ±  0.031 s    [User: 0.677 s, System: 0.365 s]
  Range (min … max):    1.388 s …  1.482 s    10 runs
 
Benchmark 2: python -c "import pkg_resources; from pip._internal.network.session import *"
  Time (mean ± σ):     854.2 ms ±   6.0 ms    [User: 414.9 ms, System: 198.2 ms]
  Range (min … max):   843.4 ms … 862.7 ms    10 runs

If pip were to skip getting setuptools version then pip's user agent construction costs 190ms for me. Next most expensive thing is rustc version, which wouldn't be horrible to cache. I guess off topic for this tracker, but Pradyun let me know if pip is interested in PRs here
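As a quick check on the "up to 600ms" figure, the cost of `user_agent()` can be estimated as the difference between the two hyperfine means above. This is only a back-of-the-envelope sketch; the variable names are made up for illustration.

```python
# Means reported by hyperfine above, in milliseconds.
with_user_agent_ms = 1423.0  # import + user_agent()
import_only_ms = 854.2       # import only

# The difference approximates the cost of building the user agent.
user_agent_cost_ms = with_user_agent_ms - import_only_ms
print(f"user_agent() costs roughly {user_agent_cost_ms:.0f} ms")
```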

@konstin
Copy link
Member

konstin commented Feb 25, 2024

I've asked upstream about rustc startup time: rust-lang/rustup#2626 (comment). Is there a page i can link with rust version stats to motivate this use case?

> Next most expensive thing is rustc version, which wouldn't be horrible to cache.

The entrypoint is a shim, so afaik you can't cache it reliably (i'm happy to query the default rustc from a different place though).

Is the setuptools information from uv relevant given that we don't install it by default, and always use the latest (compatible) version in build envs?

Except for the rustc call and the (base) interpreter metadata call we cache already, i think we should be able to do without any subprocess calls, i.e. fast enough it's not noticeable on profiles.

@alex
Contributor

alex commented Feb 25, 2024

In general I would say that the setuptools information is much less useful than it used to be: projects who need a newer setuptools can now reliably depend on it with build-system.requires.

What do you mean a page with rust version stats? Are you asking where you can see the stats from PyPI? They're all available in BigQuery. Here's an example of the type of analysis I do with it:
pyca_cryptography_rage_dashboard (1).pdf

@charliermarsh
Member

Let’s just stick to what we have access to (I’d like to omit rustc and setuptools for now).

@konstin
Member

konstin commented Feb 25, 2024

> What do you mean a page with rust version stats? Are you asking where you can see the stats from PyPI? They're all available in BigQuery. Here's an example of the type of analysis I do with it:
> pyca_cryptography_rage_dashboard (1).pdf

Yes, something like all but including the rust version. Getting the data from BigQuery is quite the overhead if someone just wants a simple caniuse.com-style check.

@alex
Contributor

alex commented Feb 25, 2024

Ah. I'm not aware of any website that displays rust versions for all of PyPI.

charliermarsh pushed a commit that referenced this issue Mar 4, 2024
## Summary

Closes #1977

This allows us to send uv's version in the `uv-client` User Agent
header.

Here's what the request headers look like to a server now:
```
...
Accept: application/vnd.pypi.simple.v1+json, application/vnd.pypi.simple.v1+html;q=0.2, text/html;q=0.01
User-Agent: uv/0.1.13
...
```

~~I went for a mix of Option 1 and 2 from #1977.~~ Open to alternative
naming as well, not tied too strongly here to the names picked.

~~Another possibility for this new crate is that we can use it to
consolidate metadata that exists across crates to ultimately be able to
create linehaul information described in #1958, but I haven't looked
into what those changes might look like.~~

<!-- What's the purpose of the change? What does it do, and why? -->

## Test Plan

<!-- How was it tested? -->
Added initial tests in the new crate to exercise its public API and
added a new test to uv-client to validate the headers using a 1-time
disposable server.
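
The actual test is not shown in this thread. As a rough illustration of the "1-time disposable server" idea, here is a Python sketch that starts a local server for a single request and records the `User-Agent` it receives. The function name `capture_user_agent` and the handler are hypothetical and are not part of uv.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def capture_user_agent(user_agent: str) -> str:
    """Send one request to a throwaway local server and return the header it saw."""
    captured = {}

    class CaptureHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Record the header, then reply with an empty 200 response.
            captured["user_agent"] = self.headers.get("User-Agent")
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the test output quiet

    # Port 0 lets the OS pick a free port; the server handles one request only.
    server = HTTPServer(("127.0.0.1", 0), CaptureHandler)
    server.timeout = 5  # give up instead of blocking forever if nothing arrives
    thread = threading.Thread(target=server.handle_request)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/"
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5):
            pass
    finally:
        thread.join()
        server.server_close()
    return captured["user_agent"]


seen = capture_user_agent("uv/0.1.13")
```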
@konstin konstin self-assigned this Mar 14, 2024
konstin added a commit that referenced this issue Mar 18, 2024
## Summary

Closes #1958

This adds linehaul metadata to uv's user-agent when PEP 508 markers are
provided to the RegistryClientBuilder. Thanks to #2381, we were able to
leverage most information from markers and avoid inconsistency.

Linehaul is the accompanying metadata that pip sends in its user
agent when talking to registries. You can see this output by running
something like
`python -c 'from pip._internal.network.session import user_agent; print(user_agent())'`.
In PyPI, this metadata is processed by the
[linehaul-cloud-function](https://github.com/pypi/linehaul-cloud-function).
More info about linehaul can be found in #1958.

Below are some examples from pip:

* Linux GHA: `pip/24.0 {"ci":true,"cpu":"x86_64","distro":{"id":"jammy","libc":{"lib":"glibc","version":"2.35"},"name":"Ubuntu","version":"22.04"},"implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.2 15 Mar 2022","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Linux","release":"6.5.0-1016-azure"}}`
* Windows GHA: `pip/24.0 {"ci":true,"cpu":"AMD64","implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.13 30 Jan 2024","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Windows","release":"2022Server"}}`
* OSX GHA: `pip/24.0 {"ci":true,"cpu":"arm64","distro":{"name":"macOS","version":"14.2.1"},"implementation":{"name":"CPython","version":"3.12.2"},"installer":{"name":"pip","version":"24.0"},"openssl_version":"OpenSSL 3.0.13 30 Jan 2024","python":"3.12.2","rustc_version":"1.76.0","system":{"name":"Darwin","release":"23.2.0"}}`



Here's what the uv results look like (sorry for the keys not having the
same order):

* Linux GHA: `uv/0.1.21
{"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},"system":{"name":"Linux","release":"6.5.0-1016-azure"},"cpu":"x86_64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`
* Windows GHA: `uv/0.1.21
{"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":null,"system":{"name":"Windows","release":"2022Server"},"cpu":"AMD64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`
* OSX GHA: `uv/0.1.21
{"installer":{"name":"uv","version":"0.1.21"},"python":"3.12.2","implementation":{"name":"CPython","version":"3.12.2"},"distro":{"name":"macOS","version":"14.2.1","id":null,"libc":null},"system":{"name":"Darwin","release":"23.2.0"},"cpu":"arm64","openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}`
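
Since the key order differs, one way to compare the two formats is to parse them. The sketch below splits each user agent into its `<installer>/<version>` prefix and JSON payload, using the Linux GHA strings from this description. The helper `parse_user_agent` is made up for illustration and is not part of pip, uv, or linehaul.

```python
import json


def parse_user_agent(user_agent: str) -> tuple[str, dict]:
    """Split '<installer>/<version> <json>' into the prefix and the decoded payload."""
    prefix, _, payload = user_agent.partition(" ")
    return prefix, json.loads(payload)


# Linux GHA examples, copied from this description.
PIP_UA = (
    'pip/24.0 {"ci":true,"cpu":"x86_64",'
    '"distro":{"id":"jammy","libc":{"lib":"glibc","version":"2.35"},'
    '"name":"Ubuntu","version":"22.04"},'
    '"implementation":{"name":"CPython","version":"3.12.2"},'
    '"installer":{"name":"pip","version":"24.0"},'
    '"openssl_version":"OpenSSL 3.0.2 15 Mar 2022","python":"3.12.2",'
    '"rustc_version":"1.76.0",'
    '"system":{"name":"Linux","release":"6.5.0-1016-azure"}}'
)
UV_UA = (
    'uv/0.1.21 {"installer":{"name":"uv","version":"0.1.21"},'
    '"python":"3.12.2",'
    '"implementation":{"name":"CPython","version":"3.12.2"},'
    '"distro":{"name":"Ubuntu","version":"22.04","id":"jammy","libc":null},'
    '"system":{"name":"Linux","release":"6.5.0-1016-azure"},'
    '"cpu":"x86_64","openssl_version":null,"setuptools_version":null,'
    '"rustc_version":null,"ci":true}'
)

pip_prefix, pip_data = parse_user_agent(PIP_UA)
uv_prefix, uv_data = parse_user_agent(UV_UA)

# Top-level keys that uv sends explicitly as null.
uv_nulls = sorted(key for key, value in uv_data.items() if value is None)
# Keys present in the uv payload but omitted from the pip payload.
pip_missing = sorted(set(uv_data) - set(pip_data))

print(uv_nulls)     # ['openssl_version', 'rustc_version', 'setuptools_version']
print(pip_missing)  # ['setuptools_version']
```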

Distro information (which pip retrieves with `from pip._vendor import
distro` rather than the `platform` module) was not retrieved from
markers. Instead, the Linux release codename/name/version comes from the
`sys-info` crate, adding about 50us of extra overhead on Linux. The
macOS distro version reuses the [mac_os version
implementation](https://github.com/astral-sh/uv/blob/99c992e38b220fbcda09b0b43602b3db2321480b/crates/platform-host/src/mac_os.rs)
from #2381, which adds about 20us of overhead on macOS. I tried other
crates to avoid re-introducing `mac_os.rs`, but most of them either
didn't perform satisfactorily (~40-60ms) or returned the wrong values
(e.g. Darwin version vs. macOS version).

I also didn't add libc retrieval or rustc retrieval as those seem to add
substantial overhead due to querying `ldd` or `rustc`. PyPy version
detection was also not added to avoid adding extra overhead to [support
PyPy for
linehaul](https://github.com/pypa/pip/blob/24.0/src/pip/_internal/network/session.py#L123).
All other behavior was kept 1-1 to match what pip's linehaul
implementation does (as of 24.0). This also aligns with what was
discussed in #1958.
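
To make the shape of the payload concrete, here is a minimal Python sketch that builds a linehaul-style user agent from cheap, in-process values only. It is not uv's or pip's implementation. The function name `linehaul_user_agent` and the installer name used below are hypothetical, the CI check is simplified to a single environment variable, and distro, libc, rustc, OpenSSL and setuptools are left out.

```python
import json
import os
import platform


def linehaul_user_agent(installer: str, version: str) -> str:
    """Build '<installer>/<version> <json>' without spawning any subprocesses."""
    data = {
        "installer": {"name": installer, "version": version},
        "python": platform.python_version(),
        "implementation": {
            "name": platform.python_implementation(),
            # Simplification: no PyPy-specific version handling here.
            "version": platform.python_version(),
        },
        "system": {"name": platform.system(), "release": platform.release()},
        "cpu": platform.machine(),
        # Assumption: treat a non-empty CI variable as "running in CI".
        "ci": True if os.environ.get("CI") else None,
    }
    # Compact separators keep the header short.
    payload = json.dumps(data, separators=(",", ":"), sort_keys=True)
    return f"{installer}/{version} {payload}"


user_agent = linehaul_user_agent("example-installer", "0.0.0")
print(user_agent)
```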

## Test Plan

Added new integration test to uv-client.

---------

Co-authored-by: konstin <konstin@mailbox.org>