Create an arena for package names #242

Open
wants to merge 1 commit into dev
Conversation

@Eh2406 (Member) commented Jul 22, 2024

We have long known that resolution time is proportional to the cost of P::Clone. Most real-world resolution problems contain at least one String in their P. If performance is anywhere on the priority list, then P::Clone should not allocate. Thanks to our library being generic, this is easy for users to achieve and control: P can be a wrapper around an Rc so that Clone does not allocate, various interning strategies let P be a &str for an even faster clone, and with hashconsing (or any de-duplicating interning strategy) P can be a wrapper around a usize. So for our benchmarks we used P=u32, because it was the simplest type that met our trait bounds, or P=&str, because it was similar to what was easily available to a user. Now that we have production users, we see that their P tends to have a more expensive clone. (Rc for the cargo benchmarks, Arc for the uv use case, String for gleam and elm.)
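For illustration, a minimal sketch of the Rc-wrapper strategy (the names here are hypothetical, not this library's API): Clone becomes a reference-count bump rather than a heap allocation.

```rust
use std::fmt;
use std::rc::Rc;

// Hypothetical package-name type: cloning bumps a reference count
// instead of allocating a fresh String.
#[derive(Clone, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Package(Rc<str>);

impl Package {
    fn new(name: &str) -> Self {
        Package(Rc::from(name))
    }
}

impl fmt::Display for Package {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}
```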

It also turns out that resolution time is proportional to the cost of P::Hash, as P ends up being the key in a large number of hash tables. The hottest of these tables tends to be PartialSolution::package_assignments. Only the hashconsing strategy (or its relatives) allows a user to provide a very fast P::Hash when P has a string in it, and none of our production users are using this approach. So the performance of our internal benchmarks that use &str is far more realistic than the ones that use u32.
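As a sketch of why hashconsing helps here (again with hypothetical names, not this library's API): if every distinct name is interned once and assigned a unique integer, then Eq and Hash only ever touch that integer, never the string.

```rust
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Hypothetical hashconsed package: `id` is unique per distinct name,
// so equality and hashing never look at the string itself.
#[derive(Clone, Debug)]
struct Interned {
    id: usize,
    name: Rc<str>, // kept only for Display/Debug
}

impl PartialEq for Interned {
    fn eq(&self, other: &Self) -> bool {
        self.id == other.id
    }
}

impl Eq for Interned {}

impl Hash for Interned {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id.hash(state);
    }
}
```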

Luckily, we can provide a hashconsing-like wrapper around P for our users. We already have an arena with the property that if two IDs are equal then the data they point at must be equal; we just need to make the converse true as well: if the data is equal, then any IDs for that data will be equal. Instead of always allocating by pushing new items onto a Vec (and returning the new index), this arena returns the previous ID if the value has already been added, and does the normal thing otherwise. Using the indexmap crate, which is already one of our dependencies, this does not involve a lot of code.
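A minimal sketch of that deduplicating arena, assuming indexmap's IndexSet::insert_full (the names below are illustrative, not necessarily the PR's exact code):

```rust
use std::hash::Hash;
use indexmap::IndexSet;

// Interning arena: equal values always get equal IDs, and equal IDs
// always point at equal values.
struct HashArena<T: Hash + Eq> {
    data: IndexSet<T>,
}

impl<T: Hash + Eq> HashArena<T> {
    fn new() -> Self {
        HashArena { data: IndexSet::new() }
    }

    /// Returns the existing ID if `value` was interned before;
    /// otherwise pushes it and returns the fresh index.
    fn alloc(&mut self, value: T) -> usize {
        self.data.insert_full(value).0
    }

    fn get(&self, id: usize) -> &T {
        &self.data[id]
    }
}
```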

This is a pessimization for benchmarks that use u32, or for any (theoretical) users who are already using hashconsing. Unfortunately, Rust does not have specialization, and even if it did, there is no trait bound for "Hash is cheap". Based on large_case_u16_NumberVersion (whose name is hilariously out of date), this is a ~9.5% regression; similarly, ~10.5% for the synthetic slow_135_0_u16_NumberVersion. A sudoku problem, which is not synthetic but also not the intended use case, sees a similar regression.

Real-world benchmarks see significant improvements: elm_str_SemanticVersion 10.2%, zuse_str_SemanticVersion 19.9%, and all of crates.io 14% without lock files and 11% with.

I'd love to hear the impact on uv benchmarks, but I expect them to be in a similar range. Perhaps slightly bigger, because they are already collecting this data for their implementation of prioritize, which can now just be Id<P>::as_raw().

There is potential follow-up work: a Map<Id<P>, V> (especially when densely filled) can be replaced with a Vec, decreasing the size and removing the calculation of Hash for Id. But it's a lot of code for a much smaller win, so I left it out for now.
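A sketch of that follow-up idea (hypothetical types, assuming IDs expose their raw index): a densely filled Map<Id<P>, V> becomes a Vec<Option<V>> indexed directly by the ID, so no hashing happens at all.

```rust
// Hypothetical replacement for a densely filled Map<Id<P>, V>:
// look-ups index a Vec by the ID's raw value instead of hashing.
struct IdMap<V> {
    slots: Vec<Option<V>>,
}

impl<V> IdMap<V> {
    fn new() -> Self {
        IdMap { slots: Vec::new() }
    }

    fn insert(&mut self, id: usize, value: V) {
        if id >= self.slots.len() {
            self.slots.resize_with(id + 1, || None);
        }
        self.slots[id] = Some(value);
    }

    fn get(&self, id: usize) -> Option<&V> {
        self.slots.get(id)?.as_ref()
    }
}
```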

Unfortunately, a bunch of log and panic messages now refer to Id(1) instead of the name of that package. Similarly, they use Debug instead of Display. This is hard to fix, because the relevant impls do not have access to the arena in which the actual package name is stored.

@charliermarsh (Contributor)

Awesome idea!

@charliermarsh (Contributor)

We'll try it and report back.

@charliermarsh (Contributor)

My impression is that it gives us a small but consistent speedup, e.g., on a large resolution with a filled cache (so minimal I/O):

❯ hyperfine "../target/release/main lock" "../target/release/uv lock" --warmup 100 --runs 100
Benchmark 1: ../target/release/main lock
  Time (mean ± σ):      27.3 ms ±   0.5 ms    [User: 21.3 ms, System: 5.2 ms]
  Range (min … max):    26.5 ms …  30.6 ms    100 runs

Benchmark 2: ../target/release/uv lock
  Time (mean ± σ):      26.7 ms ±   0.3 ms    [User: 20.8 ms, System: 5.1 ms]
  Range (min … max):    26.1 ms …  27.3 ms    100 runs

Summary
  '../target/release/uv lock' ran
    1.02 ± 0.02 times faster than '../target/release/main lock'

Or, with a pre-filled cache but no existing lockfile:

❯ hyperfine "../target/release/main lock" "../target/release/uv lock" --warmup 100 --runs 100 --prepare "rm uv.lock"
Benchmark 1: ../target/release/main lock
  Time (mean ± σ):     157.9 ms ±   4.7 ms    [User: 175.9 ms, System: 176.6 ms]
  Range (min … max):   147.0 ms … 174.1 ms    100 runs

Benchmark 2: ../target/release/uv lock
  Time (mean ± σ):     155.1 ms ±   4.3 ms    [User: 169.5 ms, System: 169.9 ms]
  Range (min … max):   146.6 ms … 169.9 ms    100 runs

Summary
  '../target/release/uv lock' ran
    1.02 ± 0.04 times faster than '../target/release/main lock'

@charliermarsh (Contributor)

I can probably do a bit better with more work on our side to use the IDs everywhere.

@Eh2406 (Member, Author) commented Jul 23, 2024

10% on resolution-only micro-benchmarks leading to 1% on an end-to-end test makes sense if we are spending about 10% of our time inside resolution code.

@charliermarsh (Contributor)

Yeah, seems like a clear improvement.
