
Run resolve/install benchmarks in ci #3281

Merged

merged 17 commits into astral-sh:main on Apr 30, 2024
Conversation

ibraheemdev
Member

@ibraheemdev ibraheemdev commented Apr 26, 2024

Summary

Runs resolver benchmarks in CI with codspeed.


codspeed-hq bot commented Apr 26, 2024

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 12 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

  • build_platform_tags[burntsushi-archlinux] (6.3 ms)
  • wheelname_parsing[flyte-long-compatible] (21 µs)
  • wheelname_parsing[flyte-long-incompatible] (26.3 µs)
  • wheelname_parsing[flyte-short-compatible] (11.9 µs)
  • wheelname_parsing[flyte-short-incompatible] (12.2 µs)
  • wheelname_parsing_failure[flyte-long-extension] (2.6 µs)
  • wheelname_parsing_failure[flyte-short-extension] (2.6 µs)
  • wheelname_tag_compatibility[flyte-long-compatible] (2.6 µs)
  • wheelname_tag_compatibility[flyte-long-incompatible] (1.8 µs)
  • wheelname_tag_compatibility[flyte-short-compatible] (2.5 µs)
  • wheelname_tag_compatibility[flyte-short-incompatible] (1.1 µs)
  • resolve_warm_jupyter (366.7 ms)

@ibraheemdev ibraheemdev marked this pull request as ready for review April 26, 2024 20:41
@ibraheemdev
Member Author

Hmm, it doesn't look like the benchmarks are running correctly under CodSpeed; the performance report shows the resolve/install benchmarks completing in microseconds.

@adriencaccia

adriencaccia commented Apr 26, 2024

Hey @ibraheemdev, I am a co-founder at @CodSpeedHQ!

> Hmm, it doesn't look like the benchmarks are running correctly under CodSpeed; the performance report shows the resolve/install benchmarks completing in microseconds.

Yes, running arbitrary executables in a benchmark with CodSpeed will not give relevant results, as most of the compute happens in a new process that is not instrumented.
It would be best to call the underlying library functions directly, without relying on the built executable.

For example, call the library entry point `pub(crate) async fn pip_compile(...)` directly instead of spawning the binary as is done here:
https://github.com/ibraheemdev/uv/blob/4ebdc40f60562c05559ac6331abe1a56275e2c8b/crates/bench/benches/uv.rs#L41-L42.

Hope that helps you a bit 😃
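The distinction above can be sketched in plain Rust (no CodSpeed involved). `resolve_in_process` here is a hypothetical stand-in for resolver work, not uv's actual `pip_compile`: work done in the calling process is visible to a valgrind-style instrumenter, while work spawned into a child binary is not.

```rust
use std::time::Instant;

// Hypothetical stand-in for resolver work; in uv this would be the
// library entry point (e.g. `pip_compile`), not the built binary.
fn resolve_in_process(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x * x))
}

fn main() {
    // Instrument-friendly: all the work happens in this process, so a
    // valgrind-based tool like CodSpeed can count its instructions.
    let start = Instant::now();
    let result = resolve_in_process(1_000_000);
    println!("in-process result {result} in {:?}", start.elapsed());

    // NOT instrument-friendly: spawning a separate executable moves the
    // work into a child process that the profiler never sees, so the
    // benchmark would measure only the cost of the spawn itself.
    // let _status = std::process::Command::new("uv")
    //     .args(["pip", "compile", "requirements.in"])
    //     .status();
}
```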

@ibraheemdev
Member Author

@adriencaccia Thanks! I suspected we would have to do this eventually, but didn't realize CodSpeed doesn't support instrumenting external commands at all.

@ibraheemdev ibraheemdev marked this pull request as draft April 26, 2024 21:06
@ibraheemdev ibraheemdev added the benchmarks (Related to benchmarking) and internal (A refactor or improvement that is not user-facing) labels Apr 29, 2024
@ibraheemdev ibraheemdev marked this pull request as ready for review April 29, 2024 20:32
Member

@charliermarsh charliermarsh left a comment


Nice, thank you! Open to giving this a shot. Do we have any sense of what the variance/noise will look like?

@adriencaccia

> Nice, thank you! Open to giving this a shot. Do we have any sense of what the variance/noise will look like?

I tested it out on my fork at adriencaccia#1, and I have the following variance results for 101 runs on the same commit:

Found 101 runs for adriencaccia/uv (fca26cde1b54f7467267ca4dff7a9b9cb6f10d29)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────┬───────────────────┬─────────────────────┬───────────┬──────────────────┐
│                                                                              (index)                                                                               │  average  │ standardDeviation │ varianceCoefficient │   range   │ rangeCoefficient │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────┼───────────────────┼─────────────────────┼───────────┼──────────────────┤
│ crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_tag_compatibility::wheelname_tag_compatibility[flyte-short-incompatible] │ '1.1 µs'  │     '27.3 ns'     │       '2.5%'        │ '55.6 ns' │      '5.1%'      │
│  crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_tag_compatibility::wheelname_tag_compatibility[flyte-short-compatible]  │ '2.5 µs'  │     '27.3 ns'     │       '1.1%'        │ '55.6 ns' │      '2.2%'      │
│  crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_tag_compatibility::wheelname_tag_compatibility[flyte-long-compatible]   │ '2.6 µs'  │     '27.3 ns'     │       '1.0%'        │ '55.6 ns' │      '2.1%'      │
│ crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_tag_compatibility::wheelname_tag_compatibility[flyte-long-incompatible]  │ '1.8 µs'  │     '13.6 ns'     │       '0.7%'        │ '27.8 ns' │      '1.5%'      │
│    crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing_failure::wheelname_parsing_failure[flyte-short-extension]     │ '2.6 µs'  │     '13.6 ns'     │       '0.5%'        │ '27.8 ns' │      '1.1%'      │
│     crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing_failure::wheelname_parsing_failure[flyte-long-extension]     │ '2.6 µs'  │     '13.6 ns'     │       '0.5%'        │ '27.8 ns' │      '1.1%'      │
│            crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing::wheelname_parsing[flyte-short-compatible]            │  '12 µs'  │     '13.6 ns'     │       '0.1%'        │ '27.8 ns' │      '0.2%'      │
│           crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing::wheelname_parsing[flyte-short-incompatible]           │ '12.2 µs' │     '13.6 ns'     │       '0.1%'        │ '27.8 ns' │      '0.2%'      │
│           crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_build_platform_tags::build_platform_tags[burntsushi-archlinux]           │ '6.3 ms'  │     '13.6 ns'     │       '0.0%'        │ '27.8 ns' │      '0.0%'      │
│           crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing::wheelname_parsing[flyte-long-incompatible]            │ '26.3 µs' │      '0 ns'       │       '0.0%'        │   '0 s'   │      '0.0%'      │
│            crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing::wheelname_parsing[flyte-long-compatible]             │  '21 µs'  │      '0 ns'       │       '0.0%'        │   '0 s'   │      '0.0%'      │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────┴───────────────────┴─────────────────────┴───────────┴──────────────────┘

It is fairly stable, so you should be able to set a low regression threshold of around 5% 🙂
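The threshold advice above boils down to keeping the regression cutoff comfortably above the noise floor, i.e. above the coefficient of variation (standard deviation over mean) of repeated runs. A minimal sketch, using illustrative timings rather than the actual CodSpeed numbers:

```rust
// Coefficient of variation (std dev / mean) over repeated benchmark runs.
fn coefficient_of_variation(samples: &[f64]) -> f64 {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    variance.sqrt() / mean
}

fn main() {
    // Illustrative warm-resolve timings in ms (not real CodSpeed data).
    let runs_ms = [366.2, 366.9, 370.1, 365.5, 367.0];
    let cv = coefficient_of_variation(&runs_ms);
    // With a CV around 1%, a ~5% regression threshold sits well clear
    // of run-to-run noise.
    println!("CV = {:.2}%", cv * 100.0);
}
```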

@charliermarsh
Member

Awesome, thanks so much Adrien!

@zanieb
Member

zanieb commented Apr 30, 2024

Hmm, I don't see the resolver benchmarks there; I'd expect the distribution filename benches to be very stable, but the resolver ones are probably less so.

@zanieb
Member

zanieb commented Apr 30, 2024

Looks like something is wrong; the resolver benches are missing from the latest commit.

@ibraheemdev
Member Author

I forgot to use the codspeed-criterion-compat shim in the uv benchmarks, and it looks like that crate doesn't support async runs.

@adriencaccia

@ibraheemdev let me know when you want me to run variance checks again on the new benchmarks

@ibraheemdev
Member Author

@adriencaccia can you run them now? I'm also curious why the benchmarks seem to run ~15x slower on CodSpeed than locally; is it reporting an aggregate time instead of per-run?

@adriencaccia

adriencaccia commented Apr 30, 2024

> @adriencaccia can you run them now?

Alright, I started them. Will post the results once they are done 😉

> I'm also curious why the benchmarks seem to run ~15x slower on CodSpeed than locally; is it reporting an aggregate time instead of per-run?

This is because we run the code under Valgrind, which adds 4x to 10x overhead, sometimes more. But that is how we get those consistent measurements and flamegraphs 😉

@adriencaccia

Results with the new benchmarks:

Found 101 runs for adriencaccia/uv (7bbc18a361ba078e21186db90b98d6b88b3a8a7c)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────┬───────────────────┬─────────────────────┬───────────┬──────────────────┐
│                                                                              (index)                                                                               │  average   │ standardDeviation │ varianceCoefficient │   range   │ rangeCoefficient │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────┼───────────────────┼─────────────────────┼───────────┼──────────────────┤
│                                               crates/bench/benches/uv.rs::uv::resolve_warm_black::resolve_warm_black                                               │ '15.3 ms'  │    '527.5 µs'     │       '3.4%'        │ '2.3 ms'  │     '15.0%'      │
│ crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_tag_compatibility::wheelname_tag_compatibility[flyte-short-incompatible] │  '1.1 µs'  │     '27.5 ns'     │       '2.5%'        │ '55.6 ns' │      '5.1%'      │
│                                             crates/bench/benches/uv.rs::uv::resolve_warm_jupyter::resolve_warm_jupyter                                             │ '366.5 ms' │     '4.2 ms'      │       '1.1%'        │ '22.8 ms' │      '6.2%'      │
│  crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_tag_compatibility::wheelname_tag_compatibility[flyte-short-compatible]  │  '2.5 µs'  │     '27.5 ns'     │       '1.1%'        │ '55.6 ns' │      '2.2%'      │
│  crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_tag_compatibility::wheelname_tag_compatibility[flyte-long-compatible]   │  '2.6 µs'  │     '27.5 ns'     │       '1.0%'        │ '55.6 ns' │      '2.1%'      │
│ crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_tag_compatibility::wheelname_tag_compatibility[flyte-long-incompatible]  │  '1.9 µs'  │     '13.7 ns'     │       '0.7%'        │ '27.8 ns' │      '1.5%'      │
│    crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing_failure::wheelname_parsing_failure[flyte-short-extension]     │  '2.6 µs'  │     '13.7 ns'     │       '0.5%'        │ '27.8 ns' │      '1.1%'      │
│     crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing_failure::wheelname_parsing_failure[flyte-long-extension]     │  '2.6 µs'  │     '13.7 ns'     │       '0.5%'        │ '27.8 ns' │      '1.1%'      │
│            crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing::wheelname_parsing[flyte-short-compatible]            │  '12 µs'   │     '13.7 ns'     │       '0.1%'        │ '27.8 ns' │      '0.2%'      │
│           crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing::wheelname_parsing[flyte-short-incompatible]           │ '12.2 µs'  │     '13.7 ns'     │       '0.1%'        │ '27.8 ns' │      '0.2%'      │
│           crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_build_platform_tags::build_platform_tags[burntsushi-archlinux]           │  '6.3 ms'  │     '13.7 ns'     │       '0.0%'        │ '27.8 ns' │      '0.0%'      │
│           crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing::wheelname_parsing[flyte-long-incompatible]            │ '26.3 µs'  │      '0 ns'       │       '0.0%'        │   '0 s'   │      '0.0%'      │
│            crates/bench/benches/distribution_filename.rs::distribution_filename::benchmark_wheelname_parsing::wheelname_parsing[flyte-long-compatible]             │  '21 µs'   │      '0 ns'       │       '0.0%'        │   '0 s'   │      '0.0%'      │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────┴───────────────────┴─────────────────────┴───────────┴──────────────────┘

Indeed, it seems that crates/bench/benches/uv.rs::uv::resolve_warm_black::resolve_warm_black is a bit more inconsistent.

@charliermarsh
Member

Maybe we remove the Black test? It seems like the variance is way higher than for the Jupyter test.

@zanieb
Copy link
Member

zanieb commented Apr 30, 2024

I wonder why that is. It shouldn't be that different (as far as variance goes)?

@ibraheemdev
Member Author

@zanieb It's probably that the actual resolve step is faster, so the benchmark is more influenced by other factors (file I/O, etc.)

@ibraheemdev
Member Author

I'm going to go ahead and merge this with just the jupyter benchmark. We'll see how consistent/useful the reports are.

@ibraheemdev ibraheemdev merged commit 1d2c57a into astral-sh:main Apr 30, 2024
43 checks passed