
Measure how efficient with_label_values() is #7568

Merged: 9 commits merged into master from nikurt-label-values on Sep 7, 2022

Conversation

@nikurt (Contributor) commented Sep 6, 2022

cc: @matklad

```
% cargo bench -p near-o11y
   Compiling near-o11y v0.0.0 (/storage/code/nearcore-master/core/o11y)
    Finished bench [optimized] target(s) in 1.05s
     Running unittests src/lib.rs (target/release/deps/near_o11y-ccbb448e66a5a4d9)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/metrics.rs (target/release/deps/metrics-d531efabdee84e90)

running 3 tests
test inc_counter_vec_cached            ... bench:          21 ns/iter (+/- 1)
test inc_counter_vec_cached_str        ... bench:         183 ns/iter (+/- 1)
test inc_counter_vec_with_label_values ... bench:         506 ns/iter (+/- 10)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured

Time per inc() call when caching counters: 0.050249520625 microseconds
Time per inc() call when caching label values: 0.624767583375 microseconds
Time per inc() call when caching strings: 0.659365745875 microseconds
Time per inc() call when using with_label_values(): 0.87889163825 microseconds
```
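The gap between the cached and uncached numbers above comes down to what each `inc()` has to do. As a rough, std-only model of the lookup cost (the real prometheus `CounterVec` differs in detail, and all names here are illustrative, not nearcore APIs): every `with_label_values()` call allocates owned label strings, hashes them, and takes a lock to find the child counter, while a cached handle skips all of that and does a single atomic add.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct CounterVec {
    children: Mutex<HashMap<Vec<String>, Arc<AtomicU64>>>,
}

impl CounterVec {
    // The slow path: allocate owned labels, hash them, lock the map.
    fn with_label_values(&self, labels: &[&str]) -> Arc<AtomicU64> {
        let key: Vec<String> = labels.iter().map(|s| s.to_string()).collect();
        let mut map = self.children.lock().unwrap();
        Arc::clone(map.entry(key).or_default())
    }
}

fn main() {
    let vec = CounterVec::default();

    // Uncached: pays the allocation + hash + lock cost on every increment.
    for _ in 0..3 {
        vec.with_label_values(&["shard0"]).fetch_add(1, Ordering::Relaxed);
    }

    // Cached: resolve the child once; each increment is one atomic op.
    let cached = vec.with_label_values(&["shard0"]);
    for _ in 0..3 {
        cached.fetch_add(1, Ordering::Relaxed);
    }

    assert_eq!(cached.load(Ordering::Relaxed), 6);
    println!("total = {}", cached.load(Ordering::Relaxed));
}
```

This is only a sketch of why the benchmark spread looks the way it does, not a claim about prometheus internals.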

nikurt marked this pull request as ready for review September 7, 2022 10:02
nikurt requested a review from a team as a code owner September 7, 2022 10:02
nikurt requested a review from mm-near September 7, 2022 10:02
@matklad (Contributor) commented Sep 7, 2022

Do we actually want to merge this? I am skeptical of the value of benchmarks that we don't actually run continuously. I think the right approach is to do an ad-hoc investigation, document the outcomes at a point in time in some issue, link the code there, but not get it into master.

No particularly strong opinion though.

@nagisa (Collaborator) commented Sep 7, 2022

NB: this comment applies in general.

The unfortunate outcome of not documenting the results of an investigation is that they become folklore and are eventually lost to time. For investigations that are trivial to reproduce this might be fine, but ultimately some people leave, others forget details, and the same mistake that prompted the investigation in the first place gets made again.

While it isn't strictly necessary to merge benchmarks per se, I think it is worthwhile to spend some time documenting what the outcome of this investigation is. Should we use a specific pattern out of these three when implementing metrics? Add an example and recommendations to near_o11y, at the very least. Are all the variants fast enough for our use cases? Make a note somewhere. That way, even if we don't recall the details, maybe somebody will recall seeing the note, bring it up, and follow the breadcrumbs from there.

@mina86 (Contributor) commented Sep 7, 2022

I’ve taken the liberty of adding two more cases.

```
running 5 tests
test inc_counter_vec_cached                        ... bench:   15 ns/iter (+/- 0)
test inc_counter_vec_cached_str                    ... bench:  139 ns/iter (+/- 1)
test inc_counter_vec_with_label_values             ... bench:  391 ns/iter (+/- 5)
test inc_counter_vec_with_label_values_smartstring ... bench:  352 ns/iter (+/- 9)
test inc_counter_vec_with_label_values_stack       ... bench:  261 ns/iter (+/- 4)
```

@mina86 (Contributor) commented Sep 7, 2022

From that, I think a quick improvement would be to offer a `FormattedNum` type which is constructed from an integer and formats it into an internal buffer. The usage would be: `with_label_values(&[&FormattedNum::new(shard_id)])`.
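One way such a type could look (a hypothetical sketch; `FormattedNum` does not exist in nearcore): format the integer once into a fixed-size stack buffer, still going through std's formatting machinery but avoiding the per-call heap allocation of `format!`/`to_string`.

```rust
use std::io::Write as _;

struct FormattedNum {
    buf: [u8; 20], // u64::MAX needs at most 20 decimal digits
    len: usize,
}

impl FormattedNum {
    fn new(n: u64) -> Self {
        let mut buf = [0u8; 20];
        // io::Write for &mut [u8] advances the slice as bytes are written.
        let mut cursor = &mut buf[..];
        write!(cursor, "{n}").expect("20 bytes fit any u64");
        let len = 20 - cursor.len();
        FormattedNum { buf, len }
    }

    fn as_str(&self) -> &str {
        // Only ASCII digits were written, so the prefix is valid UTF-8.
        std::str::from_utf8(&self.buf[..self.len]).unwrap()
    }
}

fn main() {
    let shard_id: u64 = 12345;
    let label = FormattedNum::new(shard_id);
    // Usage sketch: with a Deref<Target = str> impl this could be passed
    // exactly as proposed; here we call as_str() explicitly, e.g.
    // counter.with_label_values(&[label.as_str()]).inc();
    assert_eq!(label.as_str(), "12345");
    println!("{}", label.as_str());
}
```

The proposed `&[&FormattedNum::new(shard_id)]` syntax would additionally require a `Deref<Target = str>` impl so the reference coerces to `&str`; that is a small addition left out of the sketch.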

@mina86 (Contributor) commented Sep 7, 2022

OK, one more. I always suspected Rust’s format is shite:

```
test inc_counter_vec_cached                            ... bench:   15 ns/iter (+/- 0)
test inc_counter_vec_cached_str                        ... bench:  143 ns/iter (+/- 3)
test inc_counter_vec_with_label_values                 ... bench:  397 ns/iter (+/- 8)
test inc_counter_vec_with_label_values_smartstring     ... bench:  344 ns/iter (+/- 5)
test inc_counter_vec_with_label_values_stack           ... bench:  264 ns/iter (+/- 14)
test inc_counter_vec_with_label_values_stack_no_format ... bench:  140 ns/iter (+/- 2)
```
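The `stack_no_format` variant above bypasses `core::fmt` entirely by converting the integer to decimal ASCII by hand in a stack buffer. A minimal std-only sketch of that idea (function and buffer names are illustrative, not the benchmark's actual code):

```rust
// Write n's decimal digits into the tail of a 20-byte stack buffer
// (u64::MAX has 20 digits) and return a &str view of them, with no
// heap allocation and no core::fmt machinery.
fn u64_to_str(mut n: u64, buf: &mut [u8; 20]) -> &str {
    let mut i = buf.len();
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    // Only ASCII digits were written, so this is valid UTF-8.
    std::str::from_utf8(&buf[i..]).unwrap()
}

fn main() {
    let mut buf = [0u8; 20];
    let label = u64_to_str(42, &mut buf);
    // label could then be passed on, e.g. counter.with_label_values(&[label])
    assert_eq!(label, "42");
    println!("{label}");
}
```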

@mina86 (Contributor) commented Sep 7, 2022

Can’t help myself… ;)

```
test inc_counter_vec_cached                            ... bench:   15 ns/iter (+/- 0)
test inc_counter_vec_cached_str                        ... bench:  140 ns/iter (+/- 1)
test inc_counter_vec_with_label_values                 ... bench:  388 ns/iter (+/- 5)
test inc_counter_vec_with_label_values_to_string       ... bench:  298 ns/iter (+/- 9)
test inc_counter_vec_with_label_values_smartstring     ... bench:  346 ns/iter (+/- 3)
test inc_counter_vec_with_label_values_stack           ... bench:  262 ns/iter (+/- 7)
test inc_counter_vec_with_label_values_stack_no_format ... bench:  137 ns/iter (+/- 3)
```

Mostly I was curious about the overhead of `format!("{val}")` compared to `val.to_string()`. Turns out it's noticeable.
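For clarity, the two call styles being compared produce identical strings; per the benchmarks above, `to_string()` simply skips roughly 100ns of formatting overhead per call for a single integer label:

```rust
fn main() {
    let shard_id: u64 = 7;

    // Goes through format_args! and the full core::fmt machinery.
    let via_format = format!("{shard_id}");

    // Same result; per the benchmarks above, measurably faster.
    let via_to_string = shard_id.to_string();

    assert_eq!(via_format, via_to_string);
    println!("{via_format}");
}
```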

@matklad (Contributor) commented Sep 7, 2022

A couple of crates to be aware of in this area:

@nikurt (Contributor, author) commented Sep 7, 2022

> outcome of not documenting any results of an investigation

This PR is an ongoing investigation, not the result of an investigation yet. @nagisa, based on the alternatives and benchmark results, I will document the outcomes and make recommendations.

> @mina86's test cases

Thank you! I simply wanted to compare the worst case, `.with_label_values()`, with the best case (cached counters). The plan was to investigate the alternatives and benchmark them. @mina86 happened to suggest a bunch of alternatives already; thank you very much!

near-bulldozer bot merged commit 4c32bf0 into master Sep 7, 2022
near-bulldozer bot deleted the nikurt-label-values branch September 7, 2022 13:41
mina86 added a commit to mina86/nearcore that referenced this pull request Sep 7, 2022
As per benchmarks in near#7568, using format! as opposed to calling to_string directly has ~100ns of overhead. There's no reason to keep that overhead, especially since using to_string is actually shorter to type.

Note that it may be beneficial to further optimise integer formatting
by using fixed-size buffer and custom conversion which doesn’t use
std.  This optimisation is outside the scope of this commit.
mina86 added a commit to mina86/nearcore that referenced this pull request Sep 7, 2022
nikurt added a commit that referenced this pull request Sep 7, 2022
near-bulldozer bot pushed a commit that referenced this pull request Sep 7, 2022
nikurt added a commit that referenced this pull request Nov 9, 2022
nikurt pushed a commit that referenced this pull request Nov 9, 2022