bitswap: peer prom tracker #413

Jorropo · 2023-07-20T14:53:34Z

Fixes #209

- Peer Prom Tracker example
- Tracer tests (I think the client's tracer is buggy: it does not record outbound messages)
- Changelog

codecov · 2023-07-20T14:59:13Z

Codecov Report

Merging #413 (b2cef5d) into main (cfad09d) will decrease coverage by 0.21%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##             main     #413      +/-   ##
==========================================
- Coverage   49.61%   49.41%   -0.21%     
==========================================
  Files         248      249       +1     
  Lines       29838    29945     +107     
==========================================
- Hits        14805    14798       -7     
- Misses      13615    13726     +111     
- Partials     1418     1421       +3

| Impacted Files | Coverage Δ | |
|---|---|---|
| bitswap/bitswap.go | `37.20% <0.00%> (ø)` | |
| bitswap/client/client.go | `82.21% <0.00%> (-2.38%)` | ⬇️ |
| bitswap/options.go | `0.00% <0.00%> (ø)` | |
| bitswap/peer-prom-tracker/peer-prom-tracker.go | `0.00% <0.00%> (ø)` | |
| bitswap/server/server.go | `52.19% <0.00%> (ø)` | |

... and 9 files with indirect coverage changes

lidel · 2023-07-24T13:58:05Z

bitswap/peer-prom-tracker/peer-prom-tracker.go

+		}
+	}()
+
+	peerIdLabel := []string{"peer-id"}


⚠️ This will create a separate time series per PeerID, which is ok for debugging, but should NOT be enabled by default.

High-cardinality labels are considered an antipattern in Prometheus. If there were 20K peers, this would create at least 20K time series, and that may cause problems (performance, billing) when Grafana tries to visualize them.
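To make the arithmetic concrete: a counter or gauge with a per-peer label produces one series per distinct peer, while a Prometheus Summary produces, for each peer, one series per quantile objective plus `_sum` and `_count`. The sketch below assumes, hypothetically, three per-peer metrics; the numbers are illustrative and not measured from this PR.

```go
package main

import "fmt"

// seriesPerLabelSet returns how many time series one label combination
// produces. Counters and gauges expose a single series; a Summary exposes
// one series per quantile objective plus _sum and _count.
func seriesPerLabelSet(isSummary bool, quantiles int) int {
	if isSummary {
		return quantiles + 2
	}
	return 1
}

// totalSeries estimates the series count when every metric carries a
// per-peer label: peers × metrics × series per label set.
func totalSeries(peers, metrics int, isSummary bool, quantiles int) int {
	return peers * metrics * seriesPerLabelSet(isSummary, quantiles)
}

func main() {
	// Hypothetical: 20K peers, 3 per-peer counters.
	fmt.Println(totalSeries(20000, 3, false, 0)) // 60000
	// Same peers, 3 per-peer summaries with 3 objectives each.
	fmt.Println(totalSeries(20000, 3, true, 3)) // 300000
}
```

The series count grows linearly with the number of peers, which is exactly what makes an unbounded peer label dangerous.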

To understand why high cardinality is a problem, see:

https://stackoverflow.com/questions/46373442/how-dangerous-are-high-cardinality-labels-in-prometheus

https://grafana.com/blog/2022/10/20/how-to-manage-high-cardinality-metrics-in-prometheus-and-kubernetes/

IMO this PR can't land in boxo in this form, as it creates a footgun for users of this library.

There needs to be either a hard limit on the number of peers tracked, or an explicit opt-in via a constructor option or an environment variable.
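A minimal sketch combining both safeguards, assuming a hypothetical `boundedPeerTracker` type and a hypothetical `BOXO_BITSWAP_PEER_METRICS` environment variable (neither exists in boxo). Peers first seen after the cap is reached are aggregated into a single overflow bucket, so the exported series count never exceeds `maxPeers+1`. In a real exporter the overflow bucket would map to a fixed label value (for example `other`).

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// boundedPeerTracker is a hypothetical sketch (not boxo API) of a per-peer
// message counter whose distinct peer labels are capped at maxPeers.
// Messages from peers first seen after the cap is reached go into a single
// overflow bucket, so at most maxPeers+1 series are ever exported.
type boundedPeerTracker struct {
	mu       sync.Mutex
	maxPeers int
	perPeer  map[string]uint64 // at most maxPeers entries
	overflow uint64            // messages from peers beyond the cap
}

func newBoundedPeerTracker(maxPeers int) *boundedPeerTracker {
	return &boundedPeerTracker{maxPeers: maxPeers, perPeer: make(map[string]uint64)}
}

// MessageReceived records one message from peer.
func (t *boundedPeerTracker) MessageReceived(peer string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, tracked := t.perPeer[peer]; tracked || len(t.perPeer) < t.maxPeers {
		t.perPeer[peer]++
		return
	}
	t.overflow++
}

// Series returns how many label values (time series) this tracker would export.
func (t *boundedPeerTracker) Series() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	n := len(t.perPeer)
	if t.overflow > 0 {
		n++ // the shared overflow bucket
	}
	return n
}

// perPeerMetricsEnabled is the opt-in switch: off unless the hypothetical
// BOXO_BITSWAP_PEER_METRICS environment variable is set to "1".
func perPeerMetricsEnabled() bool {
	return os.Getenv("BOXO_BITSWAP_PEER_METRICS") == "1"
}

func main() {
	fmt.Println("opt-in enabled:", perPeerMetricsEnabled())
	tr := newBoundedPeerTracker(2)
	for _, p := range []string{"peerA", "peerB", "peerC", "peerD", "peerA"} {
		tr.MessageReceived(p)
	}
	fmt.Println(tr.Series()) // 3: peerA, peerB, plus the overflow bucket
}
```

Capping by first-seen peer keeps totals correct (nothing is dropped, only aggregated), but long-lived peers keep the tracked slots; an LRU or top-K policy would be alternative designs.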

💭 I think if we wanted to have metrics similar to this, we could measure P95 globally without running into the cardinality problem.

To do so, one would define Objectives in SummaryOpts (e.g. P50, P75, P95) and calculate messages-received, messages-sent, bytes-received, etc. across all peers rather than per peer. This way we get a useful P95 metric with a known error margin, without exploding the number of time series.

Jorropo · 2023-07-27T09:57:28Z

This was communicated as not needed, and we don't want to run it in production in Kubo due to the cardinality issue.
So I'll close this and leave the branch in case it comes back up later.

Jorropo added 2 commits July 20, 2023 16:51

bitswap: add peer-prom-tracker

d9abb63

For #209. Need examples.

bitswap: Allow to register multiple tracers

b2cef5d

Need tests (I think the client's tracer is buggy: it does not record outbound messages).
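A sketch of what registering multiple tracers can look like, using a simplified, hypothetical `Tracer` interface (string peer IDs and byte sizes, not boxo's actual peer and message types): a fan-out type forwards each event to every registered tracer. The recording tracer also illustrates the kind of test the commit asks for, asserting that outbound (sent) messages are recorded alongside inbound ones.

```go
package main

import (
	"fmt"
	"sync"
)

// Tracer is a simplified, hypothetical stand-in for a bitswap message tracer.
type Tracer interface {
	MessageReceived(peer string, size int)
	MessageSent(peer string, size int)
}

// multiTracer forwards each event to every registered tracer, in order.
type multiTracer []Tracer

func (m multiTracer) MessageReceived(peer string, size int) {
	for _, t := range m {
		t.MessageReceived(peer, size)
	}
}

func (m multiTracer) MessageSent(peer string, size int) {
	for _, t := range m {
		t.MessageSent(peer, size)
	}
}

// recordingTracer counts events in each direction; handy in tests.
type recordingTracer struct {
	mu       sync.Mutex
	received int
	sent     int
}

func (r *recordingTracer) MessageReceived(string, int) {
	r.mu.Lock()
	r.received++
	r.mu.Unlock()
}

func (r *recordingTracer) MessageSent(string, int) {
	r.mu.Lock()
	r.sent++
	r.mu.Unlock()
}

func main() {
	a, b := &recordingTracer{}, &recordingTracer{}
	var tr Tracer = multiTracer{a, b}
	tr.MessageReceived("peerA", 128)
	tr.MessageSent("peerA", 64)
	fmt.Println(a.received, a.sent, b.received, b.sent) // 1 1 1 1
}
```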

Jorropo requested a review from a team as a code owner July 20, 2023 14:53

BigLep assigned Jorropo Jul 20, 2023

BigLep mentioned this pull request Jul 20, 2023

Bitswap: better diagnostics and observability on client #209

Closed

lidel requested changes Jul 24, 2023


Jorropo changed the title ~~Bitswap introspectability~~ bitswap: peer prom tacker Jul 26, 2023

Jorropo closed this Jul 27, 2023

BigLep mentioned this pull request Aug 3, 2023

Release 0.22 ipfs/kubo#9911

Closed
