context/tracing in the blockstore/datastore pipeline #6803

MichaelMure · 2019-12-17T10:33:11Z

At Infura, we have to deal with the occasional performance issue. Even though we have now added tracing in the HTTP handler of go-ipfs-cmds (not upstreamed yet; it's opentracing and you are aiming for opencensus, if I'm not mistaken?), the tracing essentially stops there.

The majority of the requests (and the perf issues) involve the data pipeline, but it is essentially a black box. To resolve that problem, proper tracing instrumentation there would be very helpful, but that implies adding a Go context to most if not all blockstore and datastore functions.

Is that something you would be interested in pursuing?

cc @dirkmc maybe?

Stebalien · 2019-12-18T08:57:30Z

We should be passing contexts through to the datastores but the refactor is going to be quite large and invasive and will affect multiple projects (ipfs, libp2p, filecoin, and probably quite a few more).

We should look into this next year but I don't have time right now to handle all the fallout of this refactor.

MichaelMure · 2020-07-22T09:55:32Z

master issue:

Datastore:

go-datastore: wire a context in most of the Datastore methods go-datastore#161
go-ds-flatfs: wire a context in most of the Datastore methods go-ds-flatfs#85
go-ds-badger: wire a context in most of the Datastore methods go-ds-badger#97
go-ds-leveldb: wire a context in most of the Datastore methods go-ds-leveldb#45
go-ds-s3:
go-ds-crdt:
go-ds-sql:
go-ds-measure: wire a context in most Datastore methods go-ds-measure#30
go-ds-redis:
go-ds-swift:
go-ds-bolt:
go-ds-badger2:
go-ds-bitcask:
go-ds-bench:

Blockstore:

go-ipfs-blockstore: wire a context in most of the Blockstore/Datastore methods go-ipfs-blockstore#55

IPFS:

go-ipfs: wire a context in most of the data pipeline, connect it #7558
go-ipfs-provider: wire a context in most of the data pipeline go-ipfs-provider#27 TODO TEST BROKEN
go-ipfs-routing: wire and connect a context in go-datastore go-ipfs-routing#23
go-blockservice: wire a context in the data pipeline go-blockservice#66
go-bitswap: wire a context in most of Datastore/Blockstore methods, connect a few go-bitswap#421
go-filestore: wire a context in most of the data pipeline go-filestore#36
go-graphsync: context.TODO() a few places that now require a context go-graphsync#78
go-ipfs-exchange-offline: add a context to HasBlock, pass one to the datastore go-ipfs-exchange-offline#30
go-ipfs-exchange-interface: add a context to HasBlock to accomodate a context in Blockstore/Datastore go-ipfs-exchange-interface#11
go-merkledag: wire a context in most of the data pipeline, connect them go-merkledag#57
go-ipfs-pinner: wire a context in most of the ipfs data pipeline go-ipfs-pinner#10

libp2p:

go-libp2p-kad-dht: wire a context in most of the ipfs data pipeline, connect it libp2p/go-libp2p-kad-dht#683
go-libp2p-pubsub-router: wire a context in most of the go-ipfs data pipeline, connect them libp2p/go-libp2p-pubsub-router#78

MichaelMure · 2020-07-22T10:02:30Z

Overall it went fairly well, with only go-graphsync needing to be worked around with `context.TODO()`.

Obviously this will need to be merged progressively, tagged and updated on the upper layers, but it does compile for me and seems to be working from a quick test.

BigLep · 2021-05-03T22:54:19Z

@MichaelMure : We're sorry this has been open for so long. Do you think you'd be available to work with a PL engineer later in May to get this updated and pushed? We'll figure out the exact schedule if you're available, but wanted to see if that is possible. Thanks!

MichaelMure · 2021-05-03T23:10:45Z

@BigLep I might be but I'll have to see how things evolve on my side.

However, I'm not sure I would be that useful: the problem here is not technical. Ignoring go-graphsync for a minute, coding-wise it's actually fairly easy. It took me a few hours to code and open all those PRs. The few packages I didn't touch are very likely to be as easy.

The problem here is coordinating all the projects and their maintainers to land those changes without too many complications, and having someone to orchestrate that effort. It's not something I can do as an outsider.

MichaelMure · 2021-05-04T09:30:03Z

@BigLep that said, let me reiterate that this would be a big win for us, and not only for us. We often face performance issues or simply something not working right, and go-ipfs being a black box makes it hard to understand and fix. Tracing has proved to be invaluable in other parts of our stack to provide a quality service and react quickly to incidents. This is likely the feature that would help us the most.

In addition, this would allow us to give you better feedback from our deployed infrastructure, like pinpointing precisely a performance issue. It's always easier to design or fix things with hard numbers.

BigLep · 2021-05-07T22:34:09Z

@MichaelMure : thanks! The need makes sense, and agreed that a key dependency is the coordination from a maintainer to land these. I'm hoping we secure a time later in the month with someone like @aschmahmann who will do the merging but can also have you on hand if there are rebases/updates that need to be done along the way. Does that make sense (and please let me know if that's not an effective strategy)?

BigLep · 2021-09-28T21:53:12Z

@MichaelMure : quick update here that I'm trying to see about prioritizing this issue sooner, as PL is focusing on its own efforts to improve the Gateways it's operating. @guseggert is a newer team member who we may assign this to. Could you two connect and swap notes about the intended ways this will be helpful? I'll use that info for making the prioritization case.

MichaelMure · 2021-09-29T10:08:32Z

Sure, here are a few things where that could be helpful:

tracing/observability

This would be a massive help for running IPFS in production. Really, I can't overstate that. At the moment, go-ipfs is basically a black box: requests come in, responses come out. What happens in the middle is hard to observe. There are Prometheus metrics, the diag command and pprof, but that's really surface level. This makes it difficult to address emergency situations, understand what is going on and plan for the future.

The most critical subsystem in go-ipfs for performance and day-to-day operation is the data pipeline. Having a Go context in there would allow tracing instrumentation to be added gradually and possibly independently. The benefits would be:

full visibility into operational activities: timing, request count, load, delay, inter-dependencies ...
ability to slice and dice data, isolate specific requests or class of requests, figure out patterns ...
track errors and problems, isolate their origin and resolve issues in a much simpler and more efficient manner

Note also that this tracing would not necessarily be limited to go-ipfs: distributed tracing allows propagating this tracing across the boundaries of connected systems (proxy, backend ...), which gives another dimension of observability.

handling cancellation / reliability

At the moment, there is no cancellation in the data pipeline. This means that once a request is started, nothing will stop it, even if the original request is gone. Fixing that would allow trimming that unnecessary fat, reducing the load and improving reliability. It might also prevent a form of attack where heavy handling is triggered with minimal effort.

request tagging / custom handling

Another way this could be helpful is that a Go context can carry metadata about the request, independently of all the layers it's going through. This means that one could, for example, carry over a domain-specific logger or tag a request with an origin or some customer information. When reaching a lower level, that information could be used to prioritize requests, do caching differently, route to a specialized backend ...

Importantly, this mechanism allows extending the system independently, without being bound by the Protocol Labs road map. This would lower your burden and allow more areas to be explored.

engineering feedback / optimisation

In my experience, each time observability improves, new issues are unearthed. Proper instrumentation would give node operators and PL the tooling to discover those long-standing performance/reliability issues. 95% of the work in software engineering is to figure out where the problem comes from. Once that's done and understood, fixing stuff becomes easy.

node operator autonomy

Observability would allow node operators to more easily figure out what is happening and, in turn, rely less on PL to diagnose those issues, reducing the burden on the development team.

Also discussed a bit at ipfs/roadmap#74

BigLep · 2021-09-29T16:19:27Z

Well stated @MichaelMure . PL will respond back by EOD 2021-10-01.

BigLep · 2021-10-05T04:29:12Z

I missed circling back on this last week, even though we did discuss it internally. @guseggert and team are going to pick this up. We're aiming to get this in the next go-ipfs release, 0.11. The first step is to make the plan of how we can merge this in a progressive way.

BigLep · 2021-10-12T20:04:17Z

Will get more of the plan public, but internal scratchpad for thoughts on rolling this out is happening here: https://www.notion.so/protocollabs/Context-Plumbing-2b9fccf60db34ecb980b3068cabb9d50

guseggert · 2021-10-13T19:08:37Z

Update: I'm plumbing these changes through as pseudoversions on branches, to make sure it all works before publishing any new versions.

I will probably add some contexts in additional places, e.g. there are some interfaces in go-datastore that should have contexts too like CheckedDatastore, ScrubbedDatastore, GCDatastore, etc.

Here's the order to update the modules:

github.com/ipfs/go-datastore
github.com/ipfs/go-ds-badger
github.com/ipfs/go-ds-leveldb
github.com/libp2p/go-libp2p-peerstore
github.com/libp2p/go-libp2p-swarm
github.com/libp2p/go-libp2p-autonat
github.com/libp2p/go-libp2p-circuit
github.com/libp2p/go-libp2p-discovery
github.com/libp2p/go-libp2p
github.com/libp2p/go-libp2p-noise
github.com/ipfs/go-ipfs-ds-help
github.com/ipfs/go-ipfs-blockstore
github.com/ipfs/go-ipfs-exchange-interface
github.com/ipfs/go-ipfs-routing
github.com/ipfs/go-bitswap
github.com/ipfs/go-ipfs-exchange-offline
github.com/ipfs/go-blockservice
github.com/ipfs/go-merkledag
github.com/ipfs/go-unixfs
github.com/ipfs/go-fetcher
github.com/ipfs/go-unixfsnode
github.com/libp2p/go-libp2p-kbucket
github.com/ipfs/go-path
github.com/ipfs/go-ipns
github.com/libp2p/go-libp2p-xor
github.com/ipfs/interface-go-ipfs-core
github.com/libp2p/go-libp2p-kad-dht
github.com/libp2p/go-libp2p-gostream
github.com/libp2p/go-libp2p-pubsub
github.com/ipfs/go-ds-flatfs
github.com/ipfs/go-ds-measure
github.com/ipfs/go-filestore
github.com/ipfs/go-graphsync
github.com/ipfs/go-ipfs-config
github.com/ipfs/go-ipfs-pinner
github.com/ipfs/go-ipfs-provider
github.com/ipfs/go-mfs
github.com/ipfs/go-namesys
github.com/ipld/go-car
github.com/libp2p/go-libp2p-http
github.com/libp2p/go-libp2p-pubsub-router
github.com/ipfs/go-ipfs

(script to generate the order: https://gist.githubusercontent.com/guseggert/fe079f793cbea3158538bdaa9f50878b/raw/d87c0ef9f1593dd7ce9acb0b38e003e9f455ba88/gistfile1.txt)

guseggert · 2021-10-19T15:30:37Z

Update: currently working through libp2p/go-libp2p-swarm as it depends on an older version of go-libp2p-peerstore, and the newer version has non-trivial backwards-incompatible changes.

(This module was not originally in the list, it was added after I fixed the script to generate the list.)

BigLep · 2021-10-26T16:08:55Z

2021-10-26 note:

@guseggert will update this
We'll link the new PRs
We'll close out the old ones

guseggert · 2021-10-26T19:25:31Z

I've plumbed the changes through using pseudoversions on feat/context branches through all the repos, and fixed the issues that came up. I've added contexts to quite a few more interfaces, so I re-did the plumbing work. Now I'm beginning to cut releases and plumb those through. Libp2p has some hole punching changes that are in-flight, and there are also sharding changes in-flight that could cause issues with the rollout here. If I run into any problems, I'm going to plumb through a pseudoversion instead of a release version, and that can be cleaned up separately after the issues are resolved.

BigLep · 2021-10-26T19:32:09Z

Thanks for the update @guseggert ! A couple of things I think would be useful for visibility when you can get to them:

A list of all the repos we're going to touch (and in what order) with checkboxes to show current status?
The PRs next to the list as they get created

guseggert · 2021-10-27T18:35:23Z

I am also adding the versioning workflows to all of these repos, which is taking some time to roll out (rerunning flaky tests, approving PRs, etc.).

Here are the repos (ordered):

github.com/ipfs/go-datastore
- feat: add context to interfaces go-datastore#181
- Bump version to 0.5.0 go-datastore#183
github.com/ipfs/go-ds-badger
- feat: plumb through contexts go-ds-badger#119
- no version bump PR, had to tag by hand (see sync: update CI config files go-ds-badger#113)
github.com/ipfs/go-ds-leveldb
- Plumb through context changes go-ds-leveldb#57
- Version 0.5.0 go-ds-leveldb#58
github.com/libp2p/go-libp2p-peerstore
- feat: plumb through datastore contexts libp2p/go-libp2p-peerstore#176
- Bump version to 0.4.0 libp2p/go-libp2p-peerstore#178
github.com/libp2p/go-libp2p-kad-dht
- feat: plumb through datastore contexts libp2p/go-libp2p-kad-dht#753
- Bump version to 0.15.0 libp2p/go-libp2p-kad-dht#755
github.com/libp2p/go-libp2p-swarm
- feat: plumb contexts through from peerstore libp2p/go-libp2p-swarm#290
- release version 0.8.0 libp2p/go-libp2p-swarm#292
github.com/libp2p/go-libp2p-autonat
- feat: plumb through contexts from peerstore libp2p/go-libp2p-autonat#111
- release version 0.6.0 libp2p/go-libp2p-autonat#112
github.com/libp2p/go-libp2p-circuit
- feat: plumb through contexts from peerstore libp2p/go-libp2p-circuit#145
github.com/libp2p/go-libp2p-discovery
- feat: plumb peerstore contexts changes through libp2p/go-libp2p-discovery#75
- version tagged by hand https://github.com/libp2p/go-libp2p-discovery/releases/tag/v0.6.0
github.com/libp2p/go-libp2p
- feat: plumb through peerstore context changes libp2p/go-libp2p#1237
github.com/ipfs/go-ipfs-ds-help
- feat: plumb through datastore context changes go-ipfs-ds-help#36
- Update version.json go-ipfs-ds-help#38
github.com/ipfs/go-ipfs-blockstore
- v0
  - feat: add context to interfaces & plumb through datastore contexts go-ipfs-blockstore#89
  - no version bump PR, tagged by hand (see https://github.com/ipfs/go-ipfs-blockstore/releases/tag/v0.2.0)
- v1
  - feat: add context to interfaces go-ipfs-blockstore#90
  - Version 1.1.0 go-ipfs-blockstore#91
github.com/ipfs/go-ipfs-exchange-interface
- feat: add context to interface go-ipfs-exchange-interface#18
- Update version.json go-ipfs-exchange-interface#20
github.com/ipfs/go-ipfs-routing
- feat: plumb through context changes go-ipfs-routing#28
- Bump version to 0.2.0 go-ipfs-routing#29
github.com/ipfs/go-bitswap
- feat: plumb through contexts go-bitswap#539
- Version 0.5.0 go-bitswap#540
github.com/ipfs/go-ipfs-exchange-offline
- feat: plumb through contexts go-ipfs-exchange-offline#42
- release version 0.1.0 go-ipfs-exchange-offline#43
github.com/ipfs/go-blockservice
- feat: add context to interfaces go-blockservice#86
- Version 0.2.0 go-blockservice#87
github.com/ipfs/go-merkledag
- feat: plumb through contexts go-merkledag#78
- Version 0.5.0 go-merkledag#79
github.com/ipfs/go-unixfs
- feat: plumb through datastore context changes go-unixfs#113
- Version 0.3.0 go-unixfs#114
github.com/ipfs/go-fetcher
- feat: plumb through context changes go-fetcher#28
- Version 1.6.0 go-fetcher#29
github.com/ipfs/go-unixfsnode
github.com/libp2p/go-libp2p-kbucket
github.com/ipfs/go-path
- feat: plumb through context changes go-path#47
- Version 0.2.0 go-path#48
github.com/ipfs/go-ipns
github.com/libp2p/go-libp2p-xor
github.com/libp2p/go-libp2p-gostream
github.com/libp2p/go-libp2p-pubsub
- feat: plumb through context changes libp2p/go-libp2p-pubsub#459
- manual release https://github.com/libp2p/go-libp2p-pubsub/releases/tag/v0.6.0
github.com/ipfs/go-ds-flatfs
- feat: add contexts on datastore methods go-ds-flatfs#98
- Version 0.5.0 go-ds-flatfs#99
github.com/ipfs/go-ds-measure
- feat: add contexts on datastore methods go-ds-measure#38
- Version 0.2.0 go-ds-measure#39
github.com/ipfs/go-filestore
- feat: plumb through context changes go-filestore#55
- note to self: this had a mv bump to v1.0.0 and likely needs to be backported to v0
github.com/ipfs/go-graphsync
github.com/ipfs/interface-go-ipfs-core
github.com/ipfs/go-ipfs-config
github.com/ipfs/go-ipfs-pinner
- feat: plumb through context changes go-ipfs-pinner#18
- manual release https://github.com/ipfs/go-ipfs-pinner/releases/tag/v0.2.0
github.com/ipfs/go-ipfs-provider
- feat: plumb through datastore contexts go-ipfs-provider#39
- manual release https://github.com/ipfs/go-ipfs-provider/releases/tag/v0.7.0
github.com/ipfs/go-mfs
- support threshold based automatic sharding and unsharding of directories go-mfs#88 (I tacked on the changes to this PR)
- Version 0.2.0 go-mfs#96
github.com/ipfs/go-namesys
github.com/ipld/go-car
github.com/libp2p/go-libp2p-http
github.com/libp2p/go-libp2p-pubsub-router
github.com/ipfs/go-ipfs

guseggert · 2021-11-12T19:41:13Z

I've finished the majority of the plumbing; the rest is blocked on two things:

release of go-libp2p version 0.16.0-rc.1 (to plumb through the changes to go-libp2p)
releasing the sharding work in Tracking issue for UnixFS automatic sharding #8106 (the sharding work contains breaking changes that need to be merged with this)

Once those are resolved, I can complete the plumbing and we will be ready to release go-ipfs v0.11.0-RC1.

guseggert · 2021-11-17T19:01:35Z

I discovered yesterday that go-ipfs-blockstore is using the wrong version of go-ipfs-ds-help: it was inadvertently upgraded to v1. I need to go back and downgrade go-ipfs-blockstore@v0 to use go-ipfs-ds-help@v0 and then re-plumb.

aschmahmann · 2021-11-29T22:09:13Z

closed by #8563

MichaelMure added the kind/enhancement A net-new feature or improvement to an existing feature label Dec 17, 2019

MichaelMure mentioned this issue Jul 22, 2020

wire a context in most of the ipfs data pipeline ipfs/go-ipfs-pinner#3

Closed

MichaelMure mentioned this issue Nov 30, 2020

[2021 Theme Proposal] Better observability ipfs/roadmap#74

Closed

BigLep modified the milestones: go-ipfs 0.13, go-ipfs 0.11 Oct 5, 2021

BigLep assigned guseggert Oct 5, 2021

BigLep mentioned this issue Oct 19, 2021

Bubble up changes to ipfs/go-datastore filecoin-project/lotus#7536

Closed

jsimnz mentioned this issue Nov 16, 2021

Upgrade Datastore to support context sourcenetwork/defradb#44

Closed

This was referenced Nov 19, 2021

update to context datastores ipfs/go-graphsync#275

Merged

feat: plumb through datastore contexts libp2p/go-libp2p-pubsub-router#89

Merged

feat: plumb through datastore contexts ipfs/go-namesys#25

Merged

guseggert mentioned this issue Nov 23, 2021

Release v0.11 #8343

Closed

80 tasks

aschmahmann closed this as completed Nov 29, 2021

guseggert mentioned this issue Dec 2, 2021

Add comprehensive tracing #8578

Open

9 tasks

aschmahmann mentioned this issue Dec 3, 2021

v0.11.0-rc1 underreports RepoSize significantly #8579

Closed

3 tasks
