
IPFS Bandwidth Metrics and Logging enhancements #14

Closed
obo20 opened this issue Dec 17, 2018 · 24 comments

@obo20

obo20 commented Dec 17, 2018

As my company continues to expand the functionality of our pinning service, I wanted to start a discussion around one of the things that we've found relevant to our roadmap: Metrics and Logging.

Cloud providers like Digital Ocean / AWS are able to charge you based on two things:

  1. How much data you've stored
  2. The bandwidth your node has used for the data you've stored.

With IPFS, you can only really keep track of how much data is being stored, as it's quite difficult to see what kind of usage / traffic is going through your node on a per-content basis. I've found a few hacky solutions involving directly monitoring the DHT / logs, but nothing that seems like it would work well in production.

This presents a problem when users want to store content that may have high bandwidth usage (videos, photos, websites). This forces us to bundle storage / estimated bandwidth costs together, which isn't an optimal solution for a multitude of reasons.

The simplest first-pass solution I've come up with is giving IPFS the ability to keep a running tally of how many times the hashes on your node have been requested / delivered.

Nice Quality of Life features for this solution would be:

  1. The ability to configure this feature so that your node can use a few different logging strategies, such as logging only files you've pinned, logging only root hashes, etc. (a config sketch follows this list)
  2. The ability to query these counts directly via the IPFS API
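
To make that concrete, the configuration could hypothetically look something like this (none of these keys exist in go-ipfs today; the names are made up), where Scope might be "all", "pinned", or "pinned-roots":

    "BandwidthAccounting": {
        "Enabled": true,
        "Scope": "pinned-roots"
    }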

I'd love to hear thoughts from the community on this topic, and whether anybody has alternative solutions to the particular problem I mentioned.

@obo20 obo20 changed the title IPFS Metrics and Logging enhancements IPFS Bandwidth Metrics and Logging enhancements Dec 17, 2018
@daviddias
Member

Hi @obo20, thank you for reviewing the IPFS Project Roadmap and highlighting this need!

The latest IPFS WebUI shows in part what you can do with the bandwidth stats that you get out of IPFS today -- https://github.com/ipfs-shipyard/ipfs-webui#ipfs-web-ui -- using the stats API https://github.com/ipfs/interface-ipfs-core/blob/master/SPEC/STATS.md#statsbw. However, I hear your feedback that it is hard to understand how much of that bandwidth pressure is caused by a specific file or files.
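
For reference, those node-wide numbers are also available directly from the CLI, with per-peer and per-protocol filters (but nothing per-file). For example (the exact protocol string may vary by version):

    # totals and current rates for the whole node
    ipfs stats bw

    # the same, filtered to Bitswap traffic and polled continuously
    ipfs stats bw --proto /ipfs/bitswap/1.1.0 --poll --interval 5s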

For each file, there are two types of traffic:

  • The exchange of the file itself, which is governed by Bitswap
  • The "tell the network that we have it" traffic, which is governed by the Providers Service (which in turn uses the DHT).

Both of these piggyback on all the work libp2p does to establish connections, run peer discovery and so on, which is itself a good-sized chunk of the traffic.

We can consider adding to the toolkit a way to know how much bandwidth a file has been consuming (both fetching and providing to the network) on the Bitswap side.

As for the DHT part, since provider records for all content are spread together, getting a rigorous number might require a lot of CPU/memory that would be better spent handling requests. That said, one alternative might be simply dividing the bandwidth used by the DHT across the files being shared, in proportion to their size (bigger file, more records, larger slice of the bandwidth pie).
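
Concretely, that heuristic could be as simple as the following sketch (illustrative only, not an existing API):

    // proportionalShare attributes a slice of the total DHT bandwidth to
    // one file based on its share of all stored bytes. Heuristic only:
    // it assumes provider-record traffic scales linearly with file size.
    func proportionalShare(totalDHTBytes, fileSize, totalStoredBytes uint64) uint64 {
        if totalStoredBytes == 0 {
            return 0
        }
        return totalDHTBytes * fileSize / totalStoredBytes
    }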

@obo20 how does this sound? @Stebalien, @alanshaw any extra thoughts?

@Stebalien
Member

We can consider adding to the toolkit a way to know how much bandwidth a file has been consuming (both fetching and providing to the network) on the Bitswap side.

We can't really track an absolute "bandwidth consumption" number on a per-file basis, as we'd need to track a separate number for every block we're storing (that's a lot of memory). However, we could probably track active bandwidth (i.e., bandwidth consumed by a file averaged over the last 10s or so).
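
To sketch what I mean by active bandwidth (illustrative Go, not the actual go-ipfs internals): keep a sliding window of recent sends per CID, so files that go idle stop costing anything once their entries expire.

    package meter

    import (
        "sync"
        "time"
    )

    type event struct {
        t     time.Time
        bytes uint64
    }

    // ActiveMeter tracks bytes sent per CID inside a sliding window.
    type ActiveMeter struct {
        mu     sync.Mutex
        window time.Duration
        events map[string][]event
    }

    func New(window time.Duration) *ActiveMeter {
        return &ActiveMeter{window: window, events: make(map[string][]event)}
    }

    // Record notes that a block of the given size was sent for this CID.
    func (m *ActiveMeter) Record(cid string, bytes uint64) {
        m.mu.Lock()
        defer m.mu.Unlock()
        m.events[cid] = append(m.events[cid], event{time.Now(), bytes})
    }

    // Rate returns the average bytes/sec for a CID over the window.
    func (m *ActiveMeter) Rate(cid string) float64 {
        m.mu.Lock()
        defer m.mu.Unlock()
        cutoff := time.Now().Add(-m.window)
        kept := m.events[cid][:0]
        var total uint64
        for _, e := range m.events[cid] {
            if e.t.After(cutoff) {
                kept = append(kept, e)
                total += e.bytes
            }
        }
        if len(kept) == 0 {
            delete(m.events, cid) // idle file: free its memory
            return 0
        }
        m.events[cid] = kept
        return float64(total) / m.window.Seconds()
    }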

That said, one alternative might be simply dividing the bandwidth used by the DHT across the files being shared, in proportion to their size (bigger file, more records, larger slice of the bandwidth pie).

I'm not sure how to accurately track this without some really invasive modifications.

@obo20
Author

obo20 commented Dec 18, 2018

I don't imagine the type of functionality we're discussing is something that should be turned on by default. Rather, it would be turned on by nodes who explicitly need that information and are willing to take a slight performance hit to acquire it. We're also fine increasing our storage costs by a percentage to be able to store this information.

@daviddias You mention that the peer discovery / connection handling produces a sizable chunk of traffic. Could you elaborate on what you mean by sizable? I had imagined most of our bandwidth logging was going to revolve around purely the delivery of content itself and the connection / peer discovery work was just going to be part of the cost of running a node.

@Stebalien You say that tracking a separate number for every block would require a lot of memory.

Could you explain why we'd need to store these values in memory? This type of value seems like it should just be incremented in a database.

Also, would it be possible to simply filter tracking of these statistics so that only root hash traffic is logged?

@Stebalien
Member

Could you explain why we'd need to store these values in memory? This type of value seems like it should just be incremented in a database.

I'm mostly worried about the disk IO. However, if it were off by default, that might be reasonable.

An alternative is to provide some metrics APIs where a user can subscribe to "events" (e.g., sent block X to Y size Z). We already kind of have this with structured logging (ipfs log tail) but that's a little ad-hoc.
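
For example, you can already do something like the following today, though the event format is not a stable interface:

    # stream the daemon's structured log events and filter for bitswap activity
    ipfs log tail | grep -i bitswap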

@obo20
Author

obo20 commented Dec 19, 2018

I had an interesting conversation with one of our users today that got me thinking of an alternative approach that I wanted to throw out purely for the sake of discussion. Full disclosure, this idea has not been fully explored for feasibility.

For a while I've thought about the feasibility of segmenting the IPFS data store. By that I mean allowing the data store to have subdirectories for specific buckets of content, so that when I pin content to an IPFS node, I can also provide a "sub-store" name with that content.

My initial reasoning for desiring something like this was so that, as a node owner, I could restrict access to certain sub-sections of my data store to specific users (this would also require access control to be figured out for IPFS). But I suppose the base concept of sub-stores could also be used to track bandwidth by which sub-store the data came from. In the bandwidth use case, each sub-store would belong to a single user, and I could simply query my node each month to find out how much bandwidth that user's sub-store used.

I'd imagine that this might take quite a bit of effort to actually accomplish, but who knows, maybe it spurs some other ideas. Let me know if you'd like me to elaborate on this at all.

@obo20
Author

obo20 commented Jan 2, 2019

Out of curiosity, does anybody know if the Filecoin project is concerned about the issue of bandwidth at all?

Without a proper way of managing bandwidth usage, Filecoin users who are paying to store low-bandwidth content will effectively be subsidizing users who are storing extremely high-bandwidth content. (Unless the retrieval market aspect of Filecoin somehow solves this issue?)

I'd love to hear some perspective from the Filecoin team, as I'd imagine that any solutions that are being considered here will likely also be relevant to the Filecoin project.

@Stebalien
Member

For a while I've thought about the feasibility of segmenting the IPFS data store. By that I mean allowing the data store to have subdirectories for specific buckets of content, so that when I pin content to an IPFS node, I can also provide a "sub-store" name with that content.

That could definitely be useful for performance but is probably overkill for simple accounting (and would work against deduplication).

Out of curiosity, does anybody know if the Filecoin project is concerned about the issue of bandwidth at all?

Given the trustless nature of Filecoin, the retrieval market charges the user retrieving the file for bandwidth. If the retrieval miner charged the user storing the content instead, they (the retrieval miner) could arbitrarily inflate bandwidth usage and overcharge.

However, you bring up a good point. Users storing content will likely want to pay some (trusted) service for bandwidth up front. This service would effectively act as a prepaid proxy/cache and would need some way to account for usage.


What if we did this through a plugin? go-ipfs supports plugins so we could add a plugin interface for hooking into bitswap. The plugin could be invoked for every block sent (with the CID of the block and the peer ID of the peer requesting the block).

You'd then be able to charge users by:

  1. Charging them based on bitswap bandwidth.
  2. Charging them per-block, both for storage and for background provider bandwidth. Given your use-case, you don't really need to track provider bandwidth exactly; you should be able to just estimate it and build it into the price of each block.
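
To sketch the shape of that hook (hypothetical; no such plugin interface exists in go-ipfs today):

    package bsplugin

    import (
        cid "github.com/ipfs/go-cid"
        peer "github.com/libp2p/go-libp2p-peer"
    )

    // BlockSentHook would be invoked by bitswap each time a block is
    // sent. Implementations should return quickly (e.g., enqueue and
    // process asynchronously) so they never stall the send path.
    type BlockSentHook interface {
        BlockSent(to peer.ID, c cid.Cid, size int)
    }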

@obo20
Author

obo20 commented Jan 4, 2019

What if we did this through a plugin? go-ipfs supports plugins so we could add a plugin interface for hooking into bitswap. The plugin could be invoked for every block sent (with the CID of the block and the peer ID of the peer requesting the block).

@Stebalien I like the concept. In theory, it sounds like this could work.

To clarify, would this be something that gets logged to the bitswap ledger that we can query later, or something I would have to actively watch in real time? Also, could this querying be filtered by the usual block types ("direct", "recursive", "indirect", "all")?

My ideal scenario would be the ability to easily query bitswap and get back an array of multihashes, each with an integer representing the number of times that block was sent. It would be cool if I could also easily get back the block size for quick multiplication, but that's something I could manage separately if needed. Or there could be a verbosity flag. Example response:

    [
        {
            "hash": "Qm...example...hash",
            "count": 11,
            "size": 11111,
            "cumulativeSize": 11111111111
        },
        {
            "hash": "Qm...example...hash...2",
            "count": 22,
            "size": 22222,
            "cumulativeSize": 22222222222
        },
        ...
    ]

(count is the number of times the block was sent)

In regards to the peer ID, I worry it might be overkill to track this value in combination with the number of blocks sent. It might also be pretty nasty when it comes time to query things: instead of a list of per-peer amounts plus a separate list of per-block amounts (which would give P + B records), we would have a record for each peer ID + block hash combination (which would be P * B records). However, this type of data would be pretty powerful, so if I'm missing something and this would actually be trivial to implement, let me know.

@Stebalien
Member

To clarify, would this be something that gets logged to the bitswap ledger that we can query later, or something I would have to actively watch in real time?

My idea was simply to provide a hook that gets called every time a block is sent to a peer. Bitswap wouldn't remember anything, it would just invoke any hooks registered by plugins and move on.

Also, could this querying be filtered by the usual block types ("direct", "recursive", "indirect", "all")?

Those are pinning concepts and don't really exist at the block/bitswap layer. At this layer, we just have a set of blocks and don't have any way to (efficiently) relate these blocks to pins.

@obo20
Author

obo20 commented Jan 7, 2019

My idea was simply to provide a hook that gets called every time a block is sent to a peer. Bitswap wouldn't remember anything, it would just invoke any hooks registered by plugins and move on.

That makes sense. So the plugin itself would log the information, and provide an interface to query it?

@Stebalien
Member

That makes sense. So the plugin itself would log the information, and provide an interface to query it?

Yes (or anything else it wants, really).
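
For example, a plugin that tallies bytes per CID and exposes the totals over HTTP might look roughly like this (illustrative only, building on the hypothetical hook above):

    package bsplugin

    import (
        "encoding/json"
        "net/http"
        "sync"

        cid "github.com/ipfs/go-cid"
        peer "github.com/libp2p/go-libp2p-peer"
    )

    // AccountingPlugin tallies bytes sent per CID and serves the totals
    // as JSON, so a billing job can scrape them periodically.
    type AccountingPlugin struct {
        mu   sync.Mutex
        sent map[string]uint64 // CID string -> total bytes sent
    }

    func NewAccountingPlugin() *AccountingPlugin {
        return &AccountingPlugin{sent: make(map[string]uint64)}
    }

    // BlockSent implements the hypothetical bitswap hook.
    func (p *AccountingPlugin) BlockSent(to peer.ID, c cid.Cid, size int) {
        p.mu.Lock()
        p.sent[c.String()] += uint64(size)
        p.mu.Unlock()
    }

    // ServeHTTP dumps the per-CID totals, e.g. on a local billing port.
    func (p *AccountingPlugin) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        p.mu.Lock()
        defer p.mu.Unlock()
        w.Header().Set("Content-Type", "application/json")
        _ = json.NewEncoder(w).Encode(p.sent)
    }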

@obo20
Author

obo20 commented Jan 8, 2019

That sounds like it would work well then! Let me know if you need me to answer any further questions as the roadmap discussion progresses. Thanks for the great discussion / ideas.

@obo20
Author

obo20 commented Jan 10, 2019

Apologies for effectively reviving this issue so soon after it was resolved, but this blog entry came out today: https://www.ctrl.blog/entry/ipfs-pin-storage-accounting

It brings up some really good points that are quite hard to address without some serious technical overhead. While the post speaks about the difficulties of keeping track of deduplication in IPFS, it again makes me think about the benefits of being able to segment the IPFS data store into sub-stores.

For reference, here is what I said earlier in this thread.

For a while I've thought about the feasibility of segmenting the IPFS data store. By that I mean allowing the data store to have subdirectories for specific buckets of content, so that when I pin content to an IPFS node, I can also provide a "sub-store" name with that content.

My initial reasoning for desiring something like this was so that, as a node owner, I could restrict access to certain sub-sections of my data store to specific users (this would also require access control to be figured out for IPFS). But I suppose the base concept of sub-stores could also be used to track bandwidth by which sub-store the data came from. In the bandwidth use case, each sub-store would belong to a single user, and I could simply query my node each month to find out how much bandwidth that user's sub-store used.

Such a sub-store could also be used to monitor how much data different users are storing on a node overall. It would work against deduplication between users, but that's a tradeoff a node provider could make.

@Stebalien Do you have any thoughts on the feasibility of something like this? You mentioned sub-stores might actually have some performance benefits, and I'm beginning to see quite a few issues that could be tackled with such a solution.

@Stebalien
Member

You mentioned sub-stores might actually have some performance benefits

I'm talking about a very different use-case. Effectively, tiered caching.

For everything else, I'd expect labeling to be more useful. That is, provide some way to annotate blocks with additional information. We've already found needs for this in reference counting, providing, access control, etc.
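
Purely as a sketch of the labeling idea (nothing like this exists yet; names are illustrative):

    package labels

    import cid "github.com/ipfs/go-cid"

    // LabelStore is a hypothetical index that annotates blocks with tags
    // such as "pinned", "user:1234", or "providable".
    type LabelStore interface {
        AddLabel(c cid.Cid, label string) error
        RemoveLabel(c cid.Cid, label string) error
        Labels(c cid.Cid) ([]string, error)
        ByLabel(label string) ([]cid.Cid, error) // reverse lookup
    }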

However, that's something that'll take quite a bit of planning. I've suggested the plugin interface because it allows people to experiment with their own solutions without waiting for go-ipfs to merge a general-purpose solution.

@obo20
Author

obo20 commented Jan 10, 2019

For everything else, I'd expect labeling to be more useful. That is, provide some way to annotate blocks with additional information. We've already found needs for this in reference counting, providing, access control, etc.

Labeling is something I considered as well, but I figured it sounded a bit excessive so I didn't bring it up. Glad to hear that it has other use cases and is something being considered. This will likely solve a lot of core issues that I see coming up in the future.

However, that's something that'll take quite a bit of planning. I've suggested the plugin interface because it allows people to experiment with their own solutions without waiting for go-ipfs to merge a general-purpose solution.

No worries. I was expecting any general solutions to take some time, I just wanted to get the thought process started. Thanks again for all the information on this.

@parkan
Contributor

parkan commented Jan 18, 2019

side suggestion: I wonder if these metrics may be more appropriate to collect at the ipfs-cluster level (which I presume pinata is running)?

I'd imagine that most users requiring these statistics would be large-scale cluster users, and the numbers of interest would be for the entire cluster, not per-daemon

this seems like a good opportunity to isolate the metrics code away from the core daemon, and possibly implement the counters in a way that's agnostic of the actual bitswap/provide implementation

/cc @hsanjuan @lanzafame

@obo20
Author

obo20 commented Jan 18, 2019

@parkan I was actually interested in tracking all of this on a per node level. Unfortunately cluster wasn't providing the granularity we wanted for pin management.

To elaborate, cluster wouldn't allow us to choose which nodes to pin content on at a per-content level. This presents a problem when some of our users may want to pin content on 2 nodes, but others may want 3 nodes of replication. That also prevents us from pinning content specifically where it's being requested the most.

@hsanjuan
Contributor

To elaborate, cluster wouldn't allow us to choose which nodes to pin content on at a per-content level. This presents a problem when some of our users may want to pin content on 2 nodes, but others may want 3 nodes of replication. That also prevents us from pinning content specifically where it's being requested the most.

hi @obo20, cluster has almost always allowed setting the replication level separately per pinned item. We're also now adding the possibility of explicitly choosing which peers things will be pinned on (after a user requested it: ipfs-cluster/ipfs-cluster#646).
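
For example (check ipfs-cluster-ctl pin add --help for the exact flags in your version):

    # pin with 2 replicas for one user's content...
    ipfs-cluster-ctl pin add --replication 2 <cid>

    # ...and 3 replicas for another's
    ipfs-cluster-ctl pin add --replication 3 <cid>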

We'd be super happy to know what you need most from IPFS Cluster so we can add it; just open an issue or send us a message at ipfs-cluster-wg@ipfs.io.

@obo20
Author

obo20 commented Jan 19, 2019

Well, I feel pretty silly right now. Thank you for that information, @hsanjuan. Upon diving deeper into the Go API, it does appear that ipfs-cluster supports a replication level per pinned item, which makes sense, as I suppose ipfs-cluster would need to keep track of that info anyway.

Are the API options documented anywhere outside of GitHub for the ipfs-cluster Go package? This may be why I initially had the misconception I did.

Also, are there any plans for an official JS HTTP client? I see this project: https://github.com/te0d/js-ipfs-cluster-api, but unfortunately it appears to have been abandoned. If not, I can certainly fork it and make updates as needed, but officially supported packages are always preferred, as I can be confident they're up to date with the ipfs-cluster project itself.

@hsanjuan
Contributor

Are the API options documented anywhere outside of GitHub for the ipfs-cluster Go package? This may be why I initially had the misconception I did.

Well, perhaps the best reference is https://godoc.org/github.com/ipfs/ipfs-cluster/api/rest/client#Client (good point: we need to finish https://cluster.ipfs.io/documentation/developer/api/).

Also, are there any plans for an official JS HTTP client? I see this project: https://github.com/te0d/js-ipfs-cluster-api, but unfortunately it appears to have been abandoned. If not, I can certainly fork it and make updates as needed, but officially supported packages are always preferred, as I can be confident they're up to date with the ipfs-cluster project itself.

We don't have plans or headcount for it; contributions would be very welcome on this front (someone asked for something similar here: https://discuss.ipfs.io/t/ipfs-cluster-js-implementation/4706/1)

@obo20 let's not hijack this issue though; let's open a thread on Discourse or an issue in the repo to answer all your questions.

@parkan
Contributor

parkan commented Jan 21, 2019

thanks for jumping in @hsanjuan! please link any further discussion from here -- it sounds like cluster is in fact the right place to do this

@krishraghuram

Hi, what happened to this thread?
Did the discussion continue somewhere on discuss.ipfs.io?

I was also looking for granular logs and metrics.
Just to give a high-level idea, I wanted to emit logs and metrics from IPFS, ship them to Elasticsearch, and use Kibana to build queries/dashboards.
This would let me do a few things:

  1. collect granular metrics we can aggregate to see which nodes get a lot of traffic, and for which CIDs/blocks, etc.
  2. keep logs for discovering and debugging issues in our system design.
  3. monitor the state of the system (this can also be done with ipfs stats and the netdata IPFS plugin).

Currently, the best option for this looks like shipping the access logs of the nginx instance that fronts the IPFS gateway to Elasticsearch/Kibana.
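
For example, a dedicated log format on the gateway vhost that records bytes sent per request (standard nginx directives; the field selection is just a starting point):

    log_format ipfs_gateway '$remote_addr [$time_local] "$request" '
                            '$status $body_bytes_sent $request_time';
    access_log /var/log/nginx/ipfs-gateway.log ipfs_gateway;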

PS: I'm not even sure Elasticsearch/Kibana is the right choice for us yet; I'm just using them here as a placeholder to explain what I'm trying to do.

@momack2
Contributor

momack2 commented Jan 16, 2021

@gmasgras - can you describe the configuration/integration we use for metrics/monitoring via grafana/kibana on the IPFS clusters and gateway?

@github-actions

github-actions bot commented Oct 9, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 9, 2023
@github-actions github-actions bot closed this as not planned Oct 15, 2023