Open Problem: Enhanced Bitswap/GraphSync with more Network Smarts #9
Conversation
### What defines a complete solution?
> What hard constraints should it obey? Are there additional soft constraints that a solution would ideally obey?

First and foremost, any complete solution should account for extensibility, as the IPFS system needs to scale up and more applications are implemented on top of it. The active number of IPFS users is increasing exponentially and the requests submitted to the network are following accordingly. As such, a complete solution should account for those numbers.
I'm not sure unbounded exponential scaling is a realistic goal. Would be good to put some order of magnitude here, especially given the reference to "those numbers".
Good point, need to clarify.
I would add that ideally IPFS should dynamically adapt to different environments, analogously to how TCP works in a data center and also works on the broader internet
Co-Authored-By: Jorge Soares <mail@jorgesoares.org>
Overall LGTM 👍
I left a couple of comments with some more detail in case you need to incorporate that background info anywhere
@yiannisbot can you take in @dirkmc's review before I do the final review for the merge? Thank you!
Yup, it's on the to-do list for this week as I prepare the RFPs.
Thanks a lot @dirkmc! Very useful feedback. Most of it is now integrated in the main text.
I'm not sure if we want to include it in this document, but I just want to make sure people are aware that the folks at qri.io have implemented a data transfer mechanism using some IPFS components that keeps track of blocks in a DAG using Manifest files, analogous to bittorrent magnet files.
Added in the "Extra Notes" section.
If none of the directly connected peers have any of the blocks on the WANT list, bitswap falls back to the DHT to find the requested content. This results in long delays before reaching a peer that stores the requested content.
Once the recipient node starts receiving content from multiple peer nodes, it prunes the long-latency peers and keeps the one with the shortest RTT. Current proposals within the IPFS ecosystem are considering keeping the node with the highest throughput instead. It is not clear at this point which is the best approach.
Not exactly.
- We currently prune to peers that have the content, then prioritize sending wants to peers with lower latencies. We still send wants to all peers (IIRC).
- The plan is to change that second part to: prioritize sending wants to peers with the least amount of queued work.
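To make that proposed change concrete, here is a minimal Go sketch of the "least queued work first" idea, with latency as a tie-breaker. The `peerInfo` type and its fields are hypothetical illustrations, not go-bitswap's actual API:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// peerInfo is a hypothetical record of what we know about a connected peer
// that has advertised the block we want.
type peerInfo struct {
	ID         string
	Latency    time.Duration // measured RTT
	QueuedWork int           // wants already sent to this peer but not yet answered
}

// orderPeers sorts candidate peers so that wants go first to peers with the
// least outstanding work, breaking ties by lower latency.
func orderPeers(peers []peerInfo) {
	sort.Slice(peers, func(i, j int) bool {
		if peers[i].QueuedWork != peers[j].QueuedWork {
			return peers[i].QueuedWork < peers[j].QueuedWork
		}
		return peers[i].Latency < peers[j].Latency
	})
}

func main() {
	peers := []peerInfo{
		{ID: "QmA", Latency: 30 * time.Millisecond, QueuedWork: 8},
		{ID: "QmB", Latency: 120 * time.Millisecond, QueuedWork: 1},
		{ID: "QmC", Latency: 60 * time.Millisecond, QueuedWork: 1},
	}
	orderPeers(peers)
	for _, p := range peers {
		fmt.Println(p.ID, p.QueuedWork, p.Latency) // QmC, QmB, QmA
	}
}
```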
@yiannisbot did you take @Stebalien's review in this comment?
- *DAG Block Interconnection.* Although bitswap does not/cannot recognise any relationship between different blocks of the same DAG, a requesting node can ask a node that provided a previous block for subsequent blocks of the same DAG. This approach intuitively assumes that a node that has one block of a DAG is very likely to have others. This is often referred to as a “session” between the peers that have provided some part of the DAG.
- *Latency vs Throughput.* Bitswap is currently sorting peers by latency, i.e., it is pruning the connections that incur higher latency. It has been suggested that this be changed to maximise throughput (i.e., keep the pipe full).
It's not really either/or. Really, we should:
- Optimize for latency when traversing deep/narrow DAGs (e.g., a blockchain/path). Lower latency means we learn about the next node faster.
- Optimize for throughput when traversing a wide DAG in parallel.
That's great! I was not aware this was the intention.
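As a rough illustration of that adaptive intention, the sketch below picks the peer-ranking metric based on how wide the current DAG frontier is. All names, fields, and the threshold are hypothetical and not taken from bitswap or GraphSync:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// candidate is a hypothetical view of a peer we could fetch the next blocks from.
type candidate struct {
	ID         string
	RTT        time.Duration // round-trip latency
	Throughput float64       // observed bytes/sec
}

// rankForFrontier orders candidates differently depending on the shape of the
// DAG frontier: a narrow frontier (e.g. walking a blockchain or a path) is
// latency-bound, while a wide frontier fetched in parallel is throughput-bound.
func rankForFrontier(cands []candidate, frontierWidth int) {
	const wideThreshold = 8 // illustrative cut-off, not a tuned value
	if frontierWidth < wideThreshold {
		// Deep/narrow DAG: lower RTT means we learn about the next link sooner.
		sort.Slice(cands, func(i, j int) bool { return cands[i].RTT < cands[j].RTT })
		return
	}
	// Wide DAG: keep the pipe full, prefer high-throughput peers.
	sort.Slice(cands, func(i, j int) bool { return cands[i].Throughput > cands[j].Throughput })
}

func main() {
	cands := []candidate{
		{ID: "QmA", RTT: 20 * time.Millisecond, Throughput: 1e6},
		{ID: "QmB", RTT: 80 * time.Millisecond, Throughput: 8e6},
	}
	rankForFrontier(cands, 2) // narrow frontier -> latency wins
	fmt.Println("narrow:", cands[0].ID)
	rankForFrontier(cands, 32) // wide frontier -> throughput wins
	fmt.Println("wide:", cands[0].ID)
}
```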
There have been significant research efforts lately in the area of coded caching. The main concept was proposed in the 1960s in the form of error correction, targeting content delivery over wireless, lossy channels, and is known as Reed-Solomon error correction. Lately, with seminal works such as “Fundamental Limits of Caching”, Niesen et al. have proposed the use of coding to improve caching performance. In summary, the technique works as follows: if we have a file that consists of 10 chunks and we store all 10 chunks in the same or different memories/nodes, then we need to retrieve those exact 10 chunks in order to reconstruct the file.
In contrast, according to coded caching theory, before storing the 10 chunks we encode the file using erasure codes. This results in some number of chunks x > 10, say 13 for the sake of illustration. Adding codes to the original data clearly produces more data. However, when attempting to retrieve the original file, a user needs to collect *any 10 of those 13 chunks*. By doing so, the user is able to reconstruct the original file without needing to get all 13 chunks. Although such an approach does not save bandwidth (we still need to retrieve chunks totalling the size of the original file), it makes the network more resilient to nodes being unavailable. In other words, to reconstruct the original file without coding, all 10 of the peers that store a chunk of the file have to be online and ready to deliver it, whereas in the coded caching case, any 10 out of the 13 peers need to be available and ready to provide their chunks. Blind replication of the original chunks will not provide the same benefit, as the number of peers will need to be much higher (at least 20 as compared to 13) in order to operate with the same satisfaction ratio.
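A small Go sketch of the 10-of-13 example, assuming the github.com/klauspost/reedsolomon library; the shard counts and data here are purely illustrative and this is not part of bitswap:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 10 data shards + 3 parity shards: any 10 of the 13 are enough.
	enc, err := reedsolomon.New(10, 3)
	if err != nil {
		panic(err)
	}

	original := bytes.Repeat([]byte("some file content "), 1000)

	// Split the file into 10 equally sized data shards, then compute 3 parity shards.
	shards, err := enc.Split(original)
	if err != nil {
		panic(err)
	}
	if err := enc.Encode(shards); err != nil {
		panic(err)
	}

	// Simulate three peers being offline: drop any three shards.
	shards[0], shards[4], shards[12] = nil, nil, nil

	// The remaining 10 shards are enough to rebuild the missing ones...
	if err := enc.Reconstruct(shards); err != nil {
		panic(err)
	}

	// ...and to reassemble the original file.
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(original)); err != nil {
		panic(err)
	}
	fmt.Println("recovered:", bytes.Equal(out.Bytes(), original))
}
```

The point of the example is simply that reconstruction succeeds with any 10 of the 13 shards, whereas an unencoded 10-chunk file would require exactly those 10 original chunks.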
IIRC, if I erasure encode my file, I can't reconstruct a part of the file without having the minimum number of chunks. Is this correct?
If so, I'm not sure if this buys us anything. Peers will likely either have or not have all the chunks necessary to reconstruct a file; having one chunk will be highly correlated with having the rest.
TL;DR: chunk deletion is not random. Yes, disks can fail but that should be handled at a lower layer.
> IIRC, if I erasure encode my file, I can't reconstruct a part of the file without having the minimum number of chunks. Is this correct?

Yes, it is.

> If so, I'm not sure if this buys us anything. Peers will likely either have or not have all the chunks necessary to reconstruct a file; having one chunk will be highly correlated with having the rest.

That's correct for the case of small files. But in the case of very large files, coded caching provides nice load-balancing properties, i.e., you don't keep someone's uplink saturated for hours to get some GBs' worth of data. The replication you would need in order to achieve equal load-balancing without coding is much higher, therefore resulting in inefficient use of (storage) resources.
I've addressed @Stebalien's comments and committed a new version.
Some additional comments. This is looking really solid. We can merge it once the last comments are addressed.
### Extra notes
[qri.io](https://qri.io/): a data transfer mechanism using IPFS components to keep track of blocks in a DAG using Manifest files (similar to bittorrent magnet files) - https://github.com/qri-io/dag
This should go under "one of the experiments within the IPFS Ecosystem"; it is a tool that uses IPFS and its APIs for faster syncs.