-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduction to hybrid net #113
Conversation
bc79935
to
d572f0b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 20 unresolved discussions (waiting on @nieznanysprawiciel)
arch-snapshot/arch.md
line 566 at r1 (raw file):
Hybrid Net was developed as an intermediate step towards decentralization, enabling peer-to-peer (P2P) communication between Golem Nodes. However, since most of the network operates behind NATs, P2P cannot be the sole communication
between some of the Golem Nodes
Code quote:
between Golem Nodes
arch-snapshot/arch.md
line 567 at r1 (raw file):
Hybrid Net was developed as an intermediate step towards decentralization, enabling peer-to-peer (P2P) communication between Golem Nodes. However, since most of the network operates behind NATs, P2P cannot be the sole communication method. To address this, the Net implementation supports communication forwarding through specialized Node known
Is this really a Node? As in Yagna Node?
Code quote:
Node
arch-snapshot/arch.md
line 568 at r1 (raw file):
between Golem Nodes. However, since most of the network operates behind NATs, P2P cannot be the sole communication method. To address this, the Net implementation supports communication forwarding through specialized Node known as relay.
stray whitespace
Code quote:
··
arch-snapshot/arch.md
line 579 at r1 (raw file):
register with the Relay server, providing the necessary information for discovery and connection. The Relay server: - Maintains a list of Nodes present in the network. - Stores each Node's identity, public key, and IP address.
Could you elaborate more on the public key? Is it a part of the identity? Or is it ephemeral for network communication only?
Code quote:
public key
arch-snapshot/arch.md
line 584 at r1 (raw file):
- Checks if connecting Nodes have public IP port exposed - Offers functions for: - Discovering Nodes and querying their information.
What is there to discover? All nodes are connected to the relay, no?
Code quote:
Discovering
arch-snapshot/arch.md
line 589 at r1 (raw file):
Communication with the Relay server is handled through a custom protocol built on top of UDP, defined using Protocol Buffers (protobuf). UDP was chosen for its lightweight nature, as it does not require maintaining open connections, which would consume more system resources compared to TCP. This makes it possible to handle a large number of Nodes
I love dumb questions, so I can't resist this one: isn't this a premature optimization?
For every Node on the network you need to keep its identity, public keys, IP address and probably some overheads. Is this really that much smaller than the kernel memory footprint for a TCP connection? Besides, even if we assume that an idle TCP connection eats up 1K of kernel memory, you can have 1e6 machines connected per every GB of memory. I'd dare to say that 1M machines is way more than a single relay can handle anyway.
Code quote:
Buffers (protobuf). UDP was chosen for its lightweight nature, as it does not require maintaining open connections,
which would consume more system resources compared to TCP. This makes it possible to handle a large number of Nodes
arch-snapshot/arch.md
line 590 at r1 (raw file):
Buffers (protobuf). UDP was chosen for its lightweight nature, as it does not require maintaining open connections, which would consume more system resources compared to TCP. This makes it possible to handle a large number of Nodes concurrently, ensuring decent scalability.
Does this apply to relaying communication between nodes behind NATs as well? Is yes, then how?
Code quote:
Communication with the Relay server is handled through a custom protocol built on top of UDP, defined using Protocol
Buffers (protobuf). UDP was chosen for its lightweight nature, as it does not require maintaining open connections,
which would consume more system resources compared to TCP. This makes it possible to handle a large number of Nodes
concurrently, ensuring decent scalability.
arch-snapshot/arch.md
line 608 at r1 (raw file):
which must compute a hash with a specified number of leading zeros. The difficulty level, set by the Relay server, determines the number of leading zeros required. This computationally expensive task protects the Relay server from DDoS attacks by forcing the Node to complete a certain amount of work before establishing a Session.
Stray whitespace
Code quote:
··
arch-snapshot/arch.md
line 610 at r1 (raw file):
DDoS attacks by forcing the Node to complete a certain amount of work before establishing a Session. The challenge is cryptographically signed using the private key of each identity associated with the Node. The Relay
response, right?
Code quote:
challenge
arch-snapshot/arch.md
line 611 at r1 (raw file):
The challenge is cryptographically signed using the private key of each identity associated with the Node. The Relay server can then recover the Node's identities and public keys from these signatures, verifying that the Node
extract?
Code quote:
recover
arch-snapshot/arch.md
line 612 at r1 (raw file):
The challenge is cryptographically signed using the private key of each identity associated with the Node. The Relay server can then recover the Node's identities and public keys from these signatures, verifying that the Node possesses the corresponding private keys. The recovered public keys are later made available to other Nodes that
extracted?
Code quote:
recovered
arch-snapshot/arch.md
line 619 at r1 (raw file):
When a Node initiates a Session, an additional mechanism is required to determine whether the IP address from which the packets are received is behind a NAT. This is achieved by sending Ping packets (network protocol ping, not to be confused with the Linux command) from a different UDP port than the one used for receiving incoming Sessions. This
ICMP packet
Code quote:
Linux command
arch-snapshot/arch.md
line 622 at r1 (raw file):
ensures that any network devices between the Node and the Relay server won't recognize these Ping packets as part of the same communication stream. If the ports are not publicly exposed, the Ping packets will be dropped, confirming that the Node is behind a NAT.
Is there a designated port for that? You could get unlucky and accidentally reach a port of a different Yagna behind the same NAT. Imagine a server room with 100 servers, where each server has a running Yagna. If all of them are behind the same NAT, the likelihood of this event happening is roughly 1-(1-100/65536)**100 which is 15%.
Code quote:
the same communication stream. If the ports are not publicly exposed, the Ping packets will be dropped, confirming
that the Node is behind a NAT.
arch-snapshot/arch.md
line 636 at r1 (raw file):
RelayServer->>RelayServer: Verify solution RelayServer->>RelayServer: Recover identities from signatures RelayServer->>GolemNode: Session established response
I must be missing something. What prevents me from grabbing your NodeId and presenting myself to the relay with a public key for my own private key rather than yours?
Code quote:
GolemNode->>RelayServer: Session request
RelayServer->>GolemNode: Challenge
GolemNode->>GolemNode: Solve challenge
GolemNode->>GolemNode: Sign solution with Node's identities
GolemNode->>RelayServer: Challenge response
RelayServer->>RelayServer: Verify solution
RelayServer->>RelayServer: Recover identities from signatures
RelayServer->>GolemNode: Session established response
arch-snapshot/arch.md
line 667 at r1 (raw file):
##### Establishing connections between Nodes Currently, a peer-to-peer (p2p) session can be established in two scenarios. In the first, if the target Node has a
I think I need to learn more about the Session. Can I create multiple Sessions between the same two Nodes? What makes those Sessions different from what you'd typically call a "connection"?
Code quote:
session
arch-snapshot/arch.md
line 672 at r1 (raw file):
Node first sends a Reverse Connection message to the Relay server, which forwards it to the target Node. The target Node then attempts to establish a direct connection with the initiating Node. Whether the target Node has a public IP can be determined based on information returned by the Relay server.
trailing whitespace
Code quote:
····
arch-snapshot/arch.md
line 674 at r1 (raw file):
IP can be determined based on information returned by the Relay server. Since the current Net implementation does not support NAT punching, if both Nodes are behind NAT, communication must
I think it's called either "hole punching", or "NAT traversal", or "NAT punch through" but not "NAT punching".
Code quote:
NAT punching
arch-snapshot/arch.md
line 777 at r1 (raw file):
Different channels can be utilized to send messages, as explained in the [reliable, unreliable, and transfers channels chapter](#reliable-unreliable-and-transfers-channels). In the Hybrid Net, reliable and transfer channels are distinguished by using separate TCP connections. This separation ensures that independent sender buffers are
Still over the same Session, right?
Code quote:
TCP connections
arch-snapshot/arch.md
line 777 at r1 (raw file):
Different channels can be utilized to send messages, as explained in the [reliable, unreliable, and transfers channels chapter](#reliable-unreliable-and-transfers-channels). In the Hybrid Net, reliable and transfer channels are distinguished by using separate TCP connections. This separation ensures that independent sender buffers are
Could you please rephrase this sentence so that it is 100% clear that:
- every reliable channel gets a new connection
- every data transfer gets a new connection
- everything else is squeezed into one connection (realiable?)
Code quote:
channels chapter](#reliable-unreliable-and-transfers-channels). In the Hybrid Net, reliable and transfer channels
are distinguished by using separate TCP connections. This separation ensures that independent sender buffers are
arch-snapshot/arch.md
line 803 at r1 (raw file):
Optimal guarantees can be achieved when two neighboring Nodes have distinctly different neighborhoods, minimizing their number of common neighbors. Currently, in the Hybrid Net, neighborhood is determined based on the reversed Hamming distance between Node IDs.
As mentioned in some other review - I need more context here. I do understand that neighborhood is a concept from some DHT designs. If so, please mention it and introduce the reader to the concept.
Code quote:
Hybrid Net implements local broadcasting to the nearest neighborhood of each Node. To query its neighbors, a Node
can send a `Neighborhood` request to the Relay server. The Relay server then responds with a list of Nodes that are
closest to the querying Node, based on a predefined metric.
After receiving the list of neighbors, the Node attempts to establish connections with them, as described in the
chapter on [communication](#establishing-connections-between-nodes). The neighborhood algorithm does not
differentiate between Nodes capable of establishing peer-to-peer connections and those that require relayed
communication. Unlike IP-level broadcasts, Hybrid Net uses reliable channels for message transmission.
**Neighborhood - distance function**
To prevent clustering of Nodes and accidental splits in the network, where subsets of Nodes become unreachable, a proper
neighborhood function must be utilized. This function is defined by the network module, and the market relies on the
broadcast function, leaving it with no alternative in this regard.
Optimal guarantees can be achieved when two neighboring Nodes have distinctly different neighborhoods, minimizing their
number of common neighbors. Currently, in the Hybrid Net, neighborhood is determined based on the reversed Hamming
distance between Node IDs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 21 unresolved discussions (waiting on @nieznanysprawiciel)
arch-snapshot/arch.md
line 563 at r1 (raw file):
specifically for transfers. #### Hybrid Net
Moving a comment from #111
I'm sorry, but I genuinely do not understand any of this. In particular, I don't understand:
- what a neighbor is
- what a neighborhood is
- what a broadcast function is
- what a neighborhood function is
- how that function affect the likelihood of a network partition
Code quote:
#### Hybrid Net
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 21 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 567 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Is this really a Node? As in Yagna Node?
I probably so much wanted relay to be real Node :(
arch-snapshot/arch.md
line 568 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
stray whitespace
Done.
arch-snapshot/arch.md
line 584 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
What is there to discover? All nodes are connected to the relay, no?
This is what I would call discovery :D finding out information about Node
Otherwise we don't have discovery at all
I changed to only mention query node information, to make it less controversial
arch-snapshot/arch.md
line 608 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Stray whitespace
Done.
arch-snapshot/arch.md
line 610 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
response, right?
Done.
arch-snapshot/arch.md
line 611 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
extract?
I checked that not only in our code but in general people use term recover in case of ECDSA.
arch-snapshot/arch.md
line 612 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
extracted?
People use term recover
arch-snapshot/arch.md
line 619 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
ICMP packet
Done.
arch-snapshot/arch.md
line 672 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
trailing whitespace
Done.
arch-snapshot/arch.md
line 674 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I think it's called either "hole punching", or "NAT traversal", or "NAT punch through" but not "NAT punching".
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 21 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 622 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Is there a designated port for that? You could get unlucky and accidentally reach a port of a different Yagna behind the same NAT. Imagine a server room with 100 servers, where each server has a running Yagna. If all of them are behind the same NAT, the likelihood of this event happening is roughly 1-(1-100/65536)**100 which is 15%.
It's relay server that has separate port. Yagna node has only one UDP port for listening to external connections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 21 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 636 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I must be missing something. What prevents me from grabbing your NodeId and presenting myself to the relay with a public key for my own private key rather than yours?
It's good point. It wasn't mentioned anywhere that NodeId is derived from public key, so you can't do that.
But I think it should be rather described in Identity chapter. I added to our task on board that it should be described
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 21 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 589 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I love dumb questions, so I can't resist this one: isn't this a premature optimization?
For every Node on the network you need to keep its identity, public keys, IP address and probably some overheads. Is this really that much smaller than the kernel memory footprint for a TCP connection? Besides, even if we assume that an idle TCP connection eats up 1K of kernel memory, you can have 1e6 machines connected per every GB of memory. I'd dare to say that 1M machines is way more than a single relay can handle anyway.
It's more than that, because there are tcp buffers. The middle values is default on my ubuntu at least.
It's actually less than I expected. Out embedded stack has much bigger buffers for optimal transfers.
But there are other reasons as well. For example we need to forward traffic on UDP anyway. So we would need separate UDP and TCP ports exposed. This would require changes to p2p sessions as well.
Maybe it could be said more in this chapter..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @nieznanysprawiciel)
arch-snapshot/arch.md
line 568 at r1 (raw file):
Previously, nieznanysprawiciel wrote…
Done.
I still see it
arch-snapshot/arch.md
line 589 at r1 (raw file):
Previously, nieznanysprawiciel wrote…
It's more than that, because there are tcp buffers. The middle values is default on my ubuntu at least.
It's actually less than I expected. Out embedded stack has much bigger buffers for optimal transfers.But there are other reasons as well. For example we need to forward traffic on UDP anyway. So we would need separate UDP and TCP ports exposed. This would require changes to p2p sessions as well.
Maybe it could be said more in this chapter..
Makes sense - especially the fact that we need UDP anyway.
arch-snapshot/arch.md
line 611 at r1 (raw file):
Previously, nieznanysprawiciel wrote…
I checked that not only in our code but in general people use term recover in case of ECDSA.
ACK
arch-snapshot/arch.md
line 612 at r1 (raw file):
Previously, nieznanysprawiciel wrote…
People use term recover
ACK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions (waiting on @nieznanysprawiciel)
arch-snapshot/arch.md
line 802 at r3 (raw file):
significant application is in [the Offer Propagation algorithm](#offer-propagation). While the implementation of specific algorithms is handled by other modules, the network module provides the necessary operations as building blocks for these processes.
This is great, thank you for adding it.
I've got a dumb follow-up question, though. Before I ask the question, please verify my assumptions, which are:
- we're propagating only offer IDs and the offer themselves are then fetched by every provider directly from all providers
- eventually, every offer is known to every Node on the network
If this is correct and given the fact that the network is already centralized, why bother with the neighborhoods? Couldn't we do this instead?
- every Node publishes all its Offer IDs to the relay (or even the whole Offers)
- the relay aggregates all offers
- the relay periodically updates all Nodes with all the Offer ids (or even whole Offers)
I presume there is a logical answer - I'd be grateful if you added it to the document.
Code quote:
**What is a Neighborhood?**
Before diving into broadcasting, it's essential to understand the concept of a neighborhood. In a network, a
neighborhood is a subset of Nodes that are considered closest to a given Node based on an abstract metric. Each Node
has its own neighborhood. This metric doesn't necessarily reflect the real-world proximity of Nodes. For instance,
two Nodes on opposite sides of the globe could be neighbors, while two Nodes within the same physical network may be
too distant in terms of this metric to be in the same neighborhood.
In peer-to-peer networks, the concept of Node distance is often used for efficient Node discovery. However, since
the current Golem network layer relies on the Relay server for Node discovery, neighborhoods don't serve that
purpose. Although it’s conceivable that a fallback, such as a Kademlia-like implementation, could be used in the
event of a Relay server downtime, for now, the primary purpose of the neighborhood concept is broadcasting.
Broadcasting is a mechanism used by various algorithms to propagate information throughout the network. Its most
significant application is in [the Offer Propagation algorithm](#offer-propagation). While the implementation of
specific algorithms is handled by other modules, the network module provides the necessary operations as building
blocks for these processes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 10 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 563 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Moving a comment from #111
I'm sorry, but I genuinely do not understand any of this. In particular, I don't understand:
- what a neighbor is
- what a neighborhood is
- what a broadcast function is
- what a neighborhood function is
- how that function affect the likelihood of a network partition
Does new chapter explaining neighborhood resolves this comment? Or is there something left to explain?
arch-snapshot/arch.md
line 566 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
between some of the Golem Nodes
Done.
arch-snapshot/arch.md
line 568 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I still see it
Hmm, done again
arch-snapshot/arch.md
line 803 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
As mentioned in some other review - I need more context here. I do understand that neighborhood is a concept from some DHT designs. If so, please mention it and introduce the reader to the concept.
Does new chapter explaining neighborhood resolves this comment?
arch-snapshot/arch.md
line 802 at r3 (raw file):
we're propagating only offer IDs and the offer themselves are then fetched by every provider directly from all providers
Full Offers are fetched from broadcast senders, not original authors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @nieznanysprawiciel)
arch-snapshot/arch.md
line 563 at r1 (raw file):
Previously, nieznanysprawiciel wrote…
Does new chapter explaining neighborhood resolves this comment? Or is there something left to explain?
It does, thank you.
arch-snapshot/arch.md
line 802 at r3 (raw file):
ACK.
Regarding the essence of this comment and following up on our F2F discussion, how about adding this somewhere:
Although HybridNet remains centralized for node discovery and relaying communication between Nodes behind NATs, broadcasting has already been designed in a decentralized manner as a step toward full network decentralization.
arch-snapshot/arch.md
line 784 at r4 (raw file):
Let's start this section by stating the problem which this part of the Network abstraction is supposed to solve.
How about something along these lines:
Every Requestor needs to have a complete view of the market. Each Requestor contacting every Provider on the network to learn of their Offers would be a drastic scalability limitation. Hence we need smarter ways of disseminating information. These ways are abstracted away by the Network module in the form of the "Broadcast" method. HybridNet's implementation is inspired by XYZ algorithm, which introduces the concept of neghborhoods.
Code quote:
##### Broadcasting and neighborhood
arch-snapshot/arch.md
line 799 at r4 (raw file):
event of a Relay server downtime, for now, the primary purpose of the neighborhood concept is broadcasting. Broadcasting is a mechanism used by various algorithms to propagate information throughout the network. Its most
As per my comment below - I don't think that's true in the generally accepted sense. "Broadcasting" is not really a mechanism - it's a name for the act of sending something to everybody.
Code quote:
Broadcasting is a mechanism used by various algorithms to propagate information throughout the network.
arch-snapshot/arch.md
line 804 at r4 (raw file):
blocks for these processes. **Broadcasting**
I think I'm finally starting to understand where my confusion stems from. I think these are (correct me if I'm wrong) a rather unorthodox use of the word "broadcast" and a leaky abstraction of the HybridNet. It is my understanding that what you mean by "broadcast" is a way to send a similar message to all neighbors.
I consider the naming unorthodox is because usually broadcast means sending something to all participants of a network, not a subset (that's multicast).
I think that HybridNet's abstraction is leaky because the Markeplace needs to know of the network topology (i.e. what you call broadcast requires specifying some subset of nodes if I understood you correctly).
BTW, the fact that "propagation" has two meanings depending on the context doesn't help either:
- the algorithm which Nodes obey (i.e. talking to neighbors and aggregating offers) to eventually disperse information from every Node to every Node
- sending Offers from one Node to another
Here is a suggestion on how we can give words meanings to make it easier for readers:
- broadcast - a process of sending a bit of information to all the nodes on the network; in a naive implementation, it may be every node speaking to every node or there might be a central aggregator; in a P2P implementation it the information might be routed through neighbors
- propagation - the process built on top of Net, which makes sure that every Node sees every Offer
- transmit - the the act of sending offers or IDs to particular Nodes
- transmit to neighbors - the the act of sending offers or IDs to neighboring Nodes
Code quote:
**Broadcasting**
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 590 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Does this apply to relaying communication between nodes behind NATs as well? Is yes, then how?
There is Forwarding Network Traffic
chapter describing this. Is there something lacking you would like there mentioned?
arch-snapshot/arch.md
line 667 at r1 (raw file):
I define Session in chapter Connecting to Relay server
. Later in chapter Peer-to-peer Session handshake
differences between Session with Relay vs. p2p Session are described.
I changed the descriptions in all places to consequently use word Session
instead of connection. In earlier chapters when Session was not defined yet, I replaced connection with more general word communication.
What makes those Sessions different from what you'd typically call a "connection"?
I think we used this word rather interchangeably. Session has more information in comparison to TCP connection for example. We have collection of all identities belonging on Node and some other information.
I checked definitions, it might be another non-standard usage of the word, but I would prefer to keep it, since we already use it this way.
arch-snapshot/arch.md
line 802 at r3 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
ACK.
Regarding the essence of this comment and following up on our F2F discussion, how about adding this somewhere:
Although HybridNet remains centralized for node discovery and relaying communication between Nodes behind NATs, broadcasting has already been designed in a decentralized manner as a step toward full network decentralization.
Good idea. I'm adding this in the introduction to Hybrid Net chapter
arch-snapshot/arch.md
line 784 at r4 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Let's start this section by stating the problem which this part of the Network abstraction is supposed to solve.
How about something along these lines:
Every Requestor needs to have a complete view of the market. Each Requestor contacting every Provider on the network to learn of their Offers would be a drastic scalability limitation. Hence we need smarter ways of disseminating information. These ways are abstracted away by the Network module in the form of the "Broadcast" method. HybridNet's implementation is inspired by XYZ algorithm, which introduces the concept of neghborhoods.
But this mixes problems from different layers. I wouldn't like to make chapters about net market-centric. Do you have any proposal that would be more neutral?
arch-snapshot/arch.md
line 799 at r4 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
As per my comment below - I don't think that's true in the generally accepted sense. "Broadcasting" is not really a mechanism - it's a name for the act of sending something to everybody.
Changed to broadcasting operation
arch-snapshot/arch.md
line 804 at r4 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I think I'm finally starting to understand where my confusion stems from. I think these are (correct me if I'm wrong) a rather unorthodox use of the word "broadcast" and a leaky abstraction of the HybridNet. It is my understanding that what you mean by "broadcast" is a way to send a similar message to all neighbors.
I consider the naming unorthodox is because usually broadcast means sending something to all participants of a network, not a subset (that's multicast).
I think that HybridNet's abstraction is leaky because the Markeplace needs to know of the network topology (i.e. what you call broadcast requires specifying some subset of nodes if I understood you correctly).
BTW, the fact that "propagation" has two meanings depending on the context doesn't help either:
- the algorithm which Nodes obey (i.e. talking to neighbors and aggregating offers) to eventually disperse information from every Node to every Node
- sending Offers from one Node to another
Here is a suggestion on how we can give words meanings to make it easier for readers:
- broadcast - a process of sending a bit of information to all the nodes on the network; in a naive implementation, it may be every node speaking to every node or there might be a central aggregator; in a P2P implementation it the information might be routed through neighbors
- propagation - the process built on top of Net, which makes sure that every Node sees every Offer
- transmit - the the act of sending offers or IDs to particular Nodes
- transmit to neighbors - the the act of sending offers or IDs to neighboring Nodes
I applied terminology fixes discussed on standup. Not a lot of them, but probably there will be more significant changes in other PRs for example about Offers propagation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 579 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Could you elaborate more on the public key? Is it a part of the identity? Or is it ephemeral for network communication only?
I added chapter about Node identification to explain more.
And another chapter about encryption is on the way which is going to explain relationships between different keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 9 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 777 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Still over the same Session, right?
yes
arch-snapshot/arch.md
line 777 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Could you please rephrase this sentence so that it is 100% clear that:
- every reliable channel gets a new connection
- every data transfer gets a new connection
- everything else is squeezed into one connection (realiable?)
But it doesn't work like you said. We have only single channel of each type.
It would be nice to have it implemented with ability to create channel, but it was ad hoc implementation to solve a problem not a fully designed concept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 8 unresolved discussions (waiting on @nieznanysprawiciel)
arch-snapshot/arch.md
line 777 at r1 (raw file):
Previously, nieznanysprawiciel wrote…
But it doesn't work like you said. We have only single channel of each type.
It would be nice to have it implemented with ability to create channel, but it was ad hoc implementation to solve a problem not a fully designed concept
I get it that channels in general are described in #112, I still have a limited understanding of their lifecycles. In particular:
- is there exactly one channel per triplet (NodeId pair, type of channel)?
- if I understand your description correctly, there is 1:1 relationship between channels and TCP connections, except for unreliable channels; is that right?
- is there exactly a single Session underpinning all channels with the same NodeId pair?
While the answer to the first question belongs to #112, I believe that answers to questions 2. and 3. should be answered here. Especially that I got it wrong as you say.
arch-snapshot/arch.md
line 784 at r4 (raw file):
My intention is as follows:
- Provide rationale for why need a Broadcast operation in the first place - granted, this should end up in Networking #112.
- Emphasize that a naive approach would not work, hence we need something smarter, which is why we're introducing the concept of neighbors.
Given that 1. should end up in #112 perhaps we should leave only something along these lines:
Broadcasting information via a naive way, i.e. every Node contacting all other Nodes would be a drastic scalability limitation. Hence we need smarter ways of disseminating information. HybridNet's implementation is inspired by XYZ algorithm, which introduces the concept of neighborhoods.
arch-snapshot/arch.md
line 799 at r4 (raw file):
I think I do understand what you mean but only because we talked about it before. How about rephrasing it slightly:
A technique used in various broadcasting algorithms (please link at least one) involves disseminating information through the network by having each node send broadcast messages only to its neighbors, relying on those neighbors to forward the information further. HybridNet uses it as well.
arch-snapshot/arch.md
line 804 at r4 (raw file):
Previously, nieznanysprawiciel wrote…
I applied terminology fixes discussed on standup. Not a lot of them, but probably there will be more significant changes in other PRs for example about Offers propagation
I still see some places, but let me resolve this comment and create comments in relevant places.
arch-snapshot/arch.md
line 818 at r7 (raw file):
Its most significant application is in [the Offer Propagation algorithm](#offer-propagation). While the implementation of specific algorithms is handled by other modules, the network module provides the necessary operations as building blocks for these processes.
Doesn't this apply to broadcasting in general rather than the technique, which HybridNet uses under the hood? IMO such a sentence should be in #112 to explain to the reader what the broadcast operation is for.
Code quote:
Its most significant application is in [the Offer Propagation algorithm](#offer-propagation). While the implementation of
specific algorithms is handled by other modules, the network module provides the necessary operations as building
blocks for these processes.
arch-snapshot/arch.md
line 822 at r7 (raw file):
That's not consistent with how we intended to use the word "broadcast", right? How about:
HybridNet implements the broadcast operation via sending broadcast messages to the nearest neighborhood...
Code quote:
local broadcast operation that sends message to the nearest neighborhood of the Node.
arch-snapshot/arch.md
line 822 at r7 (raw file):
**Broadcasting** Hybrid Net implements local broadcast operation that sends message to the nearest neighborhood of the Node. To query
nit: stray space
Code quote:
·
arch-snapshot/arch.md
line 824 at r7 (raw file):
Hybrid Net implements local broadcast operation that sends message to the nearest neighborhood of the Node. To query its neighbors, a Node can send a `Neighborhood` request to the Relay server. The Relay server then responds with a list of Nodes that are closest to the querying Node, based on a predefined metric.
nit: stray space
Code quote:
·
arch-snapshot/arch.md
line 829 at r7 (raw file):
chapter on [communication](#establishing-connections-between-nodes). The neighborhood algorithm does not differentiate between Nodes capable of establishing peer-to-peer Sessions and those that require relayed communication. Unlike IP-level broadcasts, Hybrid Net uses reliable channels for message transmission.
nit: trailing space - actually there are many more - can you please fix them all?
Code quote:
·
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 5 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 799 at r4 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I think I do understand what you mean but only because we talked about it before. How about rephrasing it slightly:
A technique used in various broadcasting algorithms (please link at least one) involves disseminating information through the network by having each node send broadcast messages only to its neighbors, relying on those neighbors to forward the information further. HybridNet uses it as well.
I linked to libp2p. I had hard time finding good description. I'm afraid that this will rather complicate things...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 5 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 777 at r1 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
I get it that channels in general are described in #112, I still have a limited understanding of their lifecycles. In particular:
- is there exactly one channel per triplet (NodeId pair, type of channel)?
- if I understand your description correctly, there is 1:1 relationship between channels and TCP connections, except for unreliable channels; is that right?
- is there exactly a single Session underpinning all channels with the same NodeId pair?
While the answer to the first question belongs to #112, I believe that answers to questions 2. and 3. should be answered here. Especially that I got it wrong as you say.
I clarified point 3.
Hmm there is another implementation detail that I missed when it comes to question 2. There are actually 2 tcp connections per transport type (channel). One is used to send messages, other is used to receive messages from other party.
This is implementation detail, done for simplicity. Otherwise there are problems with synchronization. If 2 Nodes try to establish tcp connection at the same time, we end up with 2 connections anyway. We could avoid this, by having rules, which connection should be closed and which one should remain. This would be more error prone than current version.
In general there is much to lose and not much to gain.
I'm wondering, should I describe it?
I wouldn't like to dive so deep into implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 5 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 784 at r4 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
My intention is as follows:
- Provide rationale for why need a Broadcast operation in the first place - granted, this should end up in Networking #112.
- Emphasize that a naive approach would not work, hence we need something smarter, which is why we're introducing the concept of neighbors.
Given that 1. should end up in #112 perhaps we should leave only something along these lines:
Broadcasting information via a naive way, i.e. every Node contacting all other Nodes would be a drastic scalability limitation. Hence we need smarter ways of disseminating information. HybridNet's implementation is inspired by XYZ algorithm, which introduces the concept of neighborhoods.
Done.
arch-snapshot/arch.md
line 822 at r7 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
That's not consistent with how we intended to use the word "broadcast", right? How about:
HybridNet implements the broadcast operation via sending broadcast messages to the nearest neighborhood...
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 5 unresolved discussions (waiting on @dopiera)
arch-snapshot/arch.md
line 784 at r4 (raw file):
Previously, nieznanysprawiciel wrote…
Done.
Point 1 should be applied to the Network PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so all changes targeting this PR are finished. There 2 comments that should be applied in https://reviewable.io/reviews/golemfactory/golem-architecture/112
I would prefer to apply them after we will have unified version. So maybe let's merge those PR first and later I will create new one to add necessary things?
Reviewable status: 0 of 1 files reviewed, 5 unresolved discussions (waiting on @dopiera)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @nieznanysprawiciel)
arch-snapshot/arch.md
line 777 at r1 (raw file):
Previously, nieznanysprawiciel wrote…
I clarified point 3.
Hmm there is another implementation detail that I missed when it comes to question 2. There are actually 2 tcp connections per transport type (channel). One is used to send messages, other is used to receive messages from other party.
This is implementation detail, done for simplicity. Otherwise there are problems with synchronization. If 2 Nodes try to establish tcp connection at the same time, we end up with 2 connections anyway. We could avoid this, by having rules, which connection should be closed and which one should remain. This would be more error prone than current version.
In general there is much to lose and not much to gain.I'm wondering, should I describe it?
I wouldn't like to dive so deep into implementation.
Let's ignore it.
arch-snapshot/arch.md
line 794 at r10 (raw file):
types are distinguished by using separate TCP connections. This separation ensures that independent sender buffers are maintained, preventing messages in one channel from being blocked by messages in the other. Each transport type has its own single TCP connection. This means that independent transfers still can interfere with each other.
Sorry - dumb question. Are they even possible? I.e. is there some multiplexing on top of this TCP connection?
Code quote:
This means that independent transfers still can interfere with each other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @dopiera)
arch-snapshot/arch.md
line 794 at r10 (raw file):
Previously, dopiera (Marek Dopiera) wrote…
Sorry - dumb question. Are they even possible? I.e. is there some multiplexing on top of this TCP connection?
Yes, look at my answer in networking PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @nieznanysprawiciel)
arch-snapshot/arch.md
line 794 at r10 (raw file):
Following our F2F discussion - I think the word "transfer" creates confusion here. How about we replace this sentence with this?
If multiple components within the same process need to use the transfer channel, they will share a single TCP connection, competing for bandwidth. As there is no fair scheduling mechanism, one component can easily starve another by producing messages at a sufficiently high rate.
1accf0f
to
ffafcff
Compare
ffafcff
to
1accf0f
Compare
736829a
to
6015150
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r11.
Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @nieznanysprawiciel)
Important
Chapter
Broadcasting and neighborhood
contains copy of fragments from description of market broadcasting. It should be removed in original place later, when all changes are merged and link directing to this chapter should be left.This change is