Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] kademlia dht #129

Merged
merged 11 commits into from
Apr 18, 2019
Merged

[WIP] kademlia dht #129

merged 11 commits into from
Apr 18, 2019

Conversation

zaibon
Copy link
Contributor

@zaibon zaibon commented Feb 26, 2019

This is a bunch of code I started writing when I wanted to investigate the kadmelia module.
This doesn't completely comply with the peer discovery introduced at : 17c778d

After some discussion here how to tackle this PR, here is the list of task that needs to be done in order to be able to merge this:

What I've done so far:

  • update the kadmelia module
  • write some basic code and test to ensure kadmelia itself is working

@zaibon zaibon changed the title WIP peer routing WIP peer discovery Feb 26, 2019
@robzajac
Copy link
Contributor

Will be giving this a closer look soon. On a first skim, looks very thorough!

To comply with the interface in #123, I think we should first determine what constitutes a "service" (this was inspired by Go) and whether "service" maps to any existing py-libp2p notions (CC @alexh). The JavaScript interface, on the other hand, doesn't need this notion as a parameter.

@zaibon
Copy link
Contributor Author

zaibon commented Feb 27, 2019

From the go implementation I see that they convert the "service" into a Cid. This is not a notion that we already have in py-libp2p as far as I know. But I guess content identification is something we'll have to work on at some point.

I did some change in the code to make it comply to the 2 interfaces defined i #123. Everything is still a bit rough

Some points I'm not sure how to tackle:

  • how do we want to upgrade a basic host with a routed host. I've implemented something similar of what go-libp2p does, but not sure we want to do the same here.
  • currently I still need to "manually" start the dht server, would be nice it get started automatically like the rest of the network when we create a node

@zixuanzh
Copy link
Contributor

zixuanzh commented Mar 3, 2019

@zaibon great work! Looks like a very good start and agree that it's a bit rough, especially when we are initializing. I think a big part of that comes from adapting bmuller's kademlia implementation and jumping straight to peer discovery without solidifying peer routing and content routing. Here are my two cents.

  • A routed host should just be a host with a routing system and having to pass in node twice to initialize routed host seems non-ideal. May I suggest we add routing interfaces and KademliaPeerRouter will just implement the peer routing interface? In that case, KademliaPeerRouter doesn't need to take in a host upon construction.
  • We can extend peer discovery from content routing like what go did and implement peer discovery as KademliaPeerDiscovery perhaps. It is also confusing that js-libp2p uses a different peer-discovery interface and their kad-dht discovery service is a random walk.
  • For starting the kademlia server, I think it will be cleaner if we start it in new_node when the router option is passed in but I am not exactly sure how that will look yet.
  • Regarding Cid, agree that it is not the highest priority yet, we can just use some hash as the content id and what we have right now seems fine. It looks like there is a Python Cid implementation that we can potentially adopt.

I will ask either @robzajac or @alexh to take a look as well when they get the chance but everyone is AFK this week. Thank you!

@zixuanzh zixuanzh mentioned this pull request Mar 12, 2019
@alexh
Copy link
Contributor

alexh commented Mar 14, 2019

A routed host should just be a host with a routing system and having to pass in node twice to initialize routed host seems non-ideal. May I suggest we add routing interfaces and KademliaPeerRouter will just implement the peer routing interface? In that case, KademliaPeerRouter doesn't need to take in a host upon construction.

For starting the kademlia server, I think it will be cleaner if we start it in new_node when the router option is passed in but I am not exactly sure how that will look yet.

Regarding Cid, agree that it is not the highest priority yet, we can just use some hash as the content id and what we have right now seems fine. It looks like there is a Python Cid implementation that we can potentially adopt.

I agree with all these points that @zixuanzh made. I think after these changes are made we are close to being able to merge this.

Everything in this looks good though, it looks like we are on the right track with this! Thank you so much for your hard work on this @zaibon, let me know if theres anything I can help with here.

@zaibon
Copy link
Contributor Author

zaibon commented Mar 14, 2019

yep everything that has been said make sense.

There is one thing though I'm not sure about. It seems in the go implementation of the kadmelia module, they keep a list of "providers" (https://github.com/libp2p/go-libp2p-kad-dht/blob/master/dht.go#L50) which is used to do content and peer routing. This "providers" list is used managed with 2 RPC call ADD_PROVIDER and GET_PROVIDERS (https://github.com/libp2p/go-libp2p-kad-dht/blob/1fe2fd55b9a80e8623f034f006b3e83c51e93dc9/pb/dht.proto#L17)
It seems they 2 RPC method are not standard in a kadmelia dht as far as I could read in some paper. I also didn't find anything similar in our python implementation. Do you think I should implement something similar here before trying to go further ?

@zaibon zaibon mentioned this pull request Mar 14, 2019
3 tasks
@zixuanzh
Copy link
Contributor

@zaibon looks like we don't need it immediately and agree that it is not a standard kademlia dht method. We can add them back in later on if we see a use case for them.

@raulk
Copy link
Member

raulk commented Mar 24, 2019

Have a look at the libp2p Kademlia DHT spec (WIP) here: https://github.com/libp2p/specs/pull/108/files#diff-d1380783eeaa00e1c992decb33b995fdR219

I think of provider records as symlinks, or advertisements. Instead of storing the data associated with a key at a particular node, you store a link to the place where it can be found.

This is useful in IPFS if you have a, say, 4TB file. The node holding it will advertise the content in the region of the DHT where interested parties will look for it. Nodes wanting to fetch the file will then connect to the provider directly. Once they have it, they too will become providers of the same record.

@zaibon zaibon changed the title WIP peer discovery WIP: kadmelia dht Mar 26, 2019
@zaibon
Copy link
Contributor Author

zaibon commented Mar 26, 2019

@raulk Thanks for the clarification.
So from everything that has been said here, I think we really need to split this into small tasks.
I clean up the PR and remove everything that was a bit premature.

Also from what I understand, the first thing to do before being able to progress is to adapt the bmuller's kademlia implementation to match what ipfs/libp2 does. This include:

  • add the custom ADD_PROVIDER and GET_PROVIDERS rcp commands
  • modify how the kdamelia lib deals with nodes. Currently a node in the network is identify by a ip and a port. I think it should be replaced with a multiaddr address

@zaibon
Copy link
Contributor Author

zaibon commented Mar 26, 2019

I adapted the first comment of the PR to list all the sub tasks.

@alexh I think there is enough work for both of us here, so if you want to help out you're welcome. Let me know how you want to distribute the work

@alexh
Copy link
Contributor

alexh commented Apr 1, 2019

Hey @zaibon I think you might have more clarity into what a good work distribution would be.
Do you have any tasks you'd want our team to look into?

@raulk raulk changed the title WIP: kadmelia dht WIP: kademlia dht Apr 1, 2019
@zaibon
Copy link
Contributor Author

zaibon commented Apr 2, 2019

@alexh I've updated the first comment of the PR with the list of tasks that needs to done.
Unfortunately I don't have much free cycle at the moment, so I don't want to block this if someone has some time to work on it.

@robzajac
Copy link
Contributor

robzajac commented Apr 2, 2019

@zaibon thanks for your hard work on this so far. We're going to pick up on your tasks above over the next 2-3 weeks as we complete other internal milestones as well. Our goal is definitely to ship peer discovery by the end of those 3 weeks. Feel free to jump back in whenever you have the cycles. Thanks again!

@alexh alexh changed the title WIP: kademlia dht [WIP] kademlia dht Apr 3, 2019
@zixuanzh
Copy link
Contributor

zixuanzh commented Apr 9, 2019

@raulk do you mind taking a look to see if we are on the right track? there are also some discussions in #128 and #97

@alexh alexh marked this pull request as ready for review April 18, 2019 01:01
@alexh
Copy link
Contributor

alexh commented Apr 18, 2019

We plan to merge this PR and create issues outlining the roadmap on development of our Kademlia implementation.

@alexh alexh requested a review from zixuanzh April 18, 2019 01:04
Copy link
Contributor

@zixuanzh zixuanzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will create an issue on what needs to be done

@alexh alexh merged commit cce226c into libp2p:master Apr 18, 2019
@zixuanzh zixuanzh mentioned this pull request Apr 18, 2019
7 tasks
@ntninja ntninja mentioned this pull request Jul 26, 2019
16 tasks
@zaibon zaibon deleted the peer_routing branch September 18, 2019 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants