Update 4: Tracking / Stats Tooling #4
Hey @nicola, I also wanted to ask if I could get your feedback on things. I'm adding benchmark/logging tools this week to get it to "Developer Ready"™ and I'd love a few more eyes on it. Based on this libp2p/pubsub issue's comments, I thought you might have some interest. Feel free to take a look at things or add suggestions/features/inspiration for the benchmark tool that aren't mentioned above.

Metrics:
Faults (getting to these after the initial benchmark tools):
Thanks for any input!
Perfect! I will have a read very soon (from mobile now)
Hi @gavinmcdermott, as always, awesome stuff! Giving you feedback on the things you asked about:

It will be very important to guarantee correctness: at a minimum, in a small network (~50 nodes), we should be able to observe the message propagation clearly and make sure the code is sound. After that, it is about load: it is important to understand how the nodes behave when they get slammed with messages (memory usage + routed messages). Then, it would be great to know how many times a node sees a repeated message. Since our first implementation is floodsub, it is important that we are efficient and don't slam the nodes with the same content over and over (they have a timecache of 30 seconds, which should be enough to avoid getting repeated messages).
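A minimal sketch of the kind of 30-second time-cache being described, purely for illustration; the class and method names below are made up, not floodsub's actual internals:

```js
// Illustrative only: names are made up, not floodsub's actual internals.
class TimeCache {
  constructor (ttlMs = 30 * 1000) {
    this.ttlMs = ttlMs
    this.seen = new Map() // messageId -> timestamp of first sighting
  }

  // Returns true if the message was already seen within the TTL window.
  seenBefore (messageId) {
    const now = Date.now()
    // Evict entries older than the TTL so the cache doesn't grow unbounded.
    for (const [id, ts] of this.seen) {
      if (now - ts > this.ttlMs) this.seen.delete(id)
    }
    if (this.seen.has(messageId)) return true
    this.seen.set(messageId, now)
    return false
  }
}

// Only route (or count as "unique") a message the first time it is seen.
const cache = new TimeCache()
if (!cache.seenBefore('msg-123')) {
  // forward the message / increment the unique-message metric
}
```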
*Aside from using the EventEmitters and exposed properties on each pubsub node (e.g. this), any other suggestions for places/libs we might want to hook into (anything in bitswap, for example)? Because aside from the obvious ones like these, I recall you mentioning something about timestamps from incoming messages.

It actually might be good if the monitoring systems get all the information through logs, we use
To start, just being able to see that subscriptions are being propagated and that edge nodes get the messages would be ideal. Then checking the time from publish -> routing -> subscription would enable us to get proper metrics to improve speed and network formation.
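A rough sketch of how that publish -> delivery timing could be captured, assuming nothing about the real API: the `pubsub.publish` / `pubsub.on` calls are placeholders for whatever the floodsub node exposes, and clocks are assumed roughly in sync across test nodes.

```js
// Placeholder names: `pubsub.publish(topic, buf)` and `pubsub.on(topic, handler)`
// stand in for whatever the floodsub node actually exposes.
const topic = 'benchmark'

// Sender side: embed a send timestamp in the payload.
function publishWithTimestamp (pubsub, payload) {
  const msg = JSON.stringify({ sentAt: Date.now(), payload })
  pubsub.publish(topic, Buffer.from(msg))
}

// Receiver side: log the publish -> delivery delta for each message.
// Assumes roughly synchronized clocks, which is fine for a single-machine
// testnet; cross-machine runs would need NTP or one-way offset estimation.
function trackDeliveryLatency (pubsub, log) {
  pubsub.on(topic, (data) => {
    const { sentAt, payload } = JSON.parse(data.toString())
    log.push({ payload, latencyMs: Date.now() - sentAt })
  })
}
```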
Thanks @diasdavid, I appreciate the notes! Will check out and apply. I'm cranking on this for the next few days to get it into your hands asap! Will keep you posted...
Good news @diasdavid: This week I set to work on creating a few lightweight modules that now suit our purposes better! I developed a much cleaner view of things since I started building against your initial FloodSub implementation. I'll likely have some fun updates for Monday's call. I'll also have a lot more pushed tomorrow, but in the meantime, feel free to check out the updates below. In a nutshell, we now have the concept of a modular toolset.

Updates
Modules are coming together as follows:
Benchmarking Priorities
Regarding priorities, the above will allow us to benchmark the basics of message propagation through the test network. But after digging into the code, I find that some of the deeper metrics, like message repeats (and others), cannot be determined unless we open up an interface. For example, I fooled around with poking into a pubsub node's internals (see the sketch below for the kind of hook I mean).

Feedback
Look forward to hearing any feedback! Regardless, I'll be running to get some wonderful sh*t up for Monday's call.
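For concreteness, the kind of hook that would make message repeats measurable if a raw message event were exposed; the `'message'` event name and the `from`/`seqno` fields are assumptions, not the current API:

```js
// Counter for how often each message is seen. Event name and message
// fields are assumed, pending an opened-up interface.
const repeats = new Map() // messageId -> times seen

function countRepeats (node) {
  node.on('message', (msg) => {
    const id = `${msg.from}:${msg.seqno}`
    repeats.set(id, (repeats.get(id) || 0) + 1)
  })
}

// After the experiment: how many distinct messages arrived more than once?
function repeatedMessageCount () {
  return [...repeats.values()].filter((n) => n > 1).length
}
```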
Thanks to @ReidWilliams for the conversations and initial writeup.
Understanding testnet performance
There is a need for the testnet to benchmark the performance of its messaging implementation across a variety of factors (comparing similar algos over time, comparing different algos, etc.).
The libp2p-datalogger

Overview
Assumptions about how the tracking/logging will work: the testnet runs a version of libp2p which implements a routing strategy.

Features
Being updated to work with this initial implementation
- libp2p-datalogger will be a simple module to make it easy to log arbitrary data during a testnet experiment. It should be useful for logging (and potentially viewing in a browser) the benchmarking data, including status messages, event messages, and performance data.
- Each libp2p/pubsub node in a test network can create and use its own datalogger instance.
- Logs are published with a log.publish() command. Publishing the log creates a set of data objects for the published log. At the end of an experiment, this makes it easy to view individual and aggregate node performance data.
- datalogger does not define what data should be logged or provide statistical methods on the data. It just logs messages and publishes them (to some other tool, eventually using IPFS?). Statistical methods (e.g. computing a max, min, and average queue length) should be done in a processing step after the experiment.
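A sketch of what using the datalogger could look like, based on the features above; the `createDatalogger` entry point and `log.write` name are assumptions for illustration (only log.publish() is pinned down so far):

```js
// Hypothetical usage only: `createDatalogger` and `log.write` are assumed
// names; the spec above only specifies log.publish().
const createDatalogger = require('libp2p-datalogger')

const log = createDatalogger({ nodeId: 'node-1' })

// Log arbitrary data during the experiment; no schema is enforced.
log.write({ type: 'status', msg: 'subscribed', topic: 'benchmark', ts: Date.now() })
log.write({ type: 'perf', queueLength: 12, ts: Date.now() })

// At the end of the experiment, publish the raw entries as data objects.
// Aggregation (max/min/avg queue length, etc.) happens in a later
// processing step, not in the datalogger itself.
log.publish()
```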
Questions
Specifically @diasdavid...
Specific initial stats: What are the most important things you want to know/learn about message propagation in these initial implementations?
libp2p versions: Because you'll be swapping versions of libp2p that have a specific messaging strategy built in, we'll need the ability to know which lib version and algo you're using. Update: Since looking at the floodsub implementation, I'll use this for reference.

Secio and speed: libp2p's move to secio means we should use pre-generated keys; I wrote up a quick keygen tool. It is now working and I'm dropping in the pregen keys today. Something to note: creating the network is now significantly slower as a result of using full keys... need to think about how we can reduce this.
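A rough sketch of the pregen idea, using Node's built-in crypto purely for illustration; the real keygen tool would produce keys in whatever format libp2p expects:

```js
// Illustration only: shows paying the key-generation cost once, ahead of
// time, instead of at node creation.
const crypto = require('crypto')
const fs = require('fs')

function pregenKeys (count, dir) {
  fs.mkdirSync(dir, { recursive: true })
  for (let i = 0; i < count; i++) {
    const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', {
      modulusLength: 2048,
      publicKeyEncoding: { type: 'spki', format: 'pem' },
      privateKeyEncoding: { type: 'pkcs8', format: 'pem' }
    })
    fs.writeFileSync(`${dir}/node-${i}.pub.pem`, publicKey)
    fs.writeFileSync(`${dir}/node-${i}.key.pem`, privateKey)
  }
}

// Run once before the experiment; nodes then load their keys from disk
// instead of generating them at startup.
pregenKeys(50, './keys')
```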
Entry points and Methods: I was thinking of:

*Aside from using the EventEmitters and exposed properties on each pubsub node (e.g. this), any other suggestions for places/libs we might want to hook into (anything in bitswap, for example)? Because aside from the obvious ones like these, I recall you mentioning something about timestamps from incoming messages.

Information at Scales: It'll be useful to collect per-node statistics. For tests with a large number of nodes (10k, 100k, 1M, more?) it probably won't be necessary to analyze data from all nodes, but it may be useful to analyze data from a random subset. Thoughts?
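A quick sketch of sampling a random subset of nodes for analysis, so large runs don't require pulling logs from every node:

```js
// Pick a random sample of node IDs whose logs we'll actually analyze.
function sampleNodes (nodeIds, sampleSize) {
  const pool = nodeIds.slice()
  // Fisher-Yates shuffle, then take the first sampleSize entries.
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[pool[i], pool[j]] = [pool[j], pool[i]]
  }
  return pool.slice(0, sampleSize)
}

// e.g. analyze logs from 1% of a 100k-node run
const allNodeIds = Array.from({ length: 100000 }, (_, i) => `node-${i}`)
const sampled = sampleNodes(allNodeIds, Math.ceil(allNodeIds.length * 0.01))
console.log(sampled.length) // 1000
```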
Update
Tues. Sept. 27, 2016