Sharded Telemetry Server #349
Conversation
Adding frontend configmaps and envVars; optimizing docker-compose and Dockerfile
Overall, the PR is very clean and looks good; maybe in the future we could "resultify" the APIs instead of doing so much unwrapping.
I didn't review the logic super carefully because it's already battle-tested; mostly style stuff ^
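To illustrate the "resultify" suggestion: instead of calling `unwrap()` inside an API (which panics on the error path), return a `Result` and let the caller decide how to handle failure. A minimal sketch with hypothetical names (`node_name` is not from this PR):

```rust
use std::collections::HashMap;

// Hypothetical unwrap-heavy lookup, the style the review describes:
fn node_name_unwrap(nodes: &HashMap<u64, String>, id: u64) -> String {
    nodes.get(&id).unwrap().clone() // panics if `id` is unknown
}

// "Resultified" version: the error path is explicit and recoverable.
fn node_name(nodes: &HashMap<u64, String>, id: u64) -> Result<String, String> {
    nodes
        .get(&id)
        .cloned()
        .ok_or_else(|| format!("unknown node id: {}", id))
}

fn main() {
    let mut nodes = HashMap::new();
    nodes.insert(1, "Alice".to_string());
    assert_eq!(node_name(&nodes, 1), Ok("Alice".to_string()));
    assert!(node_name(&nodes, 2).is_err()); // no panic, caller handles it
    let _ = node_name_unwrap(&nodes, 1); // fine only because the key exists
    println!("ok");
}
```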
Overall LGTM. Good job here; I especially like the attention paid to docs & comments.
    self.0 = time;
}

pub fn increment_by(&mut self, duration: Duration) {
    self.0 += duration;
Is addition to Instant saturating?
I wouldn't assume so. This source is only actually used in tests, so if we overflow the time I think I'd be happy to chalk it up to programmer error and let it panic
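For reference, the reply's assumption holds: `Instant + Duration` is not saturating and panics if the result cannot be represented, while `Instant::checked_add` returns `None` instead. A small sketch of both behaviors:

```rust
use std::time::{Duration, Instant};

fn main() {
    let now = Instant::now();

    // Plain addition is not saturating; it panics on overflow
    // ("overflow when adding duration to instant").
    let later = now + Duration::from_secs(60);
    assert!(later > now);

    // If overflow must be handled gracefully, `checked_add` returns
    // an Option instead of panicking:
    assert!(now.checked_add(Duration::from_secs(60)).is_some());
    assert!(now.checked_add(Duration::MAX).is_none()); // unrepresentable

    println!("ok");
}
```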
set.insert("Polkadot");
set.insert("Kusama");
set.insert("Westend");
set.insert("Rococo");
Should we add the common good parachains here as well? statemine/statemint?
I don't know enough to have an opinion on that, but I'd be happy to add them! I can't see any nodes from those on the current telemetry server yet; is that expected (I guess they aren't a thing just yet)?
/// Check if the chain is stale (has not received a new best block in a while).
/// If so, find a new best block, ignoring any stale nodes and marking them as such.
fn update_stale_nodes(&mut self, now: u64, feed: &mut FeedMessageSerializer) {
    let threshold = now - STALE_TIMEOUT;
Saturating sub here might be best?
If we ever hit an underflow error here we've done something very wrong, so I'd be happy to let it panic personally
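For context on the trade-off being discussed: plain `u64` subtraction panics on underflow in debug builds (and wraps in release), while `saturating_sub` clamps at zero. A sketch of both options (the `STALE_TIMEOUT` value here is illustrative, not the project's actual constant):

```rust
const STALE_TIMEOUT: u64 = 120_000; // illustrative timeout in ms

// The PR's approach: panics in debug builds if `now < STALE_TIMEOUT`,
// surfacing the "something very wrong" case loudly.
fn threshold_plain(now: u64) -> u64 {
    now - STALE_TIMEOUT
}

// The reviewer's suggestion: clamp at zero instead of underflowing.
fn threshold_saturating(now: u64) -> u64 {
    now.saturating_sub(STALE_TIMEOUT)
}

fn main() {
    // When `now` is comfortably large, both agree:
    assert_eq!(threshold_plain(500_000), 380_000);
    assert_eq!(threshold_saturating(500_000), 380_000);

    // Near zero, only the saturating version stays well-defined:
    assert_eq!(threshold_saturating(1_000), 0);

    println!("ok");
}
```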
There were a couple of warnings when generating the docs, trivial stuff.
An async hyper+soketto+tokio (-actix) telemetry server, split into:

- telemetry_core: the binary which the UI connects to and receives information from to display, and which shards connect to to relay information to it.
- telemetry_shard: the binary which takes JSON feeds from nodes and sends them to the telemetry process to be aggregated and sent to UI feeds.

The telemetry core process works by spinning up a single aggregator loop which receives messages from shards and feeds, and updates the node state or subscribes/unsubscribes feeds as necessary.
At some point:

I'll leave master alone for now (to avoid any automated deployment bits), and merge this into a long-lived sharded branch which can accommodate other changes (benchmark/e2e test tooling and any deployment config) before we merge it to master.

Manual testing
To try this thing out locally, you can do the following:
In the substrate repository:
Start up an "alice" node in one terminal and a bob node in another (pointed to where shard 1 will start).
Now, let's start two more nodes on a different chain and pointed to a different telemetry shard:
cargo run -- \
  --tmp \
  --chain dev \
  --alice \
  --port 30335 \
  --ws-port 9946 \
  --rpc-port 9935 \
  --node-key 0000000000000000000000000000000000000000000000000000000000000001 \
  --validator \
  --telemetry-url 'ws://localhost:8002/submit 1' \
  --name AliceDev

cargo run -- \
  --tmp \
  --chain dev \
  --bob \
  --port 30336 \
  --ws-port 9947 \
  --rpc-port 9936 \
  --validator \
  --bootnodes /ip4/127.0.0.1/tcp/30335/p2p/12D3KooWEyoppNCUx8Yx66oV9fJnriXwCcXwDDUA2kj6vnc6iDEp \
  --telemetry-url 'ws://localhost:8002/submit 1' \
  --name BobDev
In this repository+branch

On another three terminals (sorry about all the terminals..), run the following from within the backend folder of this repo.

The main telemetry process that UIs will connect to:
A couple of shards to receive telemetry and forward it on:
And finally, on yet another terminal and in the frontend folder of this repo, we'll start the UI up so that we can see what's going on.

Now, visit http://localhost:3000 and watch the data start coming in as nodes connect to the started shards.
Have a go at killing shards, or the telemetry core, or nodes and see what happens.