add runPolicy object to controller.run() #3460
My current API uses a `runPolicy` object, documented at https://github.com/Agoric/agoric-sdk/blob/3460-run-policy/packages/SwingSet/docs/run-policy.md . I'm wondering about extensibility, though: how to let the app-provided `runPolicy` be extended in the future. I'm wondering what experience other folks have here. I could change the API.
...
I'm not aware of any requirement to do so. cosmic-swingset and solo are the only clients; if/when we need to change them, we can change them, no? I'm not a fan of trying to predict the future. Let's not try to generalize until we have 2 or 3 examples of extending the API.
...
No, let's avoid stringly-typed stuff, please. Let's get all the help we feasibly can from static checks.
This allows the host application to control how much work `c.run()` does before it returns. Depending upon the policy chosen, this can be a count of cranks, or cumulative computrons used, with more details to be added in the future. closes #3460
What is the Problem Being Solved?
The host applications that use swingset break up their computation into "blocks". Each block finishes with a state commitment to durable storage, followed by the release of all embargoed outbound messages. The externally-visible latency of a message is lower-bounded by the time it takes to perform all the computation in a block ("P").
Solo machines are free to end a block any time they like. Consensus machines must end the block the same way for all validator nodes, and additionally perform significant non-swingset work for each block (e.g. they run a consensus algorithm over the contents and consequences of the block). This extra work takes some amount of time: the default cosmos-SDK settings cause 6 seconds of voting to occur in between the functional transaction-processing time (P), leading to a block time (and minimum latency) of 6+P. Cosmos/Tendermint does not currently make it easy to do any transaction processing during the voting time, leading to a CPU utilization of P/(6+P). P is therefore a tunable parameter which trades off throughput against latency.
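To make that tradeoff concrete with a hypothetical example (not a recommendation): with the default 6 seconds of voting, a target of P = 2s gives a CPU utilization of 2/(6+2) = 25% and a minimum latency of 8 seconds, while P = 6s raises utilization to 6/(6+6) = 50% at the cost of a 12-second minimum latency.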
To achieve whatever target tradeoff the machine operator chooses, we'd like to stop processing cranks when their cumulative runtime has (roughly) reached P. However, their wallclock runtime is not a deterministic function of the machine state (it depends upon the CPU speed, among dozens of other uncontrolled factors), so a consensus-based swingset cannot use it to make this decision. (A solo machine can and should, though.) While we cannot measure wallclock time, we do get metering data for each crank, and we can feed this into an externally-developed model to estimate what the elapsed wallclock time would be on the slowest acceptable validator. When the model tells us that we've probably reached the target P time, we stop running cranks.
Currently, cosmic-swingset crudely approximates this model by simply running at most 1000 cranks before ending the block, by calling `controller.step()` up to 1000 times. We want to replace this with a `controller.run(runPolicy)` invocation. This `runPolicy` object can incorporate the model and tell SwingSet when to stop.

Description of the Design
`controller.run(runPolicy)` delegates directly to `kernel.run(runPolicy)`. The `runPolicy` is fed information about each delivery, just after it finishes execution. For now, we'll just give it the metering results (computrons consumed) in a call to `runPolicy.deliveryComplete(computrons)`. The return value will be a boolean: `true` to keep going, `false` to stop. `controller.run` checks the policy after each delivery and exits the loop when it says stop, or when there is no more work left to do.
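A minimal sketch of that control flow (not the actual kernel code; the `step()` helper shown here, returning whether work remains plus the computrons consumed, is an assumption for illustration):

```js
// Sketch only: a run loop that consults the policy after each delivery.
// `step()` is a hypothetical helper that performs one crank and reports
// whether any work remains plus the computrons that crank consumed.
async function runUntilPolicyStops(step, runPolicy) {
  for (;;) {
    const { moreWorkRemains, computrons } = await step();
    if (!moreWorkRemains) {
      return; // run-queue drained: nothing left to do
    }
    if (!runPolicy.deliveryComplete(computrons)) {
      return; // policy said stop: end the block here
    }
  }
}
```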
The current 1000-crank behavior will be replaced with a `runPolicy` that simply counts `deliveryComplete` invocations. Once a suitable model (#3459) is derived from the testnet slogfile corpus, we'll switch to a more sophisticated policy object that watches the cumulative computron count and is configured with a target `P` time.
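Two illustrative policies along these lines (sketches against the `deliveryComplete(computrons)` interface described above; the names and the budget-derivation step are assumptions, not the final implementation):

```js
// Reproduces today's behavior: stop after a fixed number of cranks.
function makeCrankCountPolicy(maxCranks = 1000) {
  let cranks = 0;
  return {
    deliveryComplete(_computrons) {
      cranks += 1;
      return cranks < maxCranks; // true: keep going, false: end the block
    },
  };
}

// Later: stop once a cumulative computron budget, derived from the
// target P time by the externally-developed model, has been spent.
function makeComputronBudgetPolicy(computronBudget) {
  let total = 0;
  return {
    deliveryComplete(computrons) {
      total += computrons;
      return total < computronBudget;
    },
  };
}
```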
A solo machine will use a `runPolicy` that gets to look at a real clock, and simply runs until a target wallclock time is reached. Consensus machines must use a deterministic `runPolicy` (and the configured target P must also be part of consensus, perhaps controlled by some kind of governance mechanism).

Security Considerations
Test Plan
In addition to unit tests that show the kernel respecting the policy's decisions, we also want to build a simulator. This simulator should take a policy object and a slogfile-derived table of all the deliveries that took place on a testnet run. From this, we want to see how the cranks would have been broken up into blocks if the chain had been using that policy, and then look at metrics like average block time and externally-visible latency.
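A rough shape for such a simulator (a sketch under the assumption that each slogfile-derived record carries a `computrons` field; this is not existing tooling):

```js
// Sketch: replay slogfile-derived delivery records through a policy and
// see where the blocks would have ended.
function simulateBlocks(deliveries, makePolicy) {
  const blocks = [];
  let currentBlock = [];
  let policy = makePolicy();
  for (const delivery of deliveries) {
    currentBlock.push(delivery);
    if (!policy.deliveryComplete(delivery.computrons)) {
      blocks.push(currentBlock); // policy ended the block here
      currentBlock = [];
      policy = makePolicy(); // fresh policy for the next block
    }
  }
  if (currentBlock.length > 0) {
    blocks.push(currentBlock);
  }
  return blocks; // feed into block-time / latency metrics
}
```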
I'm not sure we have enough information to actually build that simulator, though:
However, spontaneous activity (such as a timer wakeup event triggering block rewards), where no external machine is immediately interacting with the chain as a result of that activity, should be modelable accurately.