Clarify definition of "network health." #4729

Merged 5 commits on Oct 24, 2023
Changes from 3 commits
27 changes: 24 additions & 3 deletions src/ripple/app/misc/FeeEscalation.md
@@ -190,9 +190,30 @@ lower) fee to get into the same position as a reference transaction.

### Consensus Health

- For consensus to be considered healthy, the consensus process must take
- less than 5 seconds. This time limit was chosen based on observed past
- behavior of the network. Note that this is not necessarily the time between
For consensus to be considered healthy, the peers on the network
should largely remain in sync with one another. It is particularly
important for the validators to remain in sync, because that is required
for participation in consensus. However, the network tolerates some
validators being out of sync. Fundamentally, network health is a

Collaborator:

Can we use the Reliability Score as a proxy for measuring the network health? It seems to indicate the degree of similarity in the calculations between validators.

Collaborator Author:

I don't know what that is, and it brings me to a site that asks for my email address.

Collaborator:

Sorry, the link is wrong. Here's the reference: https://xrpl.org/negative-unl.html#reliability-measurement

Collaborator Author:

@ckeshava
The problem with the existing document is that it can be misinterpreted to mean that 5s latency in consensus is some extreme upper limit, beyond which the network is in a faulty state. This PR corrects the language and hopefully encourages approaching the issue with some nuance. I didn't really intend this to be an exhaustive treatment of all the ways that the network can have problems, or of the different diagnostics and measurements that can be done. That is actually quite a sizable topic. But for now I prefer that this stays concise and mainly clarifies the original statement.

function of validators reaching consensus on sets of recently submitted
transactions.

Another factor to consider is
the duration of the consensus process itself. This generally takes
under 5 seconds on the main network under low volume, based on
historical observations. However, factors such as transaction volume
can increase consensus duration: rippled performs more work as
transaction volume increases, and under sufficient load this tends to
lengthen consensus. It's possible that relatively high
consensus duration indicates a problem, but it is not appropriate to
conclude so without investigation. The upper limit for consensus
duration should be roughly 20 seconds, which is far above normal.
If the network takes this long to close ledgers, then it is almost
certain that there is a problem with the network. This circumstance
often coincides with new ledgers that contain zero transactions.

Collaborator:

So the consensus process takes >20 seconds, although no transactions were included in the ledger.

Can we list any factors that might cause this issue? Historically, have such problems occurred on the mainnet or other affiliated blockchain networks? Can we provide a link to such an example?

Collaborator Author:

That goes beyond clarifying what stability is and gets into speculation and diagnostics.

Collaborator:

I feel that giving such examples would provide more clarity. As it stands, the reader does not understand why the network could become unstable.

A variety of factors contribute to consensus health.
ximinez marked this conversation as resolved.

Note that this is not necessarily the duration between
ledger closings, as consensus usually starts some amount of time after
a ledger opens.

Collaborator:

Should this sentence be moved up? It seems disconnected here.

Collaborator Author:

I just removed the sentence entirely. It's not a useful detail here.
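
For readers who want to see how the revised wording could translate into a monitoring heuristic, here is a minimal sketch. It is not taken from rippled. The names `LedgerObservation` and `assess`, the labels it returns, and the exact cutoffs are assumptions for illustration; only the two figures (under 5 seconds being typical, roughly 20 seconds as an upper limit) and the link to zero-transaction ledgers come from the proposed text.

```python
from dataclasses import dataclass

# Assumed cutoffs, copied from the prose in the diff. They are not
# constants defined anywhere in rippled.
TYPICAL_SECONDS = 5.0
UPPER_LIMIT_SECONDS = 20.0


@dataclass
class LedgerObservation:
    """Hypothetical record of one closed ledger, for illustration only."""
    consensus_seconds: float
    transaction_count: int


def assess(obs: LedgerObservation) -> str:
    """Map one observation to a coarse label that follows the revised text."""
    if obs.consensus_seconds >= UPPER_LIMIT_SECONDS:
        # Far above normal: the text calls a problem "almost certain",
        # and notes this often coincides with empty ledgers.
        if obs.transaction_count == 0:
            return "likely problem (slow consensus, empty ledger)"
        return "likely problem"
    if obs.consensus_seconds >= TYPICAL_SECONDS:
        # Higher than usual, but load alone can explain it.
        return "investigate"
    return "typical"


if __name__ == "__main__":
    for obs in (
        LedgerObservation(3.2, 40),
        LedgerObservation(8.0, 900),
        LedgerObservation(22.0, 0),
    ):
        print(obs, "->", assess(obs))
```

The middle band is deliberately labelled "investigate" rather than "unhealthy", which mirrors the point made in the review thread: a duration above 5 seconds is not by itself evidence of a fault.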

