Clarify definition of "network health." #4729

Merged on Oct 24, 2023 (5 commits)

mtrippled
Collaborator

High Level Overview of Change

The existing documentation describes network health only at a very high level, without the nuance needed to reflect reality. This update defines network health more precisely and provides context about related factors.

Context of Change

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Tests (You added tests for code that already exists, or your new feature included in this PR)
  • Documentation Updates
  • Release

For consensus to be considered healthy, the peers on the network
should largely remain in sync with one another. It is particularly
important for the validators to remain in sync, because they must
be in sync to participate in consensus. Another factor to consider is
Collaborator

Would avoid repeated "in sync" and instead use "... validators to remain in sync in order to participate in consensus."

observations. However, some factors, such as transactions volumes,
can increase consensus duration. This is because rippled performs
more work as transaction volume increases. Under sufficient load this
tends to increase consensus duration.
Collaborator

Would add "time" here "Under sufficient load this time tends to ...", because we refer to "the time" in the sentence below which is now pushed far from the reference to consensus duration time.

Collaborator Author

Duration is more precise, though "time" as used here means duration among other things. So duration is more appropriate here and I changed other instances of "time" to reflect that.

@intelliot added the "Documentation" label (README changes, code comments, etc.) on Sep 27, 2023
@mtrippled
Collaborator Author

@ximinez I'd like to cover this PR with you next week, please. There are some things about fees that could probably be clarified a bit about network health. Thanks for reviewing this. Let's not merge this until then.

Also, @Bronek I'm doing something here that's generally not OK--making changes to a PR after it's submitted. But this is a small PR, just for documentation. Anyway, normally what we do is submit a PR once we think it's feature complete, and only make changes based on review suggestions. So, "do as I say not as I do," please. :-)

@intelliot
Collaborator

"I'm doing something here that's generally not OK--making changes to a PR after it's submitted. But this is a small PR, just for documentation. Anyway, normally what we do is submit a PR once we think it's feature complete, and only make changes based on review suggestions."

I think most PRs are actually a little more flexible on this point - it's fine to make (justified) changes to a PR after it's submitted, but it does mean that the PR will generally need re-review/re-approval before merging. That is perfectly OK though.

@intelliot marked this pull request as draft on September 28, 2023, 18:59
@intelliot
Collaborator

Given "Let's not merge this until then.", I've set this PR to "draft" status to ensure it isn't merged until deemed ready.

@mtrippled
Collaborator Author

@Bronek @HowardHinnant @ximinez @intelliot I just refined the document a bit more, and fixed a typo. Please scan again.

often coincides with new ledgers with zero transactions.
A variety of factors contribute to consensus health.

Note that this is not necessarily the duration between
ledger closings, as consensus usually starts some amount of time after
a ledger opens.
Collaborator

Should this sentence be moved up? It seems disconnected here.

Collaborator Author

I just removed the sentence entirely. It's not a useful detail here.

@intelliot marked this pull request as ready for review on October 15, 2023, 02:47
@intelliot
Copy link
Collaborator

Note: this PR has changed since ximinez's last review, so it needs a re-review.

@intelliot added this to the 2.0 milestone on Oct 15, 2023
Collaborator

@ximinez left a comment

I'm ok with this as it is, but I think it could be better with a little clarification of the last sentence.

src/ripple/app/misc/FeeEscalation.md (outdated)
duration should be roughly 20 seconds. That is far above the normal.
If the network takes this long to close ledgers, then it is almost
certain that there is a problem with the network. This circumstance
often coincides with new ledgers with zero transactions.
Collaborator

So the consensus process takes >20 seconds, although no transactions were included in the ledger.

Can we list any factors that might cause this issue? Historically, have such problems occurred on the mainnet or other affiliated blockchain networks? Can we provide a link to such an example?

Collaborator Author

That goes beyond clarifying what stability is and gets into speculation and diagnostics.

Collaborator

I feel that giving such examples would provide more clarity. As it stands, the reader does not understand why the network could become unstable.

should largely remain in sync with one another. It is particularly
important for the validators to remain in sync, because that is required
for participation in consensus. However, the network tolerates some
validators being out of sync. Fundamentally, network health is a
Collaborator

Can we use the Reliability Score as a proxy for measuring the network health? It seems to indicate the degree of similarity in the calculations between validators.

Collaborator Author

I don't know what that is, and it brings me to a site that asks for my email address.

Collaborator

Sorry, the link is wrong. Here's the reference: https://xrpl.org/negative-unl.html#reliability-measurement

Collaborator Author

@ckeshava
The problem with the existing document is that it can be misinterpreted to mean that 5s latency in consensus is some extreme upper limit, beyond which the network is in a faulty state. This PR corrects the language and hopefully encourages approaching the issue with some nuance. I didn't intend this to be an exhaustive treatment of all the ways that the network can have problems, or of the different diagnostics and measurements that can be done. That is actually quite a sizable topic. For now I prefer that this stays concise and mainly clarifies the original statement.

@intelliot
Collaborator

@ckeshava for the ideas and open questions that you have, please feel free to open a new issue (or, better, a PR with your proposed changes). They are likely outside the scope of this particular PR.

@intelliot merged commit 3e5f770 into XRPLF:develop on Oct 24, 2023
16 checks passed
sophiax851 pushed a commit to sophiax851/rippled that referenced this pull request Jun 12, 2024
Update the documentation to describe network health with more nuance as
well as context about related factors.
Labels
Documentation README changes, code comments, etc.
6 participants