Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit amendment flapping (fixes #4350) #4364

Closed
wants to merge 1 commit into from

Conversation

a-noni-mousse
Copy link
Contributor

@a-noni-mousse a-noni-mousse commented Dec 5, 2022

This commit implements a new amendment that follows one of the recommendations in #4350 to reduce the flapping of amendments when there is validator outage and support drops below 80% required majority.

The new threshold for deactivation of the countdown timer of an an amendment is to go below 65% support.

High Level Overview of Change

Context of Change

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Tests (You added tests for code that already exists, or your new feature included in this PR)
  • Documentation Updates
  • Release

@nbougalis
Copy link
Contributor

Great, happy to assign this to you and delegate myself to be a reviewer.

Copy link
Contributor

@nbougalis nbougalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the elegant and minimal solution. I think will work just fine to prevent casual flapping in case a validator goes offline at just the right moment.

It does not add any hysteresis and that may be something else to consider adding on top of this, but then again maybe this is sufficient on its own for all but the most extreme flapping cases and there's something to be said about keeping it simple.

src/ripple/app/misc/impl/AmendmentTable.cpp Outdated Show resolved Hide resolved
else if (!hasValMajority && (majorityTime != NetClock::time_point{}))
else if (
!hasValMajority && (majorityTime != NetClock::time_point{}) &&
!vote->passes(entry.first, vote->deactivationThreshold()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is right and should have the same behavior as the existing code, but someone else should triple-check the logic when the amendment being checked is unknown. Here's the logic: hasValMajority will be false; before this commit, we'd always enter this block. Now vote->passes will return false, and the ! will turn it to true putting us in this block, just like before.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the amendment is unknown, then how would majorityTime be non-zero? And if it is zero for unknown amendments, then how would this branch have been entered before the change?

@@ -74,7 +74,7 @@ namespace detail {
// Feature.cpp. Because it's only used to reserve storage, and determine how
// large to make the FeatureBitset, it MAY be larger. It MUST NOT be less than
// the actual number of amendments. A LogicError on startup will verify this.
static constexpr std::size_t numFeatures = 53;
static constexpr std::size_t numFeatures = 54;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should consider ways to remove this altogether. It's a bit of a workaround.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok but not in this amendment i think to keep the code small

Copy link
Collaborator

@thejohnfreeman thejohnfreeman Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be updated to 55? Was another amendment added while this was being reviewed? CI fails on this number being wrong, and the diff says this new line is no longer a difference.

Comment on lines 526 to 547
// void
// doRound(
// uint256 const& feat,
// AmendmentTable& table,
// weeks week,
// std::vector<std::pair<PublicKey, SecretKey>> const& validators,
// std::vector<std::pair<uint256, int>> const& votes,
// std::vector<uint256>& ourVotes,
// std::set<uint256>& enabled,
// majorityAmendments_t& majority)
// {
// doRound(
// {feat},
// table,
// week,
// validators,
// votes,
// ourVotes,
// enabled,
// majority);
// }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this commented-out block just be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

src/test/app/AmendmentTable_test.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@nbougalis nbougalis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@intelliot
Copy link
Collaborator

@a-noni-mousse can you resolve the merge conflicts? thanks!

@a-noni-mousse
Copy link
Contributor Author

@a-noni-mousse can you resolve the merge conflicts? thanks!

OK I have done the resolve

@thejohnfreeman
Copy link
Collaborator

I guess the downside of this approach is that if exactly one validator permanently withdraws support for an amendment that just barely has supermajority support, then it will not actually lose supermajority support until enough others withdraw their support too. I prefer the idea of tracking votes on-ledger, but I guess the community finds the risk of this approach acceptable.

Copy link
Collaborator

@thejohnfreeman thejohnfreeman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to update the feature count again: #4364 (comment)

@HowardHinnant
Copy link
Contributor

$ rippled --unittests
Logic error: More features defined than allocated. Adjust numFeatures in Feature.h.

@nbougalis nbougalis closed this Jan 4, 2023
@nbougalis nbougalis reopened this Jan 4, 2023
@nbougalis
Copy link
Contributor

My bad, with the close-reopen. Apologies.

Copy link
Collaborator

@mDuo13 mDuo13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am against this approach.

It puts more demand on the validators to be stably online all the time. If you're one of the swing votes against an amendment, it could gain the majority while you're having a brief outage, and sustain the lower threshold even after you come back, which is not how the system is intended to work.

This changes the threshold for activating an amendment to "65% and some luck" rather than "sustained 80%". If you were to describe this as simply lowering the threshold to 65%, I think people would rightly be against that; but this is even worse because it's "kind of 65%" but more complicated.

The correct fix for amendment flapping would be to change the activation threshold calculation so that it requires >80% of all validators on the node's UNL (ignoring Negative UNL) rather than >80% of validators currently participating in consensus. Then amendments would only "flap" when validators change their votes, which is the intended behavior.

@scottschurr
Copy link
Collaborator

I'm with @mDuo13 on this one. I had been writing up some numeric concerns with this pull request. But in light of @mDuo13's comment those numeric issues are irrelevant. Let's not make it harder to run a validator than it already is.

@thejohnfreeman
Copy link
Collaborator

I'm convinced by @mDuo13 too, but I will comment that using 80% of UNL vs seen validations will not fix the issue either. You'll flap on when you see a yea from 80% the UNL and off when one of those yeas misses a validation.

The only way to tell the difference between "validator A is no longer voting yea" and "validator A appears to be offline" is if you see a validation from A with no yea. We're not observing that distinction in the code. If we start to carefully make that distinction, and a yea-voting validator is temporarily offline, should we pretend it is still voting yea? For how long?

@scottschurr
Copy link
Collaborator

@thejohnfreeman, excellent point and question.

My personal take, without trying any implementation, is that a validator should be assumed to be holding the same vote as it held before until a new vote actually arrives from that validator. So if the validator is offline it is assumed to not change its vote. However, that may be hard to implement, I'm not sure. I'm hesitant to armchair quarterback an approach that has the right balance of voting stability vs additional complexity.

@nbougalis
Copy link
Contributor

@mDuo13, @thejohnfreeman: excellent points. I had not thought of that when reviewing the code. I'm thinking that this isn't ready for "prime time" as is.

@nbougalis
Copy link
Contributor

nbougalis commented Jan 7, 2023

To elaborate a bit: you can't track individual votes on ledger; whose votes would you track? You can't say "the UNL's" because there isn't one UNL or a requirement that my UNL completely overlaps with yours, and you'd need to reach consensus on which ones to include (and you can't include all of them!). It's for the same reason that you can't just track a "count" of yays or nays, or use the "size of the UNL" instead of the number of validations received.

I liked this 'dual threshold' option, but obviously it's problematic for the reasons described above. Given that, I think the best option here maybe the other option (which I think I originally proposed): adding some hysteresis to the "deactivation".

If an amendment has gained an 80% supermajority, but subsequently loses it at some round X, set a flag indicating that's the case. If it recovers the 80% supermajority, in round X+1, then all is well; simply remove the flag and assume that it never lost majority. If it is still below the threshold, then just assume the amendment lost majority.

The hysteresis can last for a single round or it can be extended to more.

This commit implements a new amendment that follows one of the
recommendations in XRPLF#4350 to reduce the flapping of
amendments when there is validator outage and support drops below
80% required majority.

A new FlappingAmendment vector is used to track amendments that
lose majority and add delay of one flag ledger.
@a-noni-mousse
Copy link
Contributor Author

I have changed the code ot use hysteria instead @nbougalis @scottschurr @thejohnfreeman @mDuo13 by adding new FlappingAmendment vector and when amendment loses a majority if it is not on that list we add and do nothing else. If it is in the list then we do the old code and this causes the delay on one round to the next flag ledger for deactivation.

// minimum number of votes needed to begin activation countdown
int activationThreshold_ = 0;

// minimum number of votes needed to continue activate countdown
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// minimum number of votes needed to continue activate countdown
// minimum number of votes needed to continue activation countdown

else if (!hasValMajority && (majorityTime != NetClock::time_point{}))
else if (
!hasValMajority && (majorityTime != NetClock::time_point{}) &&
!vote->passes(entry.first, vote->deactivationThreshold()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the amendment is unknown, then how would majorityTime be non-zero? And if it is zero for unknown amendments, then how would this branch have been entered before the change?

Comment on lines +238 to +252
auto it = std::find_if(
majorities.begin(), majorities.end(), [&amendment](STObject const& o) {
return o[sfAmendment] == amendment;
});

if (it == majorities.end() && lostMajority)
return tefALREADY;

if (it != majorities.end())
{
const STArray& oldMajorities =
amendmentObject->getFieldArray(sfMajorities);
for (auto const& majority : oldMajorities)
if (gotMajority)
return tefALREADY;

majorities.erase(it);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first time an amendment loses majority, the program will reach line 251, remove the amendment from the sfMajorities array, and continue, presumably to add it to the sfFlappingAmendments array. The second time an amendment loses majority, the program will reach line 244 and return, never continuing to fix the sfFlappingAmendments array. Line 251 needs to be conditional on whether the amendment is in the sfFlappingAmendments array (which itself is conditional on the amendment).

There does not seem to be a new test for this new behavior. Have you added one?

@scottschurr
Copy link
Collaborator

@a-noni-mousse, I glanced over this pull request and, although it follows the suggestion by @nbougalis, I'm not completely sold on the approach. I'm investigating whether what @mDuo13 suggested is feasible: #4364 (review) If that approach is feasible, then I think that would be the better path.

So I'm not ignoring you. Your pull request has prompted further investigation. Thanks for your patience.

@nbougalis
Copy link
Contributor

@a-noni-mousse, I glanced over this pull request and, although it follows the suggestion by @nbougalis, I'm not completely sold on the approach. I'm investigating whether what @mDuo13 suggested is feasible: #4364 (review) If that approach is feasible, then I think that would be the better path.

I'm not sure I understand how what @mDuo13 is suggesting would work. He writes:

The correct fix for amendment flapping would be to change the activation threshold calculation so that it requires >80% of all validators on the node's UNL (ignoring Negative UNL) rather than >80% of validators currently participating in consensus. Then amendments would only "flap" when validators change their votes, which is the intended behavior.

How will this prevent flapping or does it constitute a "correct fix"? Remember, votes aren't persistent which means that an amendment can flap on and off if as few as 1 validator were to miss a single vote.

The problem just isn't the negative UNL; flapping was a problem before that. The problem isn't even that the threshold is calculated based on the number of trusted validators that participated in the vote; if the threshold was calculated against the size of a node's UNL, you'd actually have more flapping. See here.

The problem is that there's a very narrow window of voting and a validator needs to be active during that window for it to cast a "yes" vote.

Now, this can be solved in one of several ways:

  1. Make validator votes "sticky." That is, if a validator voted "yes" once, then that "yes" persists until the validator votes "no". That has several problems, some of which I've explained before. At a minimum, the votes need to be persisted. You can't persist them on the ledger itself, which means you need to persist them locally. But if you do that, you need a mechanism to check if the vote you've persisted is stale, and if you don't, you're calculating the threshold incorrectly.
  2. Add hysteresis, which the latest commit from OP does. This doesn't entirely prevent flapping, but it effectively 'extends' a grace period by "persisting" the majority (as opposed to persisting the vote) until the next ballot.
  3. Allow a validator to vote at any time prior to the tally. This is interesting but still won't entirely prevent flapping either.

@thejohnfreeman
Copy link
Collaborator

Here's an elaboration of my previous comment that I wrote in a conversation with @scottschurr:

If we switch to using the full UNL, and ignore the negative UNL (@mDuo13's suggestion), then the problem becomes that a missing validation is interpreted as a "no" vote. Both "yes" and "no" votes are possibilities for the intent of the missing validation, which means whatever assumption we make in its absence carries some risk of being wrong. A corollary is that any "fix" should focus on flapping less, because the goal of reflecting validators' true intentions is impossible to guarantee.

The risk of being wrong seems to be smaller if we assume that the vote is unchanged from the validator's previous vote, which means one fix is to forward fill votes. That requires keeping track of old votes, which I don't think we do currently. (It could be done on-chain if each validator was associated with an account and we introduced the capability of recording votes, but I'm not suggesting that we do that.) These are the "sticky" votes @nbougalis is talking about. We update these votes with every pre-flag validation to keep them fresh.

Another fix, the hysteresis @nbougalis suggested, will work to reduce flapping. It effectively forward fills "yes" votes (but not "no" votes) for one flag ledger and no more. It does not stop the flapping if the cause is a flaky validator voting "no"1. It becomes more likely to diverge from the validators' true intentions, though, if the outage lasts for more than one flag ledger.

Footnotes

  1. For example, assume you have 27 of 34 validators voting "yes", and one of the "no" voters disappears. 27 of 33 validators reaches majority. Then the "no" voter returns, and majority is lost two flag ledgers later. Flapping was not stopped in this situation.

@scottschurr
Copy link
Collaborator

I've investigated an approach to amendment flapping that I think would address the comment from @mDuo13: #4364 (review)

The new approach seems viable, so I've created a pull request for that approach: #4410

In cases like this I usually try to make a change that could be cherry-picked into the existing pull request. I didn't do so this time because there's very little in common between the two approaches. The only thing shared by the two pull requests is the name of the Feature. We'll let reviewers choose which approach they prefer.

Thanks for your efforts on this pull request, @a-noni-mousse. Even though I've submitted an alternative approach I very much appreciate your efforts here.

@HowardHinnant HowardHinnant removed their request for review March 20, 2023 20:44
@HowardHinnant
Copy link
Contributor

Removed myself as reviewer as I believe that the 4 common reviewers assigned to both this and the #4410 alternative are sufficient.

@intelliot intelliot requested review from ChronusZ and removed request for RichardAH April 26, 2023 16:30
@JoelKatz
Copy link
Collaborator

We could treat absent/unknown votes as votes not to change the status quo.

@intelliot
Copy link
Collaborator

Closing in favor of #4410

@intelliot intelliot closed this Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 📋 Backlog
Development

Successfully merging this pull request may close these issues.

Prevent amendment majority "flapping" (Version: 1.9.4)
8 participants