Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in ABA protocol's use of Common Coin #59

Open
2 of 3 tasks
amiller opened this issue Apr 2, 2018 · 2 comments
Open
2 of 3 tasks

Bug in ABA protocol's use of Common Coin #59

amiller opened this issue Apr 2, 2018 · 2 comments
Assignees
Labels

Comments

@amiller
Copy link
Owner

amiller commented Apr 2, 2018

Thanks to Ethan MacBrough (from Ripple Research) for reporting a protocol design error in HoneyBadgerBFT. The error has to do with the use of Threshold-Signature-based Common Coin to instantiate the Common Coin dependency of the ABA routine. MacBrough correctly points out that a flaw in the proof is that the ABA protocol we use relies on a stronger common coin than the one we instantiate (in fact it relies on a common coin that is so strong it cannot be implemented), and in fact provides a concrete attack. Fortunately, the ABA protocol can be readily modified to accommodate our weaker common coin, and hence fix the protocol. The fix suggested by MacBrough is posted at the end of the message.

To close this issue, we need to deploy the following fixes:

  • This is a protocol design flaw, and the construction in the conference paper is currently incorrect. The protocol construction and proof must be updated.
  • The constructive attack proposed by MacBrough should be reflected in the test cases
  • the ABA implementation must be modified to reflect the new fix

Ethan's note is below:

I thought I should let you know that I found a (probably very minor in practice) issue that can make the binary agreement instantiation used in HoneyBadger potentially never terminate with a single Byzantine node when the network scheduler is actively adversarial. I realized the issue when I was working on Cobalt, wherein I fixed the issue by adding an extra message phase to each round. I just remembered that HoneyBadger uses the same binary consensus mechanism, so I should let you know about the issue.

The basic issue is that Moustefaoi et al.'s consensus algorithm requires a stronger common coin than the instantiation given by Cachin et al. can provide. Namely, they assume that the values of the common coin are completely independent of the state of EVERY honest node at the point where it queries the coin value. This is not guaranteed by Cachin's common coin, since the adversary can reconstruct the coin value after only some of the honest nodes have queried the coin, and then inject "bad state" into the other nodes.

Specifically, there's an attack that can be launched as follows. For the attack I assume the common coin has a threshold of at most 2f+1, where n=3f+1; From what I read, HoneyBadger uses a threshold of only f+1, so the attack still holds. I present the more general argument since 2f+1 is the maximum threshold that can be used (since otherwise f crashed nodes could block coin construction), so the attack shows that a simple modification of the threshold doesn't help.

Let S be the set of all nodes, with |S|=3f+1. Partition S into 4 subsets, A0,A1,B, and F; where |A0|=|A1|=|B|=f, and |F|=1. Also let A=A0\cup A1. Suppose every node in A starts out in round r voting v\in\{0,1\} and every node in B starts out voting \neg v. The node x\in F is Byzantine.

x sends BVAL(\neg v) to the nodes in A0 and BVAL(v) to the nodes in A1. Also, all votes from nodes in B are delivered to all nodes in A. Messages within A0 are delivered. Thus nodes in A0 see |B|+|F|=f+1 votes for \neg v; so all nodes in A0 broadcast BVAL(\neg v) and all nodes in A0 see |A0|+|B|+|F|=2f+1 votes for \neg v; so all nodes in A0 broadcast AUX(\neg v).

Then all messages within A1 are delivered, as well as the BVAL(v) messages from A0 to A1. Thus the nodes in A1 see |A0|+|A1|+|F|=2f+1 votes for v and broadcast AUX(v). After this all messages within A are delivered and x sends both BVAL(0) and BVAL(1) to every node in A. Thus every node in A broadcasts both BVAL(0) and BVAL(1) and sets bin_values=\{0,1\}.

Now all nodes in A broadcast their threshold shares over the coin, so since |A|+|F|=2f+1, the adversary can construct the random coin value s. The nodes in F send BVAL(\neg s) to all the nodes in B, and all the BVAL(\neg s) messages from nodes in A are delivered to all nodes in B. Thus all the nodes in B broadcast AUX(\neg s). Deliver all AUX(\neg s) messages; there are 2f+1 of them, since either every node in A0 broadcast AUX(\neg s) or every node in A1 broadcast AUX(\neg s). Thus all nodes in B see 2f+1 AUX(\neg s) messages and get to the end of the round with bin_values=\neg s. Thus the nodes in B continue to the next round voting \neg s while the nodes in A continue to the next round voting s. At this point all messages from the round are delivered, and the process repeats.

Ethan MacBrough suggests the following fix, which is also used in Cobalt

The exact fix I came up with is, after passing the AUX message round, everyone broadcasts CONF(bin_values) and then waits until there are n-t CONF(S) messages with S\subseteq bin_values. Clearly everyone can eventually get past this, just like the AUX message round.

Now suppose some node P_i gets past the CONF round with bin_values={v}. Then t+1 honest nodes must have broadcast CONF({v}). Thus for any other honest node P_j that gets past the CONF round, P_j must have received CONF({v}) from some honest node before getting past (since it waits for n-t CONF messages). But two honest nodes cannot submit CONF({0}) and CONF({1}) by the security guarantees of the AUX round, so this means the value v was determined before P_j got past the CONF step, and hence before P_j revealed its threshold share. Thus the adversary learns nothing about the coin value until after v is determined, so v=s with probability 1/2.

@afck
Copy link

afck commented Jul 29, 2018

As an optimization, after we've received N - f CONF messages, we should probably use the minimal value set as vals of which we know that at least one honest node put it into a CONF, to make termination as likely as possible?

I.e. if we sent CONF({v}) ourselves or we received f + 1 CONF({v}), we set vals = {v}. Otherwise vals = {v, w}. Then if the coin is also v, we can terminate. (Even if originally we had arrived at {v, w}, and sent CONF({v, w}).)

I wonder if it would be safe to use vals = {v} even if neither of those are true, but if in the meantime we have collected N - f AUX(v) messages (so now we wouldn't need the other AUX(!v) messages anymore, and realize we could have sent CONF({v})) — or would that reopen the doors to something like the original attack again?

Edit: …or would that actually make termination less likely, because we would be less likely to switch to the coin value, like possibly all the other good nodes?
If we can justify both {v} and {v, w} with the above arguments, should we use {v} only if the coin is v, and otherwise use {v, w} (i.e. switch)?
Probably there is no right answer, and the best strategy depends on the number of faulty nodes, and the current distribution of estimates. Something like: If the coin is not v, use whatever the majority of the CONF messages says? (Never counting invalid CONFs, of course, with values not in our own bin_values.)

Edit 2: And would it be worth doing the CONF round even in epochs with a fixed coin, just because it makes it more likely that we can terminate? It would increase the number of message rounds in the most optimistic case, but since ACS has to wait for all ABA instances, it is as slow as the slowest out of N ABAs.

If we go with the first rule — always use {v} if you can —, then if we received N - f CONF({v}) messages, we know that at least f + 1 correct nodes sent CONF({v}) and therefore will use vals = {v} and not switch to !v, even if the coin is !v. We therefore know that v will end up in bin_values in the next epoch eventually, and could send both BVAL(v) and AUX(v) right at the beginning of the next epoch, and put v into bin_values right away. If enough nodes do that, it would effectively remove one message round from the next epoch.

@xygdys
Copy link

xygdys commented Jan 14, 2022

There is a bug in the CONF Phase implement
https://github.com/initc3/HoneyBadgerBFT-Python/blob/e8bcbc081dfb5d1e7298039d47bbebf7048b8e62/honeybadgerbft/core/binaryagreement.py#L40

The message broadcast on line 40 should be values, which is the output of AUX Phase, instead of bin_values. Otherwise, the security guarantees of the AUX round would not take effect!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants