[Bug][Aptos Framework][API][Transaction][VM] Multisig v2 FIFO invariant facilitates DoS attacks #8411
Comments
We actually thought through this during the design and security audit. I'd not consider this a bug or DoS exploit, as this is consistent with how multisig systems on other chains are designed. Technically you can reject the spammy txs, right? Why add a flush operation that needs voting when you can reject? We can introduce a batch reject tx if there's a worry about having to send many reject txs for the spammy proposals. |
@movekevin rejecting becomes prohibitively expensive in a DoS attack with many duplicate transactions (per above example attack), and if a batch reject has to write to thousands of transaction table entries then the gas costs could exceed the per-transaction gas limit, rendering the batch reject operation unusable. Hence the quasi-pointer overwrite I suggest above. Your thoughts? |
The attacker would need to pay gas as well, and in fact more, to create these txs. In my opinion, this is more of a convenience issue around mass-rejecting txs. It's only a serious problem if the attacker pays much less gas than the mitigation costs. Plus gas is currently pretty low anyway. I'm not against introducing a flush operation if mass-rejecting txs is a pressing use case, though. |
As additional context, creating the transaction should be more expensive than just casting a vote because the transaction proposer is responsible for setting up all transaction-related states, including creating a new storage slot. We didn't end up profiling this during the security review, but I think that fact makes it probably harder to pull off in reality. Would be curious what the numbers look like, especially after the recent gas refactor. |
There's also a plan to add support for executing (or rejecting) a tx in a single tx with the signatures from owners passed in as args (#8412). This would also reduce the cost of rejecting + executing (all can be done in a single tx). We could even add a batch function for this, which would allow rejecting multiple txs in a single tx and would be roughly equivalent to the proposed flush function above. |
@movekevin @chen-robert indeed the attacker will incur the more expensive per-item global storage costs to open up a table entry for each bogus transaction, but any such gas costs may be rendered negligible depending on specific economic incentives: in the example I gave above, the opportunity to simultaneously short and cripple a successful protocol is well worth even tens of thousands of dollars of gas costs. In reality the relative amounts to DoS or to mitigate might differ from what I present above, but the determinant for whether an attack is profitable or not is ultimately the associated TVL/treasury/market cap to be exploited.
As for bulk rejection transactions or multi-agent transactions per #8412, these approaches could potentially reduce the number of operations required to clear out a backlogged queue, but they are still limited by per-transaction limits. Even in the case of a multi-agent bulk rejection (combining both theoretical approaches), there is an upper limit on the number of loopwise table operations and function calls that can be effected in a single transaction, due to per-transaction gas limits. Hence bulk rejections might only allow, say, 1000 proposals at a time, and if this is the case then honest signatories are faced with a significant coordination problem requiring a non-negligible level of technical sophistication (estimate max bulk rejection size, create a loop to make proposals, have all signatories operate in concert, etc.). Meanwhile the attacker is adding to the backlog (transfer …).
Another potential disadvantage of forcing DoS transactions to pass through the FIFO flow is that it could become a bit of an indexing nightmare to catalog legitimate multisig activity: nodes might prune all events prior to the DoS attack, inspecting the execution logs would yield a wall of bogus activity, etc. Perhaps most perniciously, however, any such DoS attacks will eat into global storage and contribute to chain bloat.
Again, with significant enough economic upside on a potential attack, a malicious attacker may be willing to submit even tens of gigabytes of DoS content. The flush operation I propose above mitigates these risks, in a manner that I believe completely deters DoS vectors: if honest signatories can sidestep the FIFO queue and reset the pointer to the head of the queue with a single operation, it is essentially guaranteed that an attacker is wasting money by trying to clog the queue. This holds true regardless of relative gas amounts. I'm glad to submit a draft PR on the Move code side of things to stimulate further discussion. |
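The coordination burden described in this comment can be put in rough numbers. The following is a back-of-the-envelope Python sketch, not Aptos code: the function name is hypothetical, `max_batch_size` stands in for whatever the per-transaction gas limit would actually permit (1000 is only the illustrative figure used in the comment), and it assumes every required signatory must submit one bulk rejection vote per batch.

```python
def bulk_reject_vote_txns(backlog: int, max_batch_size: int, signatures_required: int) -> int:
    """Estimate how many bulk rejection vote transactions clear a backlog.

    Assumption: one vote transaction per batch per required signatory.
    Follow-up transactions that finalize each rejection are not counted.
    """
    if backlog < 0 or max_batch_size <= 0 or signatures_required <= 0:
        raise ValueError("backlog must be >= 0; batch size and signature count must be > 0")
    # Ceiling division without floating point.
    batches = (backlog + max_batch_size - 1) // max_batch_size
    return batches * signatures_required


# 300,000 bogus proposals, batches of 1000, 3 signatures required.
print(bulk_reject_vote_txns(300_000, 1_000, 3))  # 900
```

Under these assumptions the honest signatories still need hundreds of coordinated transactions, while the attacker can keep appending to the backlog in the meantime.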
Proposing a DoS transaction costs less than rejecting the transaction when the payload is sufficiently large, per the following testnet demonstration:
Hence it costs less to propose the attack transaction than it does to get the required two rejection votes (not to mention the rejection execution transaction), and the ratio of reject cost to proposal cost grows linearly with the number of signatures required: a 4-of-7 multisig would have to pay twice as much gas to reject compared with the gas required to propose a 50 kilobyte transaction, for example. Hence my earlier suggestion to simply change the pointer to the head of the queue, since a savvy attacker will almost necessarily send large transactions that are expensive to reject in bulk. |
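The linear scaling claimed here can be illustrated with a small Python sketch. This is a toy model rather than a measurement: the function names are hypothetical, and the default `vote_cost_ratio` of 0.5 (one rejection vote costing about half as much gas as the large proposal) is an assumption back-derived from the 4-of-7 figure quoted in this comment. It also ignores the cost of the rejection execution transaction.

```python
def reject_to_propose_ratio(signatures_required: int, vote_cost_ratio: float = 0.5) -> float:
    """Ratio of total rejection-vote gas to the gas paid for one proposal.

    vote_cost_ratio is an assumed per-vote cost, expressed as a fraction
    of the proposal cost. It is not a measured value.
    """
    if signatures_required <= 0:
        raise ValueError("signatures_required must be positive")
    return signatures_required * vote_cost_ratio


def mitigation_cost(attack_cost: float, signatures_required: int,
                    vote_cost_ratio: float = 0.5) -> float:
    """Total gas spend needed to vote down proposals that cost attack_cost to create."""
    return attack_cost * reject_to_propose_ratio(signatures_required, vote_cost_ratio)


for k in (2, 3, 4):
    print(k, reject_to_propose_ratio(k))
```

With these assumed inputs, each additional required signature adds the same increment to the defenders' cost, which is why higher thresholds make the queue more expensive to clear rather than safer against spam.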
Thanks for the detailed analysis, @alnoki. Although I do still think this attack is possible, I'm not sure why an attacker would do this in practice. If the goal of the exploit is to withdraw funds, they could have already done so immediately. If the attacker only gains access to one key in a multisig that has signature threshold > 1, spamming the multisig would cause annoyance and some gas costs to the owners of the multisig, but that would not be a significant amount and doesn't seem to benefit the attacker in any way.
With that said, I do agree with adding more convenient functions for owners to clear the queue of invalid transactions (which was somewhat of a pain point with Gnosis Safe, the most popular multisig wallet product on EVM chains) and perhaps exploring the ability to execute out of order. However, I just want to make sure we think this through and design it well, as out-of-order execution, especially when it allows adding/removing owners in one go, can be dangerous. This could allow attackers to gain faster control over the multisig, for example.
Furthermore, the gas costs associated with rejecting and remove-rejecting transactions are likely tied to writing the simple map entries, as changing anything there counts as writing it entirely. And removing a storage slot doesn't currently receive any gas refund, though it may well do so in the future. Thus, I think the flush operation proposed here would still incur a very high gas cost.
IMO, it could be simpler to just offer a bulk reject function, where owners can provide signed messages certifying they want to remove txs up to a given transaction_id. This function should not do anything else (add/remove owners) and instead only clear the queue. This would greatly simplify things as we'd not need to specifically store this flush transaction on chain (making the data structure more complex).
In the near future, once multisig_account also supports adding and executing a tx in one go by providing all the required signatures, this would be a nice consistent UX. |
@movekevin if an attacker can profit $500k USD by shorting a protocol then DoSing the queue to prevent critical operations, it is worth it to spend $400k USD in gas (which would require $800k to mitigate in a 4-of-7 in the current design).
Agreed.
Agreed on the rejection vote transaction, but not for the remove-reject transaction: after the second required rejection vote, the remove-reject transaction for the above examples costs 0.00001
I am confused about this conclusion, as the functionality I propose here and in #8424 does not write to any simple maps. Are you suggesting that high gas costs will be incurred when overwriting DoS transactions, after the head of the queue is reset? I do not think this would be the case, because I would expect the write costs to be assessed on the write set (small legitimate payload when overwriting large DoS payload).
I am opposed to this, as off-chain signature assemblage is cumbersome (e.g. "hey everyone in this group chat, will you please reply to me with a signed blob for the BCS blob I'm sending over?" vs "hey everyone in this group chat please vote on transaction 3"). The advantage of Multisig v2 is that signature assemblage is on chain.
It sounds like this approach would simply increment
If the owner schema can't be updated at the same time as the queue flush, then what is to stop an attacker from immediately submitting more DoS transactions after the flush transactions, blocking owner schema modifications? |
The flush operation prototype you have currently does not update the records regarding how many owners agreed to "flush" a pending transaction. If it was to update this, this would lead to the same gas cost incurred by reject_transaction
Storing the flush operation on chain can be very complex, as it needs a lot of safeguards such as a time bound. Without it, these txs can potentially be executed much later than intended and thus wipe out more txs than originally intended. I think some of the pain points can be abstracted away with a UI and/or CLI, and instantly executing the flush would be both simpler and safer and can lead to fewer edge cases due to the out-of-order execution cadence.
Here's a prototype PR I just quickly wrote up: #8523 |
@movekevin the prototype is submitted as a standard transaction proposal payload or payload hash, and hence the votes are tabulated against the ensuing
Good point. I can update the PR to include a time bound.
Here I would just want to ensure there is not off-chain assemblage required, since the process is cumbersome.
I've commented there accordingly. I'm thinking even moreso now that what I suggest won't be so different from what you suggest if, instead of resetting the queue head by decrementing |
@alnoki , @movekevin , @chen-robert , to address the DoS attacks without necessitating out-of-order execution, I'd like to propose setting a cap on the transaction queue size, MAX_PENDING_TXNS, at a practical limit like 100 or 1000. Thus, we enforce that
The effort required in steps 4 and 7 is capped by O(MAX_PENDING_TXNS), making the task manageable. We can further simplify them by supporting batch removals and/or off-chain signature execution/rejection (#8412). |
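To make the effect of such a cap concrete, here is a minimal Python sketch of a bounded FIFO queue. It is a toy model and not the `multisig_account` implementation: the class and method names are hypothetical, the field names are borrowed from the `MultisigAccount` fields quoted elsewhere in this issue, and the invariant it enforces (pending count never exceeds `max_pending_txns`) is an assumed reading of the proposal.

```python
class CappedMultisigQueue:
    """Toy FIFO proposal queue with a hard cap on pending transactions."""

    def __init__(self, max_pending_txns: int):
        self.max_pending_txns = max_pending_txns
        self.last_executed_sequence_number = 0
        self.next_sequence_number = 1

    @property
    def pending(self) -> int:
        # Number of proposals waiting to be executed or rejected.
        return self.next_sequence_number - self.last_executed_sequence_number - 1

    def propose(self) -> int:
        """Enqueue a proposal, refusing once the cap is reached."""
        if self.pending >= self.max_pending_txns:
            raise RuntimeError("pending transaction queue is full")
        sequence_number = self.next_sequence_number
        self.next_sequence_number += 1
        return sequence_number

    def resolve_head(self) -> int:
        """Execute or reject the proposal at the head of the queue."""
        if self.pending == 0:
            raise RuntimeError("no pending transactions")
        self.last_executed_sequence_number += 1
        return self.last_executed_sequence_number


queue = CappedMultisigQueue(max_pending_txns=100)
for _ in range(100):
    queue.propose()  # a spammer fills the queue
try:
    queue.propose()  # proposal 101 is refused
except RuntimeError as error:
    print(error)
queue.resolve_head()  # rejecting one proposal frees one slot
print(queue.pending)
```

In this model a spammer can never create more than `max_pending_txns` entries of cleanup work, although the honest owners also cannot enqueue anything until they have cleared at least one slot.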
@junkil-park generally I think this approach solves the major problem, however I think there are some important UX considerations:
Hence per the simplified schema I am suggesting:
How does this sound? |
@alnoki , that sounds good!
Just to clarify, are you suggesting to have some designated entry functions for these operations? |
@junkil-park yes, exactly. Per above flow, I think this would require the following function signatures:
/// Vote on multiple txns in a batch
public entry fun vote_transactions(
    owner: &signer,
    multisig_account: address,
    starting_sequence_number: u64,
    final_sequence_number: u64,
    approved: bool,
)
/// Vote-reject first DoS txn, enqueue eviction txn.
public entry fun prepare_flush(
    owner: &signer,
    multisig_account: address,
    starting_sequence_number: u64,
    final_sequence_number: u64,
    // These last args should be passed to update_owner_schema(), the eviction txn
    new_owners: vector<address>,
    owners_to_remove: vector<address>,
    optional_new_num_signatures_required: Option<u64>,
)
/// Vote-reject all pending txns, vote execute eviction txn (assumed at end of txn queue)
public entry fun execute_flush(
    owner: &signer,
    multisig_account: address,
    starting_sequence_number: u64,
)
@alnoki , Could you elaborate a bit more about the experiment and unit test that are suggested above ^^? |
@junkil-park For the proposed flow to work, it must be guaranteed that Thus I am suggesting setting Does this help? |
@alnoki , it seems to be impossible to implement the function |
@junkil-park Noted that execution should be a separate txn because it is a special txn type, hence I suggest implementation as close as functionally possible to the above prototypes |
@alnoki , I made a draft implementation: In my draft PR,
I think these can be good building blocks for DoS mitigation, if not sufficient. We can improve the UX in M-Safe and CLI for a better experience for recovery from DoS. Please let me know what you think. |
@junkil-park thanks for tagging me. I've left a review with a breakdown of what this looks like in practice: |
@banool @chen-robert @davidiw @JackyWYX @jjleng @LeviHHH @lightmark @movekevin @wrwg @zekun000
Executive summary
Presently, the Multisig v2 paradigm added in #5894 enforces a first-in-first-out (FIFO) queue for pending multisig transaction proposals: on-chain multisig transaction proposals must be either executed or rejected in order of submission (sequence number).
This invariant facilitates a denial-of-service (DoS) attack vector whereby a malicious signatory, or an attacker who has access to the compromised private key of an honest signatory, can flood the transaction queue with malicious transactions and essentially gridlock multisig activity. To mitigate this attack vector, a flushing operation is proposed, such that an approved flush transaction can be executed out-of-order and reset the sequence number flow.
Example attack
Ace, Bee, Cad, Dee, and Edd are owners of a 3-of-5 multisig v2 account that controls:
- … CoinStore with $1 M of Coin<AptosCoin> (APT)
- … (PRO) with a market cap of $100 M.
The multisig has executed fifteen legitimate transactions and has no pending transactions:
- aptos_framework::multisig_account::MultisigAccount.last_executed_sequence_number = 15
- aptos_framework::multisig_account::MultisigAccount.next_sequence_number = 16
Each signatory has only $50 k worth of APT.
An attacker compromises Edd's private key and uses it to rotate the authentication key on Edd's aptos_framework::account::Account, sends in an additional $200 k of APT to 0xedd..., opens up a short position on PRO, then prepares to DoS:
[Table: APT holdings for 0xace, 0xbee, 0xcad, 0xdee, and 0xedd]
The attacker then attempts to seize control of the entire multisig by sending the same transaction proposal ad infinitum: remove all signatories from the multisig other than 0xedd. Minutes later Ace notices an attack is underway, but at this point the attacker has already submitted hundreds of thousands of transactions using a bot, exhausting tens of thousands of dollars of gas in the process. Ace quickly sends in a transaction to remove 0xedd from the account, and the attacker starts proposing a different transaction: send all APT to 0xedd. The FIFO queue is now as follows:
[Table: an entry from 0xbee, then the repeated proposals from 0xedd, then the proposal from 0xace to remove 0xedd from the multisig, then repeated proposals from 0xedd to send all APT to 0xedd]
At this point, due to the FIFO invariant, the following steps are necessary just to prevent the attacker from proposing more APT transfers:
- … aptos_framework::multisig_account::execute_rejected_transaction for each such rejection, thereby incurring additional gas costs
Note that the attacker is a well-financed savvy programmer who has:
- … APT by rejection 200,000
- … APT in linked on-chain accounts to cover rejections up through proposal 275,000
Even if the honest signatories can cobble together enough funds and computational resources to coordinate 300 k transaction rejections, they will still then have to deal with the backlog of APT transfer proposals starting at sequence number 300,003: having exhausted their personal funds they appeal to the community for funding to cover the gas costs, and a bout of PRO panic selling ensues.
Having opened up a massive short position on PRO before the attack, the attacker easily recoups all gas funds expended during the attack and profits handsomely.
Proposed mitigant
A flush transaction with the following Move function signature:
This mitigant requires that malicious transactions in the queue be overwritten over time, rather than during the flush operation, because a flush operation resulting in 300 k table operations will exhaust per-transaction gas limits. Hence in this solution, the flush operation essentially overwrites the pointer to the head of the queue, such that transactions enqueued after the flush overwrite DoS proposals.
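The pointer-overwrite idea in this paragraph can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the proposed Move implementation: the class and method names are hypothetical, a plain dict stands in for the on-chain transaction table, voting is omitted entirely, and the starting sequence numbers are taken from the example attack.

```python
class FlushableQueue:
    """Toy model of a FIFO proposal table whose tail pointer can be reset."""

    def __init__(self, last_executed_sequence_number: int = 15):
        self.transactions = {}  # sequence number -> payload
        self.last_executed_sequence_number = last_executed_sequence_number
        self.next_sequence_number = last_executed_sequence_number + 1

    def propose(self, payload: str) -> int:
        """Store a proposal, overwriting any stale entry at that slot."""
        sequence_number = self.next_sequence_number
        self.transactions[sequence_number] = payload
        self.next_sequence_number += 1
        return sequence_number

    def flush(self) -> None:
        """Reset the pointer in O(1) without touching the stored entries."""
        self.next_sequence_number = self.last_executed_sequence_number + 1


queue = FlushableQueue()
for _ in range(1_000):
    queue.propose("spam")           # attacker clogs slots 16..1015
queue.flush()                        # one constant-cost operation
slot = queue.propose("remove 0xedd")
print(slot)                          # 16
print(queue.transactions[16])        # remove 0xedd
print(queue.transactions[17])        # spam (stale until overwritten)
```

The flush itself performs no per-entry table writes, which is the property that keeps it under per-transaction gas limits regardless of backlog size. The stale entries are only replaced as later proposals reuse their slots.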
Note that this flush operation will require a violation of the FIFO invariant, likely through a new transaction type that incorporates a nested Multisig type defined at aptos-core/types/src/transaction/multisig.rs (lines 8 to 16 in 5f056f4).
The proposed new transaction type is as follows:
The VM will also have to be updated to support this new transaction type, per the following:
- … MultisigTransactionPayload corresponds to an aptos_framework::multisig_account::flush_queue_and_update_owner_schema invocation, for both full-payload and hash-only proposals, at the specified sequence number