Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relay transactions mortality works incorrectly #1193

Closed
svyatonik opened this issue Oct 26, 2021 · 2 comments · Fixed by #1196
Closed

Relay transactions mortality works incorrectly #1193

svyatonik opened this issue Oct 26, 2021 · 2 comments · Fixed by #1196
Assignees
Labels

Comments

@svyatonik
Copy link
Contributor

Some message relay transactions are removed from the pool (just after being accepted) with the BadProof error. E.g.:

Relay:
[Millau_to_Rialto_MessageLane_00000000] �[1;38;5;8m2021-10-26 05:07:30 +0000�[0m �[38;5;14mDEBUG�[0m �[38;5;8mbridge�[0m Going to submit proof of messages in range 8..=11 to Rialto::MessagesDelivery node
�[1;38;5;8m2021-10-26 05:07:30 +0000�[0m �[38;5;12mTRACE�[0m �[38;5;8mbridge�[0m Prepared Millau -> Rialto delivery transaction. Weight: 1949647883/1299875000000, size: 2847/3932160
�[1;38;5;8m2021-10-26 05:07:30 +0000�[0m �[38;5;12mTRACE�[0m �[38;5;8mbridge�[0m Sent transaction to Rialto node: 0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96
[Millau_to_Rialto_MessageLane_00000000] �[1;38;5;8m2021-10-26 05:07:30 +0000�[0m �[38;5;14mDEBUG�[0m �[38;5;8mbridge�[0m Successfully submitted proof of nonces 8..=11 to Rialto::MessagesDelivery

Rialto node:
2021-10-26 05:07:30.568 TRACE tokio-runtime-worker txpool: [0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96] WaitingTransaction { imported_at: Instant { tv_sec: 669227, tv_nsec: 553069399 }, transaction: Transaction { hash: 0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96, priority: 1025, valid_till: 119, bytes: 2847, propagate: true, source: TransactionSource::External, requires: [], provides: [90b5ab205c6974c9ea841be688864633dc9ca8a357843eeacf2314649965fe2223000000], data: 840090b5ab205c6974c9ea841be688864633dc9ca8a357843eeacf2314649965fe2201389a4d940c891ac3eb479ccb8896d20d996c843d5ae5981b08ddef3726919d77867d35aee071d24d1db1db87400546056fa73699d3a16c6d79ca11564bdd768485038c00100590b5ab205c6974c9ea841be688864633dc9ca8a357843eeacf2314649965fe220dc07819e894fe0df843b1b96218a380ba3a7d5a9f613ab359493374fffe50da1a3e600921220f6e66379af4732c0c94bb39275edf9289f07ca5e211a00b7e2c28fd0177087fc20c9c609c5d21ef1a4916c2ac65000000000b00000000000000810159010100000080fdad0b0000000002306721211d5404bd9da88e0204360a1a9ab8b87c66c1bc2fcdd37f3c2222cc20009c0400001cbd2d43530a44705ad088af313e18f80b53ef16b36177cd4b77b846f2a5f07c02286beefdf1566d030000008d089f0a395e6242c6813b196ca31ed0547ea766990101ba5ca0b89c2beb7a15ab12f79b58cbe08c4290a755d76269eef32348676801a1e7058c426bf7f5b1da9a30535f56796b4380f58683037444d8ca496fc15f4c590101f47a6c3b65f37e53dcdb225a5d6af33578f06c73dc47ee956d6de9bc00a8da427cdc6f9285a019227a59b8a4a4d7ae1d4f4fe2dffdc5037d9b7201ad4530d2af010160184f5879263e380aca27706352598992f56814b64c30e6bf57...8f4cdf4bdcb04c5d4c03a102fc3123c777d401011e15d757379d7d6baab69b92e282b1a53e454440dfb02b49dac99c7cfcc2b1659c9ca766132f89a29d5e9fe961a2f9bfe0b63712890215f0f453e47a4538b2ef01013b3ca0e0cb5ddde7d185e2d9443a38ab0ce70c34010a69d7363a4f900d28f68b2690247dc96d1a32ab89fc79a1df25ac7ee8e143834e62ae36ed2c0ed1686c9b0101ae8c41a53832f0e4635ffcffd6b9eb66defc50648c5d20ba098dadb50851c37b7d6bf8e49e30f1276374b12f022760a76c1ae724dceef3306e0f3ef559b6906f0101d17d98e8425a0caca74107cd864767f7339756317a6394e4f29bc92104bb486714f25102c6fb75df4aaabf5fe049aa73be40c222e2afe3fa434b713ae815796c01010a06fead8d95321db607d4c72e5b8a31740be46ae40f0f21bb61869f5c64b563776342b66d2dbcdb3c73ad6438c1f8a7d66516f94a3360252aec4ac69582972801011343f4a982fb99fecbeb4b814b60a508c3f98331b21e7554dded2fffa4ac43f999ed5d099aa2c3780eace798a94aac5bae2ef9da79a4afd6751a9d56278c8a97010158a07eb27eb28fe3b3f3e49852e50edd92230b07318bedad987278dcf4925f05c78fc9de59107cecf3eab1259955265561a3eb899c465491b0aeaf72e84eddcb0000000008000000000000000b000000000000000400000070236e1700000000}, missing_tags: {}}    
2021-10-26 05:07:30.568 DEBUG tokio-runtime-worker txpool: [0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96] Importing to ready    
2021-10-26 05:07:30.568 TRACE tokio-runtime-worker txpool: [0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96] Ready (replaced with None)    
2021-10-26 05:07:30.568 DEBUG tokio-runtime-worker txpool: Pool Status: PoolStatus { ready: 1, ready_bytes: 2847, future: 0, future_bytes: 0 }    
2021-10-26 05:07:30.568 DEBUG tokio-runtime-worker rpc_metrics: [ws] author_submitExtrinsic call took 3442 μs    
2021-10-26 05:07:30.569 DEBUG tokio-runtime-worker rpc: Response: {"jsonrpc":"2.0","result":"0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96","id":3400}.    
2021-10-26 05:07:30.570 TRACE tokio-runtime-worker txpool: [0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96] Broadcasted    
2021-10-26 05:07:31.790 DEBUG tokio-runtime-worker txpool: Updated revalidation queue at 56. Transactions: {0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96: 56}    
2021-10-26 05:07:31.919 DEBUG tokio-runtime-worker txpool: [0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96]: Revalidation: invalid InvalidTransaction::BadProof    
2021-10-26 05:07:31.919 DEBUG tokio-runtime-worker txpool: Removing invalid transactions: [0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96]    
2021-10-26 05:07:31.920 TRACE tokio-runtime-worker txpool: [0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96] Removed as part of the subtree.    
2021-10-26 05:07:31.920 DEBUG tokio-runtime-worker txpool: Removed invalid transactions: [Transaction { hash: 0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96, priority: 1025, valid_till: 119, bytes: 2847, propagate: true, source: TransactionSource::External, requires: [], provides: [90b5ab205c6974c9ea841be688864633dc9ca8a357843eeacf2314649965fe2223000000], data: 840090b5ab205c6974c9ea841be688864633dc9ca8a357843eeacf2314649965fe2201389a4d940c891ac3eb479ccb8896d20d996c843d5ae5981b08ddef3726919d77867d35aee071d24d1db1db87400546056fa73699d3a16c6d79ca11564bdd768485038c00100590b5ab205c6974c9ea841be688864633dc9ca8a357843eeacf2314649965fe220dc07819e894fe0df843b1b96218a380ba3a7d5a9f613ab359493374fffe50da1a3e600921220f6e66379af4732c0c94bb39275edf9289f07ca5e211a00b7e2c28fd0177087fc20c9c609c5d21ef1a4916c2ac65000000000b00000000000000810159010100000080fdad0b0000000002306721211d5404bd9da88e0204360a1a9ab8b87c66c1bc2fcdd37f3c2222cc20009c0400001cbd2d43530a44705ad088af313e18f80b53ef16b36177cd4b77b846f2a5f07c02286beefdf1566d030000008d089f0a395e6242c6813b196ca31ed0547ea766990101ba5ca0b89c2beb7a15ab12f79b58cbe08c4290a755d76269eef32348676801a1e7058c426bf7f5b1da9a30535f56796b4380f58683037444d8ca496fc15f4c590101f47a6c3b65f37e53dcdb225a5d6af33578f06c73dc47ee956d6de9bc00a8da427cdc6f9285a019227a59b8a4a4d7ae1d4f4fe2dffdc5037d9b7201ad4530d2af010160184f5879263e380aca27706352598992f56814b64c30e6bf57...8f4cdf4bdcb04c5d4c03a102fc3123c777d401011e15d757379d7d6baab69b92e282b1a53e454440dfb02b49dac99c7cfcc2b1659c9ca766132f89a29d5e9fe961a2f9bfe0b63712890215f0f453e47a4538b2ef01013b3ca0e0cb5ddde7d185e2d9443a38ab0ce70c34010a69d7363a4f900d28f68b2690247dc96d1a32ab89fc79a1df25ac7ee8e143834e62ae36ed2c0ed1686c9b0101ae8c41a53832f0e4635ffcffd6b9eb66defc50648c5d20ba098dadb50851c37b7d6bf8e49e30f1276374b12f022760a76c1ae724dceef3306e0f3ef559b6906f0101d17d98e8425a0caca74107cd864767f7339756317a6394e4f29bc92104bb486714f25102c6fb75df4aaabf5fe049aa73be40c222e2afe3fa434b713ae815796c01010a06fead8d95321db607d4c72e5b8a31740be46ae40f0f21bb61869f5c64b563776342b66d2dbcdb3c73ad6438c1f8a7d66516f94a3360252aec4ac69582972801011343f4a982fb99fecbeb4b814b60a508c3f98331b21e7554dded2fffa4ac43f999ed5d099aa2c3780eace798a94aac5bae2ef9da79a4afd6751a9d56278c8a97010158a07eb27eb28fe3b3f3e49852e50edd92230b07318bedad987278dcf4925f05c78fc9de59107cecf3eab1259955265561a3eb899c465491b0aeaf72e84eddcb0000000008000000000000000b000000000000000400000070236e1700000000}]    
2021-10-26 05:07:31.920 DEBUG tokio-runtime-worker txpool: [0xc464985afb4fd4550bfc4da98056afcf927c59c8285151d4dddadb3e0facde96] Extrinsic invalid   
@svyatonik
Copy link
Contributor Author

It happens when there's a chain reorg - we're using best header when tx is submitted. If tx has been submitted when best header was A[num=100], then there's reorg to B[num=100], tx is revalidated and signature check fails, because header with num=100 has different hash. So it may be fixed simply by switching to using best-finalized-header, but it is dumb because finality may be lagging and if lag is larger than tx mortality, transactions may be invalidated instantly. So seems that it'll be better to use header H - N as the 'best header'. N may be hardcoded or specified by user. Will need some additional experiments

@svyatonik
Copy link
Contributor Author

Braindump: problem that is fixed by the mortal transactions = if relay is restarting (which may be triggered manually or because of its internal mechanisms), while there are his active transactions in the wild, they (transactions) are invalidated. Ideally - while relay restarts (which is true now for auto-restarts).

For fixing the same + auto-restarts we may use submit-and-watch RPC to see what happens with transaction. Then, if transaction is removed from the pool, we may restart relay loop. But there are some caveats:

  1. if transaction is replaced (e.g. by our resubmit mechanism), then we can't watch it anymore;
  2. transaction may be removed from the pool because it exceeds limits, but it may still live on other nodes (because it has been broadcasted before);
  3. if there are too many tx finality watchers (more than MAX_FINALITY_WATCHERS), we may miss the fact that the transaction block has not been finalized, then there was reorg and transaction was dropped because it has became invalid.

When using public RPC nodes, (2) and (3) may in theory be used by malicious actors to 'break' relay. So imo stall-timeout and transactions mortality is the way to go now - it covers (more or less) all possible restart/stall cases. At least we have a guarantee that if relayer has not seen state update until some block, then its transaction has been 100% lost and it won't lose its money by resubmitting the same tx (probably with higher tip).

===

If we accept this ^^^, then the next problem is described in previous comment. If BF is best finalized block, and BB is best known block, then we may use any block in the range [BF; BB] as the transaction era start.

Simply using BB for that may invalidate transaction even if one-block reorg happens. BB hash (known to relayer) is used to generate transaction signature && when the signature is checked, the hash is read from the runtime storage. When even one-block reorg happens, hashes will be different.

Using BF may invalidate transactions immediately if short mortality is used. E.g. having 4 blocks finality lag is fine. And having 4 blocks mortality is allowed. In such case, transaction will be invalidated immediately because its era has ended (tx death block = BF + 4 = BB).

We should select block in the [BF; BB] range with care - we can't select too old block, because it can't be larger than BlockHashCount configuration parameter of the system pallet.

===

If we accept this ^^^, then we shall also accept the fact that there's no perfect solution for that. And it seems to me that the best solution here is the simplest - instead of using BB, let's use BB.parent(). Since case described in this issue is rare and (my impression) more than one block in depth reorgs on Substrate-chain are rare, there's almost 0 chance to see this in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants