
Allow nodes to (manually) recover from dreaded "Already Spent" error #3018

antiochp opened this issue Sep 3, 2019 · 1 comment

antiochp commented Sep 3, 2019

We still have scenarios and edge cases where we see the dreaded "Already Spent" error in the logs and a node fails to accept new blocks.

Related #3016.

Assuming this is not a double-spend attempt, this indicates the local UTXO set has somehow been corrupted. The UTXO set consists of -

  • the output MMR (the TXO set)
  • the output leafset (which leaf positions are unspent in the output MMR)
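
A minimal sketch of how these two pieces interact (illustrative Rust only, not the actual grin types) - spending an output just clears its position in the leafset, so a corrupted leafset makes a perfectly valid spend look like a double spend:

```rust
use std::collections::HashSet;

/// Illustrative sketch only, not the actual grin types: the UTXO set is the
/// append-only output MMR plus a leafset marking which leaf positions are
/// currently unspent.
struct UtxoSet {
    /// Output commitments in leaf-insertion order - the TXO set.
    output_mmr_leaves: Vec<[u8; 33]>,
    /// Leaf positions in the output MMR that are still unspent.
    leafset: HashSet<u64>,
}

impl UtxoSet {
    /// Spending an output simply removes its position from the leafset.
    /// If the leafset is corrupted and the position is already missing,
    /// a valid spend is indistinguishable from a double spend - which is
    /// how "Already Spent" surfaces on otherwise valid blocks.
    fn spend(&mut self, pos: u64) -> Result<(), &'static str> {
        if self.leafset.remove(&pos) {
            Ok(())
        } else {
            Err("Already Spent")
        }
    }
}
```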

It would be really useful if a node could recover from this somehow.
Particularly a node being used to test dev branches etc. (where we are more susceptible to borking the data).

Potential steps to recover (roughly sketched in code below) -

  • select a new "horizon" to rewind to (say 1440 blocks earlier)
  • request a fresh leafset from a peer for this horizon
  • rewind the txhashset back to this horizon and drop the leafset in place
  • sync blocks from this horizon

aka Minimal fast-sync given preexisting txhashset MMRs and header MMRs
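
A rough sketch of how those steps could hang together (every type and method below is a hypothetical placeholder, not the existing chain API):

```rust
// Hypothetical sketch of the manual recovery flow described above.
// `Chain`, `Peers`, `Error` and all methods are placeholders.
fn recover_from_corrupt_leafset(chain: &mut Chain, peers: &Peers) -> Result<(), Error> {
    // 1. Select a new "horizon" to rewind to, e.g. 1440 blocks back from head.
    let horizon_height = chain.head_height()?.saturating_sub(1440);
    let horizon_header = chain.header_at(horizon_height)?;

    // 2. Request a fresh leafset from a peer for this horizon block.
    let leafset = peers.request_leafset(horizon_header.hash())?;

    // 3. Rewind the txhashset back to the horizon and drop the leafset in place.
    chain.rewind_txhashset(&horizon_header)?;
    chain.replace_leafset(leafset)?;

    // 4. Sync blocks forward from the horizon as in normal block sync.
    chain.sync_blocks_from(&horizon_header, peers)
}
```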


This would not need to be a real-time or automatic resolution. Node operators could explicitly start a node in a "recovery mode" or something similar - this would primarily be for devs/QA messing with code on branches etc.

This should not be something that triggers automatically on regular nodes (that would risk hiding bigger underlying issues rather than surfacing them).

antiochp commented Sep 3, 2019

Just had a related thought - maybe we do not need to ask peers for a fresh leafset at all.
We run a chain compaction approximately every 24 hours.
We could potentially create a "backup" leafset as part of this and use it in a recovery scenario.

When we download a txhashset.zip as part of fast sync we receive a snapshot leafset like pmmr_leaf.bin.000011a354ff (with the block hash as a suffix).
During a local chain compaction we could archive off something similar, which would give us a way to recover to a block height from 24 hours earlier (or 48 hours earlier if we kept the previous backup, etc.).
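
As a sketch, the compaction-time backup could be as simple as copying the leafset file off to the side with the same hash-suffixed naming the snapshot already uses (the directory layout and horizon_hash parameter below are placeholders, and this glosses over the fact that the archived file needs to reflect the leafset as of that horizon block rather than the current head):

```rust
use std::fs;
use std::path::Path;

/// Hypothetical sketch: during chain compaction, archive a copy of the leafset
/// suffixed with the relevant block hash, mirroring the snapshot naming
/// (e.g. pmmr_leaf.bin.000011a354ff). Paths here are placeholders.
fn backup_leafset(output_pmmr_dir: &Path, horizon_hash: &str) -> std::io::Result<u64> {
    let live = output_pmmr_dir.join("pmmr_leaf.bin");
    let backup = output_pmmr_dir.join(format!("pmmr_leaf.bin.{}", horizon_hash));
    fs::copy(&live, &backup)
}
```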

As long as we have full blocks in our local db for all blocks after that leafset snapshot/backup, we should be able to rebuild our MMRs robustly, entirely locally.
And even if we don't have the full blocks locally, we can request them from peers, as long as we do not attempt to rewind beyond the archive/pruning horizon.
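
Sketching that local recovery path under the same caveat (all names below are hypothetical placeholders, not the real chain API):

```rust
// Hypothetical sketch: recover using a locally archived leafset backup by
// rewinding to the backup's block, dropping the archived leafset in place,
// then re-applying full blocks from the local db (falling back to peers for
// anything missing, provided we stay within the archive/pruning horizon).
fn recover_from_backup(chain: &mut Chain, peers: &Peers, backup: LeafsetBackup) -> Result<(), Error> {
    let backup_header = chain.header_by_hash(&backup.block_hash)?;

    chain.rewind_txhashset(&backup_header)?;
    chain.replace_leafset(backup.leafset)?;

    // Re-apply every block after the backup up to the current header head.
    for height in (backup_header.height + 1)..=chain.header_head_height()? {
        let block = match chain.block_at(height) {
            Ok(b) => b,
            Err(_) => peers.request_block_at(height)?,
        };
        chain.apply_block(&block)?;
    }
    Ok(())
}
```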
