
Allow nodes to (manually) recover from dreaded "Already Spent" error #3018

antiochp opened this issue Sep 3, 2019 · 1 comment

antiochp commented Sep 3, 2019

We still have scenarios and edge cases where we see the dreaded "Already Spent" error in the logs and a node fails to accept new blocks.

Related #3016.

Assuming this is not a double-spend attempt, this indicates the local UTXO set has somehow been corrupted. The UTXO set consists of -

  • the output MMR (the TXO set)
  • the output leafset (which leaf positions are unspent in the output MMR)
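
A minimal sketch of how these two pieces interact (illustrative Rust only, not the actual grin types) - spending an output just clears its position in the leafset, so a corrupted leafset makes a perfectly valid spend look like a double spend:

```rust
use std::collections::HashSet;

/// Illustrative sketch only, not the actual grin types: the UTXO set is the
/// append-only output MMR plus a leafset marking which leaf positions are
/// currently unspent.
struct UtxoSet {
    /// Output commitments in leaf-insertion order - the TXO set.
    output_mmr_leaves: Vec<[u8; 33]>,
    /// Leaf positions in the output MMR that are still unspent.
    leafset: HashSet<u64>,
}

impl UtxoSet {
    /// Spending an output simply removes its position from the leafset.
    /// If the leafset is corrupted and the position is already missing,
    /// a valid spend is indistinguishable from a double spend - which is
    /// how "Already Spent" surfaces on otherwise valid blocks.
    fn spend(&mut self, pos: u64) -> Result<(), &'static str> {
        if self.leafset.remove(&pos) {
            Ok(())
        } else {
            Err("Already Spent")
        }
    }
}
```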

It would be really useful if a node could recover from this somehow.
Particularly a node being used to test dev branches etc. (where we are more susceptible to borking the data).

Potential steps to recover (roughly sketched in code below) -

  • select a new "horizon" to rewind to (say 1440 blocks earlier)
  • request a fresh leafset from a peer for this horizon
  • rewind the txhashset back to this horizon and drop the leafset in place
  • sync blocks from this horizon

aka Minimal fast-sync given preexisting txhashset MMRs and header MMRs
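
A rough sketch of how those steps could hang together (every type and method below is a hypothetical placeholder, not the existing chain API):

```rust
// Hypothetical sketch of the manual recovery flow described above.
// `Chain`, `Peers`, `Error` and all methods are placeholders.
fn recover_from_corrupt_leafset(chain: &mut Chain, peers: &Peers) -> Result<(), Error> {
    // 1. Select a new "horizon" to rewind to, e.g. 1440 blocks back from head.
    let horizon_height = chain.head_height()?.saturating_sub(1440);
    let horizon_header = chain.header_at(horizon_height)?;

    // 2. Request a fresh leafset from a peer for this horizon block.
    let leafset = peers.request_leafset(horizon_header.hash())?;

    // 3. Rewind the txhashset back to the horizon and drop the leafset in place.
    chain.rewind_txhashset(&horizon_header)?;
    chain.replace_leafset(leafset)?;

    // 4. Sync blocks forward from the horizon as in normal block sync.
    chain.sync_blocks_from(&horizon_header, peers)
}
```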


This would not need to be a real-time or automatic resolution. Node operators could explicitly start a node in a "recovery mode" or something similar - this would primarily be for devs/QA messing with code on branches etc.

This should not be something that triggers automatically on regular nodes (that would risk hiding bigger underlying issues rather than surfacing them).

antiochp commented Sep 3, 2019

Just had a related thought - maybe we do not need to ask peers for a fresh leafset at all.
We run a chain compaction approximately every 24 hours.
We could potentially create a "backup" leafset as part of this and use it in a recovery scenario.

When we download a txhashset.zip as part of fast sync we receive a snapshot leafset like pmmr_leaf.bin.000011a354ff (with the block hash as a suffix).
During a local chain compaction we could archive off something similar, which would give us a way to recover to a block height from 24 hours earlier (or 48 hours earlier if we kept the previous backup, etc.).
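
As a sketch, the compaction-time backup could be as simple as copying the leafset file off to the side with the same hash-suffixed naming the snapshot already uses (the directory layout and horizon_hash parameter below are placeholders, and this glosses over the fact that the archived file needs to reflect the leafset as of that horizon block rather than the current head):

```rust
use std::fs;
use std::path::Path;

/// Hypothetical sketch: during chain compaction, archive a copy of the leafset
/// suffixed with the relevant block hash, mirroring the snapshot naming
/// (e.g. pmmr_leaf.bin.000011a354ff). Paths here are placeholders.
fn backup_leafset(output_pmmr_dir: &Path, horizon_hash: &str) -> std::io::Result<u64> {
    let live = output_pmmr_dir.join("pmmr_leaf.bin");
    let backup = output_pmmr_dir.join(format!("pmmr_leaf.bin.{}", horizon_hash));
    fs::copy(&live, &backup)
}
```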

As long as we have full blocks in our local db for all blocks after that leafset snapshot/backup, we should be able to rebuild our MMRs robustly, entirely locally.
And even if we don't have the full blocks locally, we can request them from peers, as long as we do not attempt to rewind beyond the archive/pruning horizon.
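
Sketching that local recovery path under the same caveat (all names below are hypothetical placeholders, not the real chain API):

```rust
// Hypothetical sketch: recover using a locally archived leafset backup by
// rewinding to the backup's block, dropping the archived leafset in place,
// then re-applying full blocks from the local db (falling back to peers for
// anything missing, provided we stay within the archive/pruning horizon).
fn recover_from_backup(chain: &mut Chain, peers: &Peers, backup: LeafsetBackup) -> Result<(), Error> {
    let backup_header = chain.header_by_hash(&backup.block_hash)?;

    chain.rewind_txhashset(&backup_header)?;
    chain.replace_leafset(backup.leafset)?;

    // Re-apply every block after the backup up to the current header head.
    for height in (backup_header.height + 1)..=chain.header_head_height()? {
        let block = match chain.block_at(height) {
            Ok(b) => b,
            Err(_) => peers.request_block_at(height)?,
        };
        chain.apply_block(&block)?;
    }
    Ok(())
}
```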
