Reorg resistance #2320
Conversation
@victorkirov if you want to have a look here :)
@raphjaph looks good overall. Some things to consider though:
Outside of an attack scenario, the likelihood of a reorg deeper than 6 blocks is extremely low, and we haven't seen one in Bitcoin's history. But taking a snapshot only every few blocks seems like a good tradeoff. See https://blog.lopp.net/how-many-bitcoin-confirmations-is-enough/ for the math behind worst-case attack scenarios.
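For context, the worst-case math in that post builds on the attacker catch-up probability from the Bitcoin whitepaper. A minimal sketch of that calculation, purely for reference and not part of this PR:

```rust
// Probability that an attacker with hashrate share `q` catches up from `z` blocks
// behind (Nakamoto's formula, as used in worst-case confirmation analyses).
fn attacker_success_probability(q: f64, z: u32) -> f64 {
  let p = 1.0 - q;
  let lambda = z as f64 * (q / p);
  let mut sum = 1.0;
  let mut poisson = (-lambda).exp(); // Poisson term for k = 0
  for k in 0..=z {
    if k > 0 {
      poisson *= lambda / k as f64;
    }
    sum -= poisson * (1.0 - (q / p).powi((z - k) as i32));
  }
  sum
}

fn main() {
  // With 10% of the hashrate, overtaking 6 confirmations is already well below 0.1%.
  for z in [1u32, 3, 6] {
    println!("q = 0.10, z = {z}: {:.6}", attacker_success_probability(0.10, z));
  }
}
```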
I'll do some more measurements, but I haven't seen significant memory impact.
True, the only time there has been a reorg with a lot of blocks was during a fork. Having the status there covers the case where something like that happens; otherwise I think it would currently fail silently and stop indexing.
Strange. It's possible that I did something wrong 😅 I need to read up on how the savepoints work a bit more.
…5. This gives us a savepoint every 5 blocks once initial indexing is complete, and every 5000 blocks during initial indexing. Only keep 2 savepoints, which under this scheme lets us go back 10+ blocks.
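The effective schedule reduces to a divisibility check at commit time; a small illustrative sketch (the function name is mine, not the PR's code):

```rust
// Commits happen every 5000 blocks during initial indexing and at the chain tip
// afterwards, so a "divisible by 5" check at commit time yields a savepoint every
// 5 blocks once caught up and every 5000 blocks while catching up.
fn should_create_savepoint(height: u32) -> bool {
  height % 5 == 0
}

fn main() {
  assert!(should_create_savepoint(805_000));
  assert!(!should_create_savepoint(805_001));
}
```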
Reviewed and tested extensively. (Partially) incorporating suggestion from Victor.
How much RAM do you have on the machines in the cluster? This may be an issue in more memory-constrained environments.
We had 32 GB with a Bitcoin node, an electrs API, and 2 instances of ord running. I upped it to 64 GB and it seems a lot more stable now.
src/index/updater.rs
let savepoints = wtx.list_persistent_savepoints()?.collect::<Vec<u64>>();

// Keep at most two persistent savepoints: drop the oldest one before creating a new one.
if savepoints.len() >= 2 {
  wtx.delete_persistent_savepoint(savepoints.into_iter().min().unwrap())?;
}

Index::increment_statistic(&wtx, Statistic::Commits, 2)?;
wtx.commit()?;

// Commit the index changes, then create the savepoint in a fresh write transaction.
let wtx = self.index.begin_write()?;
log::debug!("creating savepoint at height {}", self.height);
wtx.persistent_savepoint()?;
wtx.commit()?;
Just 2 things to note here:
- This first checks if there are more than 2 savepoints and deletes extra ones, then it creates a new one. That means that at the end you'd end up with 3 savepoints until it runs again. Maybe it should first create a savepoint and then do the greater-than-or-equal-to-2 check and clean up (a sketch of that ordering follows after these notes).
- Since this is in the commit function, it will only run on a full commit of an update run and could potentially contain more than 1 block in the commit. In that case, the %5 check could potentially skip over a savepoint, but this would only realistically happen on initial indexing, or starting up after being down for a few blocks, or after recovering from a reorg.
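A sketch of the create-first ordering suggested in the first point, reusing the redb savepoint calls from the snippet above (illustrative only, not the PR's implementation):

```rust
// Create the new savepoint first...
let wtx = self.index.begin_write()?;
wtx.persistent_savepoint()?;
wtx.commit()?;

// ...then prune down to two, assuming smaller savepoint ids are older,
// as the existing code does when it deletes the minimum.
let wtx = self.index.begin_write()?;
let mut savepoints = wtx.list_persistent_savepoints()?.collect::<Vec<u64>>();
savepoints.sort();
while savepoints.len() > 2 {
  wtx.delete_persistent_savepoint(savepoints.remove(0))?;
}
wtx.commit()?;
```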
> This first checks if there are more than 2 savepoints and deletes extra ones, then it creates a new one. That means that at the end you'd end up with 3 savepoints until it runs again. Maybe it should first create a savepoint and then do the greater-than-or-equal-to-2 check and clean up.
It first checks if there are 2 or more savepoints, and if so, deletes the oldest one. That means that if we somehow ever get to more than 2 savepoints, we would never reduce them down to 2. But the normal case is that we have 0, 1, or 2. When we have 0 or 1, we just add a new one. When we already have two, we delete the oldest one and add a new one. We could change the if statement in line 662 to a loop ... Not sure that's worth it, as the only way to get more than 2 savepoints in the first place is by manually creating them or running code that creates more than two.
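The loop variant mentioned there would look roughly like this (a sketch only, with `wtx` and the savepoint calls as in the snippet under review):

```rust
// Keep deleting the oldest savepoint until only one remains, so the savepoint
// created right afterwards brings the total back to exactly two.
let mut savepoints = wtx.list_persistent_savepoints()?.collect::<Vec<u64>>();
savepoints.sort();
while savepoints.len() >= 2 {
  wtx.delete_persistent_savepoint(savepoints.remove(0))?;
}
```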
> Since this is in the commit function, it will only run on a full commit of an update run and could potentially contain more than 1 block in the commit. In that case, the %5 check could potentially skip over a savepoint, but this would only realistically happen on initial indexing, or starting up after being down for a few blocks, or after recovering from a reorg.
Not sure I understand. In the initial indexing, commit is called every 5k blocks, so always on blocks divisible by 5000, which are also divisible by 5. In all other cases, commit() is called whenever the indexer reaches the tip, so it will eventually hit a block that meets the condition. I am not concerned about going back a few blocks more than absolutely required.
> It first checks if there are 2 or more savepoints, and if so, deletes the oldest one. That means that if we somehow ever get to more than 2 savepoints, we would never reduce them down to 2. But the normal case is that we have 0, 1, or 2. When we have 0 or 1, we just add a new one. When we already have two, we delete the oldest one and add a new one. We could change the if statement in line 662 to a loop ... Not sure that's worth it, as the only way to get more than 2 savepoints in the first place is by manually creating them or running code that creates more than two.
Sorry, I didn't describe the issue properly. It's not really an issue, just that in the normal case there will be 3 savepoints instead of 2. When it runs, if there are 3 savepoints, it'll delete one, bringing it down to 2, and then it'll create a new one. That results in there always being 3 savepoints in the normal case. If, instead, we first create a savepoint and then delete the oldest one, we would have 2 in the normal case. Not an issue, just bringing it up in case you want to have 2 savepoints in the normal case.
> Not sure I understand. In the initial indexing, commit is called every 5k blocks, so always on blocks divisible by 5000, which are also divisible by 5. In all other cases, commit() is called whenever the indexer reaches the tip, so it will eventually hit a block that meets the condition. I am not concerned about going back a few blocks more than absolutely required.
The user can ctrl+c on block 3 during initial indexing (for argument's sake) and then the commits will happen on blocks 5003 + 5000n. Also, if they ctrl+c on block 802316 and then start up again on block 802321, a savepoint wouldn't be created. In the worst case for the web server, if a reorg happens and we jump back 5 blocks, then we end up reindexing those 5 blocks plus whatever is new, and we could potentially jump over a savepoint before we start doing single-block indexing. I know the chances of 2 reorgs happening in such a short timespan are low, but it is possible.
The other argument for this is people using ord not in web-server mode with continuous indexing, but instead updating their index on demand when they execute a command. If they happen to never execute a command on a block divisible by 5, they would never get a savepoint.
The solution for this is to move the savepoint logic into the update loop and create the savepoints there, or to keep the logic here but also add a condition in that loop to commit on blocks divisible by 5 that are less than 10 (maybe a bit more) blocks away from the head block.
I've consolidated most of the logic into one file. I've added back the …
let mut wtx = index.begin_write()?;

let oldest_savepoint =
For this, an alternative to consider is to roll back to the latest savepoint and delete it. That would only roll you back 1-4 blocks instead of 10-14. If it's not far enough, then when the updater runs again it will hit a reorg again and roll back to the older savepoint automatically.
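A rough sketch of that alternative, assuming redb's `get_persistent_savepoint` and `restore_savepoint` are available on the write transaction (illustrative only, not what this PR does):

```rust
// Roll back to the *newest* savepoint and delete it, losing at most a handful of
// blocks. If that wasn't far enough, the next updater run detects the reorg again
// and falls back to the remaining, older savepoint.
let mut wtx = index.begin_write()?;

if let Some(newest_id) = wtx.list_persistent_savepoints()?.max() {
  let savepoint = wtx.get_persistent_savepoint(newest_id)?;
  wtx.restore_savepoint(&savepoint)?;
  wtx.delete_persistent_savepoint(newest_id)?;
}

wtx.commit()?;
```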
@@ -130,7 +117,7 @@ impl Updater {
self.commit(wtx, value_cache)?;
I think there should be a commit check as part of this if block. Something like:
let should_commit_for_savepoint =
  starting_height - self.height < 15 && (self.height + uncommitted) % 5 == 0;

if uncommitted == 5000 || should_commit_for_savepoint {
  self.commit(wtx, value_cache)?;
}
Disclaimer: Terrible variable naming and the logic might be off by 1 😋
Or maybe a helper function in the reorg file
Reorg in its own file is great 👍 Nice that it's contained in one central place
I think more RAM usage represents an acceptable trade-off for increased resilience. Would be great to see what exactly you are doing on your branch. I'm just going to merge this for now, but if you find an improvement, definitely open a PR!
If a reorg is detected, it rolls back the database 6 blocks and reindexes from there.
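Conceptually, detection comes down to checking whether a new block still builds on the block hash the index last stored at the previous height. A minimal sketch of that check (the `indexed_block_hash` lookup is a placeholder, not this PR's API):

```rust
use bitcoin::{Block, BlockHash};

// Returns true if the incoming block at `height` does not extend the block we have
// indexed at `height - 1`, i.e. the chain the index followed has been reorged.
fn is_reorg(
  block: &Block,
  height: u32,
  indexed_block_hash: impl Fn(u32) -> Option<BlockHash>,
) -> bool {
  if height == 0 {
    return false;
  }
  match indexed_block_hash(height - 1) {
    Some(prev_hash) => block.header.prev_blockhash != prev_hash,
    // Nothing indexed at the parent height yet, so there is nothing to contradict.
    None => false,
  }
}
```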
I had to disable some test optimisations, so now the tests run quite a bit longer. Will try to find a way to get those back.