Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds --forceupdate to teosd #216

Merged
merged 1 commit into from
May 9, 2023
Merged

Conversation

sr-gi
Copy link
Member

@sr-gi sr-gi commented May 4, 2023

If a tower hasn't been running for a long time and the backend runs in pruned mode it could be the case that by the time the tower comes back online, the Last known block by the tower is not being known by the node anymore. In this situation, the tower cannot bootstrap normally, given the cache cannot be populated.

This commits adds a new argument to teosd (--forceupdate) that can be used to force a tower to update its last known block to the earliest known block by the backend under this situation. Notice that doing so may make the tower miss some of its state transitions (the ones triggered by missed blocks), so this must be done as a last resource.

Fixes #209
Supersedes #212

Testing

This can be tested easily in regtest by running bitcoind with the (hidden, undocumented) -fastprune modifier. That would make blocks smaller and easier to prune. After that, you can prune the chain using bitcoin-cli pruneblockchain x where x is your target prune height. The pruning is not likely to happen to the exact block you specified, given block files are pruned as a whole, so the chain would be pruned to the closest value that allows a bock file to be deleted.

So the flow of testing should be:

  1. Start teosd and mine a few blocks so there is a last_known_block known by the tower
  2. Shut down the tower
  3. Mine a few thousand more blocks
  4. Call bitcoin-cli pruneblockchain x and checked the returned y value
  5. If y - IRREVOCABLY_RESOLVED < last_known_block_height, continue, otherwise go back to 3.
  6. Run teosd again and test it with and without --forceupdate

@sr-gi sr-gi added the Seeking Code Review review me pls label May 4, 2023
@sr-gi
Copy link
Member Author

sr-gi commented May 4, 2023

I've tested this manually and it seems to run just fine.

In order to test this you'd need a tower that has a last known block at height n and a node with a chain that has pruned that same block. I haven't been able to test this on regtest given pruning is based on how much block data we are holding (not how many blocks) and regtest blocks are mainly empty, so the threshold is never met. However, I've been able to test this using mainnet and an old tower state.

I may add some E2E testing for this if I manage to create a test where full blocks are created, so eviction can be triggered.

@sr-gi sr-gi force-pushed the outdated_tower branch from a338453 to 5d78992 Compare May 4, 2023 04:22
Copy link
Collaborator

@mariocynicys mariocynicys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused about what the chain_monitor would do given it's being passed an old/pruned tip. Is poll_best_tip() gonna just skip the pruned blocks between tip and the best tip?

teos/src/main.rs Outdated
Comment on lines 190 to 191
.validate(block_hash)
.unwrap()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ummm, doesn't this fail as well when we are out of blocks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it does not because the headers chain is preserved even if the blocks are pruned.

teos/src/main.rs Outdated
Comment on lines 238 to 244
bitcoin_cli
.deref()
.get_header(&hash, Some(height as u32))
.await
.unwrap()
.validate(hash)
.unwrap(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we move this into a variable and call it tip.

teos/src/main.rs Outdated
.await
{
Ok(blocks) => blocks,
Err(_) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go into this branch we would return some last_n_blocks whose tip isn't the same as the tip variable we are using in the initialization of the tower components (watcher, etc...).

teos/src/main.rs Outdated
let hash = rpc
.get_block_hash(height + IRREVOCABLY_RESOLVED as u64)
.unwrap();
if let Ok(blocks) = get_last_n_blocks(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These n blocks won't be delivered to the tower components, right?

@sr-gi
Copy link
Member Author

sr-gi commented May 4, 2023

I'm confused about what the chain_monitor would do given it's being passed an old/pruned tip. Is poll_best_tip() gonna just skip the pruned blocks between tip and the best tip?

Yeah, you're right. I've come up with a better solution than this, will force push asap

@sr-gi sr-gi force-pushed the outdated_tower branch from 5d78992 to 9a0e914 Compare May 5, 2023 03:23
@sr-gi
Copy link
Member Author

sr-gi commented May 5, 2023

This should make more sense now

teos/src/main.rs Outdated
// If we are running in pruned mode some data may be missing (if we happen to have been offline for a wile)
if let Some(prune_height) = rpc.get_blockchain_info().unwrap().prune_height {
if last_known_header.height < prune_height as u32 {
log::debug!(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be logged as a warning.

teos/src/main.rs Outdated

// If we are running in pruned mode some data may be missing (if we happen to have been offline for a wile)
if let Some(prune_height) = rpc.get_blockchain_info().unwrap().prune_height {
if last_known_header.height < prune_height as u32 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would still fail if some of the 100 blocks that proceeds the last known one are pruned.

The check should be: last_known_header.height - 100 < prune_height as u32

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<=, but you're right

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, NVM, I was thinking about the positive case (>=)

@sr-gi sr-gi force-pushed the outdated_tower branch from 9a0e914 to b104d94 Compare May 5, 2023 13:30
@sr-gi
Copy link
Member Author

sr-gi commented May 5, 2023

I fixed suggestions and updated some of the logs since I didn't like them much.

I've also updated the PR description with guidelines for how to test this in regtest.

Copy link
Collaborator

@mariocynicys mariocynicys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits, looks good otherwise.
Didn't manually test it yet though.

teos/src/main.rs Outdated Show resolved Hide resolved
teos/src/main.rs Outdated Show resolved Hide resolved
teos/src/main.rs Outdated Show resolved Hide resolved
@sr-gi sr-gi force-pushed the outdated_tower branch from b104d94 to d52fdcb Compare May 6, 2023 14:13
@sr-gi
Copy link
Member Author

sr-gi commented May 6, 2023

Nits should be fixed. I'd be nice to get a tACK

@mariocynicys
Copy link
Collaborator

mariocynicys commented May 6, 2023

Nits should be fixed. I'd be nice to get a tACK

Yeah will work on that tonight.

Copy link
Collaborator

@mariocynicys mariocynicys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK

Some comments (again :p) inline, looks good otherwise.

@@ -102,6 +102,11 @@ pub struct Opt {
#[structopt(long)]
pub tor_support: bool,

/// Forces the tower to run even if the underlying chain has gone too far out of sync. This can only happen
// if the node is being run in pruned mode.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second line being a non-doc comment makes structopt ignore it in the help message.
image

teos/src/main.rs Outdated
if let Some(prune_height) = rpc.get_blockchain_info().unwrap().prune_height {
if last_known_header.height - IRREVOCABLY_RESOLVED < prune_height as u32 {
log::warn!(
"Cannot load blocks in the range {}-{}. Chain has gone to far out of sync",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "totoo far our of sync"

teos/src/main.rs Outdated

// If we are running in pruned mode some data may be missing (if we happen to have been offline for a while)
if let Some(prune_height) = rpc.get_blockchain_info().unwrap().prune_height {
if last_known_header.height - IRREVOCABLY_RESOLVED < prune_height as u32 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has an off-by-one error 😂, should be last_known_header.height - IRREVOCABLY_RESOLVED + 1 < prune_height as u32.

Let's say we last witnessed block 3000, then we will want to load blocks 2901-3000 (100 blocks). prune_height indicates the lowest block we have stored, so if prune_height is 2901 we should be fine, but last_known_header.height - IRREVOCABLY_RESOLVED will be 2900, which would make the check fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right

teos/src/main.rs Outdated
if last_known_header.height - IRREVOCABLY_RESOLVED < prune_height as u32 {
log::warn!(
"Cannot load blocks in the range {}-{}. Chain has gone to far out of sync",
last_known_header.height - IRREVOCABLY_RESOLVED,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be last_known_header.height - IRREVOCABLY_RESOLVED + 1.

@sr-gi
Copy link
Member Author

sr-gi commented May 7, 2023

Addressed comments

Copy link
Collaborator

@mariocynicys mariocynicys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tACK c7f9f0a

@sr-gi sr-gi removed the Seeking Code Review review me pls label May 9, 2023
If a tower hasn't been running for a long time and the backend runs in pruned
mode it could be the case that by the time the tower comes back online, the Last
known block by the tower is not being known by the node anymore. In this situation,
the tower cannot bootstrap normally, given the cache cannot be populated.

This commits adds a new argument to teosd (`--forceupdate`) that can be used to
force a tower to update its last known block to the earliest known block by the backend
under this situation. Notice that doing so may make the tower miss some of its state
transitions (the ones triggered by missed blocks), so this must be done as a last resource.
@sr-gi sr-gi force-pushed the outdated_tower branch from c7f9f0a to 2bf555c Compare May 9, 2023 13:49
@sr-gi sr-gi merged commit 1a89c5d into talaia-labs:master May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

teosd may not be able to bootstrap in pruned mode if the chain is far ahead
2 participants