Fix wallet sync for RpcBlockchain #683

Merged
merged 3 commits into from
Aug 4, 2022

Conversation

evanlinjin
Member

@evanlinjin evanlinjin commented Jul 25, 2022

Fixes #677

Description

Unfortunately to fix all the problems, I had to do a complete re-implementation of RpcBlockchain.

The new implementation fixes the following:

  • We can track more than 100 scriptPubKeys
  • We can obtain more than 1000 transactions per sync
  • Transaction "metadata" for already-synced transactions is updated when we introduce new scriptPubKeys

RpcConfig changes:

  • Introduce RpcSyncParams.
  • Remove RpcConfig::skip_blocks (this is replaced by RpcSyncParams::start_time).

Notes to the reviewers

  • The RpcConfig structure is changed. It will be good to confirm whether this is an okay change.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

* [ ] I've added tests for the new feature

  • I've added docs for the new feature
  • I've updated CHANGELOG.md

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@evanlinjin evanlinjin force-pushed the fix-wallet-sync-rpc branch 3 times, most recently from 1116b58 to d984ac1 Compare July 28, 2022 14:56
@evanlinjin evanlinjin changed the title WIP: Fix wallet sync for RpcBlockchain Fix wallet sync for RpcBlockchain Jul 28, 2022
@evanlinjin evanlinjin force-pushed the fix-wallet-sync-rpc branch 2 times, most recently from c7fbd0d to 1205e6a Compare July 28, 2022 15:49
@evanlinjin evanlinjin marked this pull request as ready for review July 28, 2022 15:49
@evanlinjin

This comment was marked as resolved.

@danielabrozzoni danielabrozzoni added the bug Something isn't working label Jul 28, 2022
@evanlinjin evanlinjin force-pushed the fix-wallet-sync-rpc branch 6 times, most recently from 0ed7e52 to 45caf45 Compare July 30, 2022 13:34
@evanlinjin
Member Author

We should probably check whether this fixes #598

src/blockchain/rpc.rs (outdated, resolved)
Member

@danielabrozzoni danielabrozzoni left a comment

This review was a collaborative effort between me and @afilini, but I want to make it clear that if some comments are stupid, it's only his fault. (jk) (well, maybe not)

Concept ACK, some comments

let is_derivable = db_spk_count > 2;

// ensure db scripts meet start script count requirements
if is_derivable && db_spks.len() < self.sync_params.start_script_count {
Member

you can use db_spk_count

Member

Also, maybe you should check those per KeychainKind: start_script_count = 100 imho should mean 100 cached for the external and 100 cached for the internal descriptor. Otherwise you risk having start_script_count = 100 and, instead of 50/50, having 100 cached for the external and 0 for the internal.

If that's the case, maybe we should rename count -> index? So you can say "index = 50" instead of count = 100 and it's clearer

Member Author

Good catch. Changed.

        let ext_spks = db.iter_script_pubkeys(Some(KeychainKind::External))?;
        let int_spks = db.iter_script_pubkeys(Some(KeychainKind::Internal))?;

        // This is a hack to see whether atleast one of the keychains comes from a derivable
        // descriptor. We assume that non-derivable descriptors always has a script count of 1.
        let last_count = std::cmp::max(ext_spks.len(), int_spks.len());
        let has_derivable = last_count > 1;

        // If atleast one descriptor is derivable, we need to ensure scriptPubKeys are sufficiently
        // cached.
        if has_derivable && last_count < params.start_script_count {
            let inner_err = MissingCachedScripts {
                last_count,
                missing_count: params.start_script_count - last_count,
            };
            debug!("requesting more spks with: {:?}", inner_err);
            return Err(Error::MissingCachedScripts(inner_err));
        }
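A minimal, self-contained sketch of the check above, assuming plain counts in place of the database iterators. `check_cached_scripts` and the standalone `MissingCachedScripts` struct are illustrative names, not the merged bdk code.

```rust
/// Hypothetical stand-in for the error payload shown above.
#[derive(Debug, PartialEq, Eq)]
pub struct MissingCachedScripts {
    pub last_count: usize,
    pub missing_count: usize,
}

/// Returns an error if a derivable keychain has fewer cached scripts than required.
/// `ext_count` / `int_count` are the cached script counts of the two keychains.
pub fn check_cached_scripts(
    ext_count: usize,
    int_count: usize,
    start_script_count: usize,
) -> Result<(), MissingCachedScripts> {
    // Same heuristic as above: a non-derivable descriptor is assumed to
    // cache exactly one script.
    let last_count = std::cmp::max(ext_count, int_count);
    let has_derivable = last_count > 1;

    if has_derivable && last_count < start_script_count {
        return Err(MissingCachedScripts {
            last_count,
            missing_count: start_script_count - last_count,
        });
    }
    Ok(())
}
```

Because the check uses the larger of the two counts, a wallet with 100 external and 0 internal cached scripts passes in this sketch when `start_script_count` is 100.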

Comment on lines 359 to 360
.map(|v| v.map(|i| (*keychain, i)))
.transpose()
Member

maybe here it's better for readability if you use match?

Member Author

I feel like it's a matter of taste, but I've changed it :/

            .filter_map(|keychain| match db.get_last_index(*keychain) {
                Ok(li_opt) => li_opt.map(|li| Ok((*keychain, li))),
                Err(err) => Some(Err(err)),
            })
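For comparison, the two styles discussed in this thread can be put side by side. This is only a sketch: `get_last_index` here is a hypothetical free function over `char` keychain tags with a `String` error, standing in for `db.get_last_index`.

```rust
/// Hypothetical stand-in for `db.get_last_index`: `Ok(None)` means no index is stored yet.
fn get_last_index(keychain: char) -> Result<Option<u32>, String> {
    match keychain {
        'e' => Ok(Some(7)),
        'i' => Ok(None),
        _ => Err(format!("unknown keychain {}", keychain)),
    }
}

/// Variant 1: `map` + `transpose`, as in the code under review.
fn last_indexes_transpose(keychains: &[char]) -> Result<Vec<(char, u32)>, String> {
    keychains
        .iter()
        .filter_map(|k| get_last_index(*k).map(|v| v.map(|i| (*k, i))).transpose())
        .collect()
}

/// Variant 2: explicit `match`, as in the reply.
fn last_indexes_match(keychains: &[char]) -> Result<Vec<(char, u32)>, String> {
    keychains
        .iter()
        .filter_map(|k| match get_last_index(*k) {
            // A missing index is skipped, a present one is kept.
            Ok(li_opt) => li_opt.map(|li| Ok((*k, li))),
            // Errors are passed through so that `collect` stops on the first one.
            Err(err) => Some(Err(err)),
        })
        .collect()
}
```

Both variants skip keychains without a stored index and propagate the first error, so the choice really is about readability.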

Comment on lines 400 to 413
        let raw_tx =
            match &db_tx.transaction {
                Some(raw_tx) => raw_tx,
                None => {
                    updated = true;
                    db_tx.transaction.insert(client.get_raw_transaction(
                        &tx_res.info.txid,
                        tx_res.info.blockhash.as_ref(),
                    )?)
                }
            };
Member

It's a bit unclear that in the None case, you're inserting in the same thing you're matching on (took me a second). What about:

                match &mut db_tx.transaction {
                    Some(raw_tx) => raw_tx,
                    tx_opt => {
                        updated = true;
                        tx_opt.insert(client.get_raw_transaction(
                            &tx_res.info.txid,
                            tx_res.info.blockhash.as_ref(),
                        )?)
                    }
                }

Member Author

Great idea. Changed.
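The suggested pattern can be tried in isolation. The sketch below assumes a `String` payload and a caller-supplied `fetch` closure in place of the RPC call; `get_or_fetch` is a hypothetical helper, not part of bdk.

```rust
/// Returns a mutable reference to the cached value, fetching it only when missing.
/// `updated` is set when a fetch happened. `fetch` is a hypothetical stand-in for
/// the RPC call (`get_raw_transaction` in the code above).
fn get_or_fetch<'a, E>(
    slot: &'a mut Option<String>,
    updated: &mut bool,
    fetch: impl FnOnce() -> Result<String, E>,
) -> Result<&'a mut String, E> {
    Ok(match slot {
        Some(cached) => cached,
        // Binding the whole `Option` makes it explicit what `insert` writes into.
        empty => {
            *updated = true;
            empty.insert(fetch()?)
        }
    })
}
```

`Option::insert` stores the value and returns a mutable reference to it, which is why both arms have the same type. `Option::get_or_insert_with` is not a drop-in alternative here because the fetch is fallible.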

Comment on lines 473 to 488
        self._sent_from_raw_tx(db, db_tx.transaction.as_ref()?)
            .map(|sent| {
                if db_tx.sent != sent {
                    Some((*txid, sent))
                } else {
                    None
                }
            })
            .transpose()
Member

Same as above, a match here might be more readable

Member Author

Changed.

})
.transpose()
})
.collect::<Result<Vec<_>, _>>()?;
Member

You might not need to collect into a Vec, as you're iterating them just below

Member

OTOH... Why do you need two loops? Can't you record the updates directly when you found them?

Member Author

It's because we cannot have two mutable references (self is already a mutable reference).

The solution I changed to is to clone self.txs first. However, I am now thinking DbState::update_state should really return DbStateUpdates instead of having self mutable? But I am already spending too much time on this PR!!! haha

keychain.as_byte(),
index
);
self.updated_last_indexes.insert(keychain);
Member

I'm not sure if it's worth it to have both a last_indexes and an updated_last_indexes, as it's quite complex and not a great performance improvement. You could simply have last_indexes and at the end always rewrite the last index of each keychain in the database, without really caring whether you modified it. It's just two rows to update, not thousands, and I feel like it would make the code significantly easier to read.

Member Author

Removed!

        self.txs
            .keys()
            .filter(|&txid| !self.retained_txs.contains(txid))
            .try_for_each(|txid| batch.del_tx(txid, false).map(|_| del_txs += 1))?;
Member

I don't think this will work, as usually batch.del_tx will return Some only if you're deleting something that has just been inserted in the batch.

Member Author

Fixed.

Member

Is it? You're still doing:

if batch.del_tx(txid, false)?.is_some() {
    debug!("deleting tx: {}", txid);
    del_txs += 1;
}

But my point is that del_tx will delete a transaction even when returning None...

Ping @afilini for an explanation, I don't remember why the BatchDatabase does so

Member

It's because the batch object is basically disconnected from the actual database, it's just a collection of operations that are then applied at once at the end
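A toy model may make this behaviour easier to see. It assumes a batch that is nothing more than a list of queued operations; `Op`, `Batch` and `commit` are hypothetical names and the real bdk batch types differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical operation log; the real bdk batch types differ.
enum Op {
    Set(u32, String),
    Del(u32),
}

#[derive(Default)]
struct Batch {
    ops: Vec<Op>,
}

impl Batch {
    fn set_tx(&mut self, id: u32, tx: &str) {
        self.ops.push(Op::Set(id, tx.to_string()));
    }

    /// Queues a deletion. The batch cannot see the database, so it can only
    /// return a value that was queued earlier in this same batch.
    fn del_tx(&mut self, id: u32) -> Option<String> {
        let pending = self
            .ops
            .iter()
            .rev()
            .find_map(|op| match op {
                Op::Set(i, tx) if *i == id => Some(Some(tx.clone())),
                Op::Del(i) if *i == id => Some(None),
                _ => None,
            })
            .flatten();
        self.ops.push(Op::Del(id));
        pending
    }
}

/// Applies all queued operations at once.
fn commit(db: &mut BTreeMap<u32, String>, batch: Batch) {
    for op in batch.ops {
        match op {
            Op::Set(id, tx) => {
                db.insert(id, tx);
            }
            Op::Del(id) => {
                db.remove(&id);
            }
        }
    }
}
```

In this model, deleting a transaction that already lives in the database returns `None` yet still removes it on commit, so counting deletions via `is_some()` would undercount.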

Member Author

@afilini @danielabrozzoni Thanks for pointing this out! I'll actually fix it now

info!(
"db batch updates: del_txs={}, update_txs={}, update_utxos={}",
del_txs,
self.updated_txs.len(),
Member

Is this count correct? Above, you filter_map the updated_txs before inserting them

Member Author

I only filter the deleted txs, in which case I have the del_txs variable :)

// update txs
self.updated_txs
.iter()
.filter_map(|txid| self.txs.get(txid))
Member

Maybe if self.txs.get(txid) never returns None it's better to expect with an explanation?

....Unless there are some cases where None might be returned! But I can't think of any

Member Author

Sorted.

        // update txs
        self.updated_txs
            .iter()
            .inspect(|&txid| debug!("updating tx: {}", txid))
            .try_for_each(|txid| batch.set_tx(self.txs.get(txid).unwrap()))?;

src/blockchain/rpc.rs (outdated, resolved)
Member

@afilini afilini left a comment

I just have a few very small comments, code looks very good now!

    /// Calculates received amount from raw tx.
    fn received_from_raw_tx(db: &D, raw_tx: &Transaction) -> Result<u64, Error> {
        raw_tx.output.iter().try_fold(0_u64, |recv, txo| {
            let v = if db.is_mine(&txo.script_pubkey)? {
Member

nit: you can use self.db and remove the argument

Member Author

Looking at how received_from_raw_tx is used, it is used while a mutable reference to a field of self still exists.

The suggested change will result in:

error[E0502]: cannot borrow `*self` as immutable because it is also borrowed as mutable
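The borrow situation can be reproduced with a small sketch. The `Db`, `Tx` and `State` types below are hypothetical, simplified stand-ins, not the real bdk types; the point is only that an associated function taking `db` explicitly borrows a single field rather than all of `self`.

```rust
/// Hypothetical stand-ins; not the real bdk types.
struct Db {
    mine: Vec<u32>,
}

struct Tx {
    outputs: Vec<(u32, u64)>,
    received: u64,
}

struct State {
    db: Db,
    txs: Vec<Tx>,
}

impl State {
    /// Associated function: borrows only `db`, never the whole `self`.
    fn received_from_tx(db: &Db, tx: &Tx) -> u64 {
        tx.outputs
            .iter()
            .filter(|(spk, _)| db.mine.contains(spk))
            .map(|(_, value)| *value)
            .sum()
    }

    fn update_received(&mut self) {
        // `self.txs` is mutably borrowed here; `self.db` is a disjoint field,
        // so borrowing it immutably at the same time is accepted.
        for tx in self.txs.iter_mut() {
            tx.received = Self::received_from_tx(&self.db, tx);
            // A `&self` method call here would be rejected with E0502,
            // because it needs all of `self` while `self.txs` is borrowed mutably.
        }
    }
}
```

The compiler tracks borrows of individual fields within one function body, but a method call taking `&self` always borrows the whole struct.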

let last_count = std::cmp::max(ext_spks.len(), int_spks.len());
let has_derivable = last_count > 1;

// If atleast one descriptor is derivable, we need to ensure scriptPubKeys are sufficiently
Member

nit: at least

Member Author

English is such an annoying language. Rules do not make sense. Anyway, thank you for pointing this out! haha


// update tx sent fields from tx inputs
self.txs
.clone() // clone is required as we cannot have more than two mutable references
Member

nit: more than one (or alternatively just "two")

}

// update tx sent fields from tx inputs
self.txs
Member

I have a couple of suggestions here to improve this a bit:

  1. instead of cloning the whole thing you can call values() first, so at least you avoid cloning the keys
  2. the check that looks for the tx in retained_txs could be moved into a filter() method, which maybe makes this a bit more readable. Also this check could be moved before the clone() to again avoid cloning things you don't need

Member Author

@evanlinjin evanlinjin Aug 4, 2022

@afilini Thank you for the suggestions!

However, using .values().cloned() would not work, as calling .cloned() only clones one element at a time (and we have a mutable reference of the whole self).

In my original commit, I had all filters in a .filter_map and had two loops (one for marking changes and one for applying changes) so we avoid the mutable-reference problems.

I think it is best to kinda go back to my original solution, but improve commenting and split "filter logic" into multiple .filter..() calls.

This is what I have now:

        // obtain vector of `TransactionDetails::sent` changes
        let sent_updates = self
            .txs
            .values()
            // only bother to update txs that are retained
            .filter(|db_tx| self.retained_txs.contains(&db_tx.txid))
            // only bother to update txs where the raw tx is accessible
            .filter_map(|db_tx| (db_tx.transaction.as_ref().map(|tx| (tx, db_tx.sent))))
            // recalculate sent value, only update txs in which sent value is changed
            .filter_map(|(raw_tx, old_sent)| {
                self.sent_from_raw_tx(raw_tx)
                    .map(|sent| {
                        if sent != old_sent {
                            Some((raw_tx.txid(), sent))
                        } else {
                            None
                        }
                    })
                    .transpose()
            })
            .collect::<Result<Vec<_>, _>>()?;

        // record send updates
        sent_updates.iter().for_each(|&(txid, sent)| {
            // apply sent field changes
            self.txs.entry(txid).and_modify(|db_tx| db_tx.sent = sent);
            // mark tx as modified
            self.updated_txs.insert(txid);
        });

This was my original:

bdk/src/blockchain/rpc.rs

Lines 454 to 476 in 9d787bf

        // update sent from tx inputs
        let sent_updates = self
            .txs
            .values()
            .filter_map(|db_tx| {
                let txid = self.retained_txs.get(&db_tx.txid)?;
                self._sent_from_raw_tx(db, db_tx.transaction.as_ref()?)
                    .map(|sent| {
                        if db_tx.sent != sent {
                            Some((*txid, sent))
                        } else {
                            None
                        }
                    })
                    .transpose()
            })
            .collect::<Result<Vec<_>, _>>()?;
        // record send updates
        sent_updates.into_iter().for_each(|(txid, sent)| {
            self.txs.entry(txid).and_modify(|db_tx| db_tx.sent = sent);
            self.updated_txs.insert(txid);
        });
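The two-pass approach in this comment (collect the changes first, then apply them) can be sketched in isolation. Everything below is a simplified assumption: `Tx`, `State` and `sent_from_inputs` are hypothetical stand-ins, and the "sent" value is just the sum of some input amounts.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified stand-ins for the PR's state; not the real bdk types.
struct Tx {
    sent: u64,
    inputs: Vec<u64>,
}

struct State {
    txs: BTreeMap<u32, Tx>,
    retained: BTreeSet<u32>,
    updated: BTreeSet<u32>,
}

impl State {
    /// Placeholder for `sent_from_raw_tx`; the real one would consult the database.
    fn sent_from_inputs(&self, tx: &Tx) -> u64 {
        tx.inputs.iter().sum()
    }

    fn refresh_sent(&mut self) {
        // Pass 1: only shared borrows of `self`; collect the changes into an owned Vec.
        let changes: Vec<(u32, u64)> = self
            .txs
            .iter()
            // only consider txs that are retained
            .filter(|(id, _)| self.retained.contains(*id))
            // keep only txs whose recomputed value differs from the stored one
            .filter_map(|(id, tx)| {
                let sent = self.sent_from_inputs(tx);
                (sent != tx.sent).then(|| (*id, sent))
            })
            .collect();

        // Pass 2: the shared borrows have ended, so mutation is allowed.
        for (id, sent) in changes {
            self.txs.entry(id).and_modify(|tx| tx.sent = sent);
            self.updated.insert(id);
        }
    }
}
```

Collecting into a `Vec` costs one small allocation per sync but avoids cloning the whole transaction map.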

Member Author

I just had a thought. If we can guarantee that txs returned from the listtransactions RPC call are in chronological order (which I assume they would be), we can have the TransactionDetails::sent recalculations in the same loop as the recalculations of all the other fields.

The only benefit of this is for a slight performance gain, so maybe no point to bother? (I would assume that reducing IO would contribute the most to performance)

Member

Yeah I remember this now, I think I was the one who suggested merging the two loops 😅

In the end I like what you came up with!

@afilini
Member

afilini commented Aug 3, 2022

Also if you could rebase to fix the conflicts in changelog.md

evanlinjin and others added 2 commits August 4, 2022 11:27
The new implementation fixes the following:
* We can track more than 100 scriptPubKeys
* We can obtain more than 1000 transactions per sync
* `TransactionDetails` for already-synced transactions are updated when
  new scriptPubKeys are introduced (fixing the missing balance/coins
  issue of supposedly tracked scriptPubKeys)

`RpcConfig` changes:
* Introduce `RpcSyncParams`.
* Remove `RpcConfig::skip_blocks` (this is replaced by
  `RpcSyncParams::start_time`).

Before this commit, the rpc backend would not notice immature utxos
(`listunspent` does not return them), making the rpc balance different
to other blockchain implementations.

Co-authored-by: Daniela Brozzoni <danielabrozzoni@protonmail.com>

These are as suggested by @danielabrozzoni and @afilini

Also introduced `RpcSyncParams::force_start_time` for users who
prioritise reliability above all else.

Also improved logging.

Member

@afilini afilini left a comment

ACK 5eeba6c

@afilini afilini merged commit dc7adb7 into bitcoindevkit:master Aug 4, 2022
@evanlinjin evanlinjin mentioned this pull request Aug 6, 2022
5 tasks
afilini added a commit that referenced this pull request Aug 9, 2022
74e2c47 Replace `rpc::CoreTxIter` with `list_transactions` fn. (志宇)

Pull request description:

  ### Description

  This fixes a bug where `CoreTxIter` attempts to call `listtransactions` immediately after a tx result is filtered (instead of being returned), when in fact, the correct logic will be to pop another tx result.

  The new logic also ensures that tx results are returned in chronological order. The test `test_list_transactions` verifies this. We also now ensure that `page_size` is within the range `[0, 1000]`; otherwise an error is returned.

  Some needless cloning is removed from `from_config` as well as logging improvements.

  ### Notes to the reviewers

  This is an oversight by me (sorry) for PR #683

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits
  * [x] I followed the [contribution guidelines](https://github.com/bitcoindevkit/bdk/blob/master/CONTRIBUTING.md)
  * [x] I ran `cargo fmt` and `cargo clippy` before committing

  #### Bugfixes:

  ~* [ ] This pull request breaks the existing API~
  * [x] I've added tests to reproduce the issue which are now passing
  * [x] I'm linking the issue being fixed by this PR

ACKs for top commit:
  afilini:
    ACK 74e2c47

Tree-SHA512: f32314a9947067673d19d95da8cde36b350c0bb0ebe0924405ad50602c14590f7ccb09a3e03cdfdd227f938dccd0f556f3a2b4dd7fdd6eba1591c0f8d3e65182
Labels
bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

WalletSync issues with RpcBlockchain and CompactFiltersBlockchain
3 participants