Create benchmark for tracking indexing performance #94

Closed
casey opened this issue Feb 1, 2022 · 5 comments · Fixed by #111

Comments

casey (Collaborator) commented Feb 1, 2022

This issue is for tracking performance related to initial indexing of the blockchain.

@cberner What would be a useful benchmark for tracking index speed? I improved the integration test block and transaction generation code, so I could pretty easily create any number of blocks and transactions, with arbitrary relationships, spread across any number of blockfiles.

The only thing relevant to indexing that's changed since we last worked on the code together is that I now store the full blocks in the redb database, using a new hash-to-block table. I think there are no real guarantees about the contents of bitcoind's blocks directory, so I want to move away from using it as much as possible, except perhaps for initial indexing.

@cberner
Copy link
Contributor

cberner commented Feb 2, 2022

I think the easiest would be to track bytes of blockfiles ingested per second. That seems like a good metric, unless there are going to be blocks that are small but really expensive to process.
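
A minimal sketch of that metric, for reference; `benchmark_indexing` and `index_blockfile` are hypothetical stand-ins for the real benchmark harness and indexing entry point:

```rust
use std::time::Instant;

// Hypothetical throughput benchmark: feed a pile of raw blockfile bytes
// through the indexer and report bytes ingested per second.
fn benchmark_indexing(blockfiles: &[Vec<u8>]) {
    let total_bytes: usize = blockfiles.iter().map(|b| b.len()).sum();

    let start = Instant::now();
    for blockfile in blockfiles {
        index_blockfile(blockfile); // placeholder for the real indexing code
    }
    let elapsed = start.elapsed().as_secs_f64();

    println!(
        "indexed {} bytes in {:.2}s ({:.2} MiB/s)",
        total_bytes,
        elapsed,
        total_bytes as f64 / (1024.0 * 1024.0) / elapsed
    );
}

fn index_blockfile(_bytes: &[u8]) {
    // stand-in for whatever parses and indexes a blockfile
}
```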

casey (Collaborator, Author) commented Feb 2, 2022

> I think the easiest would be to track bytes of blockfiles ingested per second.

Sounds good.

> That seems like a good metric, unless there are going to be blocks that are small but really expensive to process.

Transactions with many inputs and outputs will probably be more expensive to process on a per-byte basis, but maybe not enough to be worth thinking hard about.

I'll probably just generate a few gigs of "average" blocks, whatever that means.

By the way, I'm using APFS, which is a copy-on-write filesystem. Will that have any weird performance consequences for redb?

cberner (Contributor) commented Feb 2, 2022

Hmm, do you know what granularity it's copy-on-write at? I'd guess no, given that things like postgres or sqlite work on APFS. If the granularity is really coarse, then you'd want to maximize the size of your write transactions.
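
As an illustration of what "maximize the size of your write transactions" could look like, here is a rough sketch of batching many block insertions into a single redb write transaction. It assumes the current redb API shape (`Database::begin_write`, `open_table`, `insert`, `commit`) and a hypothetical `hash_to_block` table; the crate was pre-1.0 when this thread was written, so the exact names may differ:

```rust
use redb::{Database, Error, TableDefinition};

// Hypothetical hash-to-block table, keyed by block hash bytes.
const HASH_TO_BLOCK: TableDefinition<&[u8], &[u8]> = TableDefinition::new("hash_to_block");

// Insert a whole batch of blocks under one write transaction. A single
// commit touches each copy-on-write page (or filesystem extent) once,
// instead of once per block.
fn write_blocks(db: &Database, blocks: &[(Vec<u8>, Vec<u8>)]) -> Result<(), Error> {
    let txn = db.begin_write()?;
    {
        let mut table = txn.open_table(HASH_TO_BLOCK)?;
        for (hash, raw_block) in blocks {
            table.insert(hash.as_slice(), raw_block.as_slice())?;
        }
    } // table is dropped here so the transaction can be committed
    txn.commit()?;
    Ok(())
}
```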

casey (Collaborator, Author) commented Feb 2, 2022

Would it be the same as the block size? If so, my APFS filesystem, which was initialized with default values, uses 4096 bytes/block.

cberner (Contributor) commented Feb 2, 2022

It sounds like yes? If so, then that should be totally fine. 4096 is the same as the page size, and redb is copy-on-write at the page level (I removed those complicated delta writes).
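
For reference, the filesystem's preferred block size can be checked from Rust on macOS or Linux via `MetadataExt::blksize`; the database path below is a hypothetical placeholder:

```rust
use std::fs;
use std::os::unix::fs::MetadataExt;

fn main() -> std::io::Result<()> {
    // st_blksize: the filesystem's preferred I/O block size for this file.
    // On a default-formatted APFS volume this reports 4096.
    let meta = fs::metadata("index.redb")?; // hypothetical database path
    println!("preferred block size: {} bytes", meta.blksize());
    Ok(())
}
```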

casey linked a pull request Feb 13, 2022 that will close this issue