Support dumping to and restoring from dumpfiles #411

Closed

casey opened this issue Oct 24, 2022 · 7 comments

Comments

@casey
Contributor

casey commented Oct 24, 2022

We're working on speeding up ord indexing, and one issue we've run into is that it's very hard to build a synthetic benchmark which is representative of ord's main workload.

An example, which is also quite interesting on its own, is that UTXOs are often destroyed very soon after they're created. @virtuallythere has been working on speeding up ord's initial sync time, and he made this table, which shows the block range in which UTXOs are created and destroyed:

[Screenshot: table of block ranges between UTXO creation and spend]

So 20% of UTXOs are spent in the same block in which they're created, 30% are spent within two blocks of being created, and 60% are spent within 18 blocks of being created.

So any synthetic benchmark we wrote would have to reproduce these kinds of patterns, which would be quite hard.

The alternative to a synthetic benchmark would be to operate on real data, but then we have the challenge that we don't get to a fully representative workload until quite far into the sync process, due to variation in the patterns of on-chain transactions during different parts of Bitcoin's history, in particular, the long run of mostly empty blocks after its creation.

One approach is to sync the index up to a representative height, and then benchmark blocks from there. A few issues with that approach, though:

  • The database file is larger than the actual data it contains, due to fragmentation and other overhead, which would make sharing database files slow
  • The database file contains tables other than the main tables that we care about (OUTPOINT_TO_ORDINAL_RANGES and HEIGHT_TO_BLOCK_HASH), which increase the size of the database
  • The database file is sensitive to schema changes, for example, changing the type of tables other than OUTPOINT_TO_ORDINAL_RANGES, which means we might need to create new snapshots for compatibility, even when the table we care about hasn't changed

As a separate concern, we'd eventually like to be able to distribute index snapshots that users can download, so they can start syncing their index from a recent, known-good state, and not have to sync from the beginning of time. Using redb database files for this purpose would, in addition to the issues above, introduce some others:

  • index.redb files don't have a deterministic hash, so multiple developers wouldn't be able to independently build and sign snapshots
  • index.redb files are much more complicated internally and interface directly with much of redb's code, so it seems possible that a maliciously constructed index.redb might be an attack vector

To deal with all of the above, we came up with the idea of writing code to dump the main index state to a dump file, and then load the index back from that file. In comparison to the actual index.redb, the dumpfile would:

  • Be deterministic, to facilitate verification by multiple developers
  • Exclude extraneous tables
  • Be as compact as possible, with minimal fragmentation and encoding overhead
  • Be very simple, to minimize attack surface
  • Compress well
  • As far as possible, not depend on the details of redb, on the database format, or on the schema of extraneous tables, to minimize the need to rebuild snapshots when redb changes, or when we add or remove extraneous tables or change their schema

This might be useful for redb to provide, since it seems like it would be generally useful. In addition to our purposes, it could be used for:

  • Compact database backups
  • Migrating between different versions of redb, without having to include explicit migration code in redb beyond the dump and load functionality itself. (For in-process migration, you could dump and load from memory, or dump into an io::Write with the new database loading from an io::Read, so there's no tempfile.)

A sketch of a possible design for such a dump file (see the code sketch after this list):

  • Contains a sequence of table dumps in ascending order by name, each of which is:
    • length-prefixed table name as UTF-8
    • serialized number of items in the table
    • the serialized items, each of which are:
      • the serialized key
      • the serialized value
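
As a rough illustration, here is a minimal sketch of writing one table in the format above. It operates on a list of raw key/value byte strings (assumed already sorted by key for determinism) rather than on redb's own types, and the fixed-width little-endian length prefixes, including the per-key and per-value prefixes, are assumptions of this sketch, not part of the proposal.

```rust
use std::io::{self, Write};

/// Append one table to a dump, per the sketch above: a length-prefixed
/// UTF-8 table name, the number of items, then each serialized key and
/// value. Lengths and counts are little-endian u64s (an arbitrary choice).
fn dump_table<W: Write>(
    w: &mut W,
    name: &str,
    items: &[(Vec<u8>, Vec<u8>)], // assumed already sorted by key
) -> io::Result<()> {
    w.write_all(&(name.len() as u64).to_le_bytes())?;
    w.write_all(name.as_bytes())?;
    w.write_all(&(items.len() as u64).to_le_bytes())?;
    for (key, value) in items {
        w.write_all(&(key.len() as u64).to_le_bytes())?;
        w.write_all(key)?;
        w.write_all(&(value.len() as u64).to_le_bytes())?;
        w.write_all(value)?;
    }
    Ok(())
}
```

A full dump would just call this once per table, in ascending order by table name.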

And the API, with a few possible variations, would be something like:

fn dump(&self, tables: &[TableDefinition], w: impl io::Write) -> io::Result<()> { … }

fn transfer(&self, tables: &[TableDefinition], db: &mut Database) -> redb::Result { … }

fn serialize(&self, tables: &[TableDefinition]) -> Vec<u8> { … }

A version which dumps all tables would also be useful, although for our purposes being able to exclude tables is important.
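
For completeness, a matching reader for that hypothetical format might look like the following sketch; it hands each key/value pair to a caller-supplied closure (which in practice would insert into an open table), and it assumes the same illustrative little-endian length prefixes as the encoder sketched above.

```rust
use std::io::{self, Read};

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn read_bytes<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let len = read_u64(r)? as usize;
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Read one table dump and pass each key/value pair to `insert`.
fn load_table<R: Read>(
    r: &mut R,
    mut insert: impl FnMut(&str, Vec<u8>, Vec<u8>) -> io::Result<()>,
) -> io::Result<()> {
    let name = String::from_utf8(read_bytes(r)?)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let count = read_u64(r)?;
    for _ in 0..count {
        let key = read_bytes(r)?;
        let value = read_bytes(r)?;
        insert(&name, key, value)?;
    }
    Ok(())
}
```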

We're actively working on benchmarking, so we'll probably start working on shitty, special purpose dump functionality that only dumps the tables that we care about, but having it be directly supported by redb would be awesome, and probably useful to other people as well.

@cberner
Owner

cberner commented Oct 26, 2022

Oh very interesting about the UTXO spend rate. That feels like there should be a bunch of optimization opportunities.

The database file is larger than the actual data it contains, due to fragmentation and other overhead, which would make sharing database files slow

I've been planning to implement compaction (#20) which would help with some of the overhead, but for a benchmark I think the fragmentation and overhead are a feature, not a problem. You want it to be as representative as possible. Same for the non-main tables. If you want to prune those, though, you could delete_table() before saving the snapshot.
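
A minimal sketch of that pruning step, assuming a recent redb API (the table name, key/value types, and error handling here are illustrative, and exact signatures vary between redb versions):

```rust
use redb::{Database, TableDefinition};

// Hypothetical stand-in for one of the extraneous tables; the real
// definitions use different names and key/value types.
const EXTRANEOUS: TableDefinition<&str, u64> = TableDefinition::new("extraneous");

fn prune_before_snapshot(db: &Database) -> Result<(), Box<dyn std::error::Error>> {
    let txn = db.begin_write()?;
    // Drop the table so its contents don't end up in the shared snapshot.
    txn.delete_table(EXTRANEOUS)?;
    txn.commit()?;
    Ok(())
}
```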

The database file is sensitive to schema changes, for example, changing the type of tables other than OUTPOINT_TO_ORDINAL_RANGES, which means we might need to create new snapshots for compatibility, even when the table we care about hasn't changed

Table schemas are only validated when you call open_table(), so if you don't use the other table in your benchmark, then changing its schema shouldn't break anything -- it could make the benchmark less representative though.

Ya, I considered adding a dump/export method before (#100) but decided it wasn't a good idea. Your use case is a bit different than what I'd considered then, and I'll give it some more thought. But I think an export format has too large of a design space and is best left to users. For example, some people would probably want it to export in plaintext, and for a backup format you might want a differential based format that saved only the modified key-values.

@casey
Contributor Author

casey commented Oct 26, 2022

Oh very interesting about the UTXO spend rate. That feels like there should be a bunch of optimization opportunities.

We just landed a very nice optimization by @virtuallythere, which exploits this by using a std::collections::HashMap as a cache that's flushed to the database as the last step before each commit (every 5,000 blocks), and it yielded a huge speedup. Tons of transactions wind up never needing to actually touch the database. Benchmarking now on ordinals.com, but it's on the order of a 2x performance improvement, so pretty sweet.
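
A stripped-down sketch of that caching pattern (the Outpoint and Ranges types and the flush interface are placeholders, not ord's actual code): UTXOs that are created and spent between flushes never reach the database at all.

```rust
use std::collections::HashMap;

// Placeholder types standing in for ord's real ones.
type Outpoint = [u8; 36];
type Ranges = Vec<u8>;

struct UtxoCache {
    created: HashMap<Outpoint, Ranges>,
}

impl UtxoCache {
    fn new() -> Self {
        Self { created: HashMap::new() }
    }

    /// Record a newly created UTXO in memory only.
    fn create(&mut self, outpoint: Outpoint, ranges: Ranges) {
        self.created.insert(outpoint, ranges);
    }

    /// Spend a UTXO. If it was created since the last flush, it's served
    /// from the cache and never touches the database; otherwise the caller
    /// falls back to a database lookup and removal.
    fn spend(&mut self, outpoint: &Outpoint) -> Option<Ranges> {
        self.created.remove(outpoint)
    }

    /// Write the survivors to the database as the last step before a commit.
    fn flush(&mut self, mut write: impl FnMut(&Outpoint, &Ranges)) {
        for (outpoint, ranges) in self.created.drain() {
            write(&outpoint, &ranges);
        }
    }
}
```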

Ya, I considered adding a dump/export method before (#100) but decided it wasn't a good idea. Your use case is a bit different than what I'd considered then, and I'll give it some more thought. But I think an export format has too large of a design space and is best left to users. For example, some people would probably want it to export in plaintext, and for a backup format you might want a differential based format that saved only the modified key-values.

This makes me think that maybe the thing to do is make it so that the redb format itself can serve as the dump format that lets users quickly bootstrap an index from a recent database state.

Some thoughts:

  • How much wasted space will there be in a freshly compacted database?
  • If there is significant overhead, can some or all of it be guaranteed to be runs of zeros, so that it will compress efficiently?
  • Can the compacted database be made deterministic, so that multiple developers can build it and get the same hash, so they can sign it? redb wouldn't have to make any guarantees about producing a byte-for-byte identical database file between different versions of redb, even point releases, just for the same release.
  • As you said, we can drop tables from the dump before saving it. We can do this manually by copying the file, dropping tables, and then compacting.
  • The additional attack surface/complexity of the dump file is not really that big a deal if multiple ord developers are all certifying dumps as okay, and you're running the ord software anyways.

@cberner
Owner

cberner commented Oct 27, 2022

Nice, 2x is great!

How much wasted space will there be in a freshly compacted database?

About 25% wasted, so 1.3x larger than the raw btrees

If there is significant overhead, can some or all of it be guaranteed to be runs of zeros, so that it will compress efficiently?

Free pages are already zeroed, although I might make this a feature flag since I think it has a small performance penalty

Can the compacted database be made deterministic, so that multiple developers can build it and get the same hash, so they can sign it?

It should already be deterministic, as long as you execute exactly the same operations in the same order (including any crashes and db repairs)

@casey
Contributor Author

casey commented Oct 27, 2022

About 25% wasted, so 1.3x larger than the raw btrees

25% doesn't seem bad at all.

Free pages are already zeroed, although I might make this a feature flag since I think it has a small performance penalty

I might consider defaulting this to on. If the performance penalty is small, then I'd rather avoid having to think about the implications of the database potentially containing old data.

It should already be deterministic, as long as you execute exactly the same operations in the same order (including any crashes and db repairs)

Nice. In that case, I think for our purposes we can definitely just use the database itself as the dump format.

The process will be (a rough code sketch follows the list):

  • Create a new database
  • Copy data into the new database
  • Commit
  • Compact
  • Zip the compacted database
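
Roughly, in code (a sketch only: the table definition is a placeholder, the iteration and compact() calls assume an API along the lines of current redb rather than what existed at the time of this thread, and the final compression step is left to external tooling):

```rust
use redb::{Database, ReadableTable, TableDefinition};

// Placeholder definition for the one table we actually want to snapshot.
const OUTPOINT_TO_ORDINAL_RANGES: TableDefinition<&[u8], &[u8]> =
    TableDefinition::new("outpoint_to_ordinal_ranges");

fn snapshot(src: &Database, dst_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create a new database.
    let mut dst = Database::create(dst_path)?;

    // 2. Copy data into the new database, and 3. commit, in one transaction.
    let read = src.begin_read()?;
    let src_table = read.open_table(OUTPOINT_TO_ORDINAL_RANGES)?;
    let write = dst.begin_write()?;
    {
        let mut dst_table = write.open_table(OUTPOINT_TO_ORDINAL_RANGES)?;
        for entry in src_table.iter()? {
            let (key, value) = entry?;
            dst_table.insert(key.value(), value.value())?;
        }
    }
    write.commit()?;

    // 4. Compact (per the reply below, unnecessary for a single
    //    insert-only transaction into a fresh database).
    dst.compact()?;

    // 5. Compress the resulting file out of band (zip, gzip, zstd, ...).
    Ok(())
}
```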

Do we need the compact step if all we've done is insert data into a new database using a single transaction?

@casey casey closed this as completed Oct 27, 2022
@cberner
Owner

cberner commented Oct 27, 2022

Ya, I've left it on for exactly that reason.

Nope, the only reason to compact is if you've done a lot of deletions. If you're just inserting data, it shouldn't cause fragmentation.

@veryordinally

We are doing a ton of deletions during the indexing process, so doing a compaction would likely be worth it.

@cberner
Owner

cberner commented Oct 31, 2022

Are they mixed with insertions? They will only cause fragmentation if you have transactions at the end of your indexing process that on net delete a lot of entries.
