Support dumping to and restoring from dumpfiles #411

Closed

casey opened this issue Oct 24, 2022 · 7 comments

Comments

@casey
Contributor

casey commented Oct 24, 2022

We're working on speeding up ord indexing, and one issue we've run into is that it's very hard to build a synthetic benchmark which is representative of ord's main workload.

An example, which is also quite interesting on its own, is that UTXOs are often destroyed very soon after they're created. @virtuallythere has been working on speeding up ord's initial sync time, and he made this table, which shows the block range in which UTXOs are created and destroyed:

[Screenshot: table of block ranges between UTXO creation and spend]

So 20% of UTXOs are spent in the same block in which they're created, 30% are spent within two blocks of being created, and 60% are spent within 18 blocks of being created.

So any synthetic benchmark we wrote would have to reproduce these kinds of patterns, which would be quite hard.

The alternative to a synthetic benchmark would be to operate on real data, but then we have the challenge that we don't get to a fully representative workload until quite far into the sync process, due to variation in the patterns of on-chain transactions during different parts of Bitcoin's history, in particular, the long run of mostly empty blocks after its creation.

One approach is to sync the index up to a representative height, and then benchmark blocks from there. A few issues with that approach, though:

  • The database file is larger than the actual data it contains, due to fragmentation and other overhead, which would make sharing database files slow
  • The database file contains tables other than the main tables that we care about (OUTPOINT_TO_ORDINAL_RANGES and HEIGHT_TO_BLOCK_HASH), which increase the size of the database
  • The database file is sensitive to schema changes, for example, changing the type of tables other than OUTPOINT_TO_ORDINAL_RANGES, which means we might need to create new snapshots for compatibility, even when the table we care about hasn't changed

As a separate concern, we'd eventually like to be able to distribute index snapshots that users can download, so they can start syncing their index from a recent, known-good state, and not have to sync from the beginning of time. Using redb database files for this purpose would, in addition to the issues above, introduce some others:

  • index.redb files don't have a deterministic hash, so multiple developers wouldn't be able to independently build and sign snapshots
  • index.redb files are much more complicated internally and interface directly with much of redb's code, so it seems possible that a maliciously constructed index.redb might be an attack vector

To deal with all of the above, we came up with the idea of writing code to dump the main index state to a dump file, and then load the index back from that file. In comparison to the actual index.redb, the dumpfile would:

  • Be deterministic, to facilitate verification by multiple developers
  • Exclude extraneous tables
  • Be as compact as possible, with minimal fragmentation and encoding overhead
  • Be very simple, to minimize attack surface
  • Compress well
  • As far as possible, not depend on the details of redb, on the database format, or on the schema of extraneous tables, to minimize the need to rebuild snapshots when redb changes, or when we add or remove extraneous tables or change their schema

This might be useful for redb to provide, since it seems like it would be generally useful. In addition to our purposes, it could be used for:

  • Compact database backups
  • Migrating between different versions of redb, without having to include explicit migration code in redb beyond the dump and load functionality itself. (For in-process migration, you could dump and load from memory, or dump into an io::Write with the new database loading from an io::Read, so there's no tempfile.)

A sketch of a possible design for such a dump file (see the code sketch after this list):

  • Contains a sequence of table dumps in ascending order by name, each of which is:
    • length-prefixed table name as UTF-8
    • serialized number of items in the table
    • the serialized items, each of which are:
      • the serialized key
      • the serialized value
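
As a rough illustration, here is a minimal sketch of writing one table in the format above. It operates on a list of raw key/value byte strings (assumed already sorted by key for determinism) rather than on redb's own types, and the fixed-width little-endian length prefixes, including the per-key and per-value prefixes, are assumptions of this sketch, not part of the proposal.

```rust
use std::io::{self, Write};

/// Append one table to a dump, per the sketch above: a length-prefixed
/// UTF-8 table name, the number of items, then each serialized key and
/// value. Lengths and counts are little-endian u64s (an arbitrary choice).
fn dump_table<W: Write>(
    w: &mut W,
    name: &str,
    items: &[(Vec<u8>, Vec<u8>)], // assumed already sorted by key
) -> io::Result<()> {
    w.write_all(&(name.len() as u64).to_le_bytes())?;
    w.write_all(name.as_bytes())?;
    w.write_all(&(items.len() as u64).to_le_bytes())?;
    for (key, value) in items {
        w.write_all(&(key.len() as u64).to_le_bytes())?;
        w.write_all(key)?;
        w.write_all(&(value.len() as u64).to_le_bytes())?;
        w.write_all(value)?;
    }
    Ok(())
}
```

A full dump would just call this once per table, in ascending order by table name.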

And the API, with a few possible variations, would be something like:

fn dump(&self, tables: &[TableDefinition], w: impl io::Write) -> io::Result<()> { … }

fn transfer(&self, tables: &[TableDefinition], db: &mut Database) -> redb::Result { … }

fn serialize(&self, tables: &[TableDefinition]) -> Vec<u8> { … }

A version which dumps all tables would also be useful, although for our purposes being able to exclude tables is important.
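
For completeness, a matching reader for that hypothetical format might look like the following sketch; it hands each key/value pair to a caller-supplied closure (which in practice would insert into an open table), and it assumes the same illustrative little-endian length prefixes as the encoder sketched above.

```rust
use std::io::{self, Read};

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

fn read_bytes<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let len = read_u64(r)? as usize;
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

/// Read one table dump and pass each key/value pair to `insert`.
fn load_table<R: Read>(
    r: &mut R,
    mut insert: impl FnMut(&str, Vec<u8>, Vec<u8>) -> io::Result<()>,
) -> io::Result<()> {
    let name = String::from_utf8(read_bytes(r)?)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    let count = read_u64(r)?;
    for _ in 0..count {
        let key = read_bytes(r)?;
        let value = read_bytes(r)?;
        insert(&name, key, value)?;
    }
    Ok(())
}
```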

We're actively working on benchmarking, so we'll probably start working on shitty, special purpose dump functionality that only dumps the tables that we care about, but having it be directly supported by redb would be awesome, and probably useful to other people as well.

@cberner
Owner

cberner commented Oct 26, 2022

Oh very interesting about the UTXO spend rate. That feels like there should be a bunch of optimization opportunities.

The database file is larger than the actual data it contains, due to fragmentation and other overhead, which would make sharing database files slow

I've been planning to implement compaction (#20) which would help with some of the overhead, but for a benchmark I think the fragmentation and overhead are a feature, not a problem. You want it to be as representative as possible. Same for the non-main tables. If you want to prune those, though, you could delete_table() before saving the snapshot.
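
A minimal sketch of that pruning step, assuming a recent redb API (the table name, key/value types, and error handling here are illustrative, and exact signatures vary between redb versions):

```rust
use redb::{Database, TableDefinition};

// Hypothetical stand-in for one of the extraneous tables; the real
// definitions use different names and key/value types.
const EXTRANEOUS: TableDefinition<&str, u64> = TableDefinition::new("extraneous");

fn prune_before_snapshot(db: &Database) -> Result<(), Box<dyn std::error::Error>> {
    let txn = db.begin_write()?;
    // Drop the table so its contents don't end up in the shared snapshot.
    txn.delete_table(EXTRANEOUS)?;
    txn.commit()?;
    Ok(())
}
```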

The database file is sensitive to schema changes, for example, changing the type of tables other than OUTPOINT_TO_ORDINAL_RANGES, which means we might need to create new snapshots for compatibility, even when the table we care about hasn't changed

Table schemas are only validated when you call open_table(), so if you don't use the other table in your benchmark, then changing its schema shouldn't break anything -- it could make the benchmark less representative though.

Ya, I considered adding a dump/export method before (#100) but decided it wasn't a good idea. Your use case is a bit different than what I'd considered then, and I'll give it some more thought. But I think an export format has too large of a design space and is best left to users. For example, some people would probably want it to export in plaintext, and for a backup format you might want a differential based format that saved only the modified key-values.

@casey
Contributor Author

casey commented Oct 26, 2022

Oh very interesting about the UTXO spend rate. That feels like there should be a bunch of optimization opportunities.

We just landed a very nice optimization by @virtuallythere, which exploits this by using a std::collections::HashMap as a cache that's flushed to the database as the last step before each commit (every 5,000 blocks), and it yielded a huge speedup. Tons of transactions wind up never needing to actually touch the database. Benchmarking now on ordinals.com, but it's on the order of a 2x performance improvement, so pretty sweet.
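
A stripped-down sketch of that caching pattern (the Outpoint and Ranges types and the flush interface are placeholders, not ord's actual code): UTXOs that are created and spent between flushes never reach the database at all.

```rust
use std::collections::HashMap;

// Placeholder types standing in for ord's real ones.
type Outpoint = [u8; 36];
type Ranges = Vec<u8>;

struct UtxoCache {
    created: HashMap<Outpoint, Ranges>,
}

impl UtxoCache {
    fn new() -> Self {
        Self { created: HashMap::new() }
    }

    /// Record a newly created UTXO in memory only.
    fn create(&mut self, outpoint: Outpoint, ranges: Ranges) {
        self.created.insert(outpoint, ranges);
    }

    /// Spend a UTXO. If it was created since the last flush, it's served
    /// from the cache and never touches the database; otherwise the caller
    /// falls back to a database lookup and removal.
    fn spend(&mut self, outpoint: &Outpoint) -> Option<Ranges> {
        self.created.remove(outpoint)
    }

    /// Write the survivors to the database as the last step before a commit.
    fn flush(&mut self, mut write: impl FnMut(&Outpoint, &Ranges)) {
        for (outpoint, ranges) in self.created.drain() {
            write(&outpoint, &ranges);
        }
    }
}
```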

Ya, I considered adding a dump/export method before (#100) but decided it wasn't a good idea. Your use case is a bit different than what I'd considered then, and I'll give it some more thought. But I think an export format has too large of a design space and is best left to users. For example, some people would probably want it to export in plaintext, and for a backup format you might want a differential based format that saved only the modified key-values.

This makes me think that maybe the thing to do is make it so that the redb format itself can serve as the dump format that lets users quickly bootstrap an index from a recent database state.

Some thoughts:

  • How much wasted space will there be in a freshly compacted database?
  • If there is significant overhead, can some or all of it be guaranteed to be runs of zeros, so that it will compress efficiently?
  • Can the compacted database be made deterministic, so that multiple developers can build it and get the same hash, so they can sign it? redb wouldn't have to make any guarantees about producing a byte-for-byte identical database file between different versions of redb, even point releases, just for the same release.
  • As you said, we can drop tables from the dump before saving it. We can do this manually by copying the file, dropping tables, and then compacting.
  • The additional attack surface/complexity of the dump file is not really that big a deal if multiple ord developers are all certifying dumps as okay, and you're running the ord software anyways.

@cberner
Owner

cberner commented Oct 27, 2022

Nice, 2x is great!

How much wasted space will there be in a freshly compacted database?

About 25% wasted, so 1.3x larger than the raw btrees

If there is significant overhead, can some or all of it be guaranteed to be runs of zeros, so that it will compress efficiently?

Free pages are already zeroed, although I might make this a feature flag since I think it has a small performance penalty

Can the compacted database be made deterministic, so that multiple developers can build it and get the same hash, so they can sign it?

It should already be deterministic, as long as you execute exactly the same operations in the same order (including any crashes and db repairs)

@casey
Contributor Author

casey commented Oct 27, 2022

About 25% wasted, so 1.3x larger than the raw btrees

25% doesn't seem bad at all.

Free pages are already zeroed, although I might make this a feature flag since I think it has a small performance penalty

I might consider defaulting this to on. If the performance penalty is small, then I'd rather avoid having to think about the implications of the database potentially containing old data.

It should already be deterministic, as long as you execute exactly the same operations in the same order (including any crashes and db repairs)

Nice. In that case, I think for our purposes we can definitely just use the database itself as the dump format.

The process will be (a rough code sketch follows the list):

  • Create a new database
  • Copy data into the new database
  • Commit
  • Compact
  • Zip the compacted database
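
Roughly, in code (a sketch only: the table definition is a placeholder, the iteration and compact() calls assume an API along the lines of current redb rather than what existed at the time of this thread, and the final compression step is left to external tooling):

```rust
use redb::{Database, ReadableTable, TableDefinition};

// Placeholder definition for the one table we actually want to snapshot.
const OUTPOINT_TO_ORDINAL_RANGES: TableDefinition<&[u8], &[u8]> =
    TableDefinition::new("outpoint_to_ordinal_ranges");

fn snapshot(src: &Database, dst_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // 1. Create a new database.
    let mut dst = Database::create(dst_path)?;

    // 2. Copy data into the new database, and 3. commit, in one transaction.
    let read = src.begin_read()?;
    let src_table = read.open_table(OUTPOINT_TO_ORDINAL_RANGES)?;
    let write = dst.begin_write()?;
    {
        let mut dst_table = write.open_table(OUTPOINT_TO_ORDINAL_RANGES)?;
        for entry in src_table.iter()? {
            let (key, value) = entry?;
            dst_table.insert(key.value(), value.value())?;
        }
    }
    write.commit()?;

    // 4. Compact (per the reply below, unnecessary for a single
    //    insert-only transaction into a fresh database).
    dst.compact()?;

    // 5. Compress the resulting file out of band (zip, gzip, zstd, ...).
    Ok(())
}
```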

Do we need the compact step if all we've done is insert data into a new database using a single transaction?

@casey casey closed this as completed Oct 27, 2022
@cberner
Owner

cberner commented Oct 27, 2022

Ya, I've left it on for exactly that reason.

Nope, the only reason to compact is if you've done a lot of deletions. If you're just inserting data, it shouldn't cause fragmentation.

@veryordinally

We are doing a ton of deletions during the indexing process, so doing a compaction would likely be worth it.

@cberner
Owner

cberner commented Oct 31, 2022

Are they mixed with insertions? They will only cause fragmentation if you have transactions at the end of your indexing process that on net delete a lot of entries.
