Deterministic shard databases/files #2688

Open
MarkusTeufelberger opened this issue Sep 18, 2018 · 2 comments

@MarkusTeufelberger
Collaborator

As discussed in #2625, I'd like to propose a few changes to how shard databases are currently built, with the goal of having them created deterministically.

These changes would be:

  1. Only allow NuDB for shard databases (at least for the deterministic kind); RocksDB is too much of a moving target.
  2. Values inserted into the shard database must be deterministically compressed (which could also mean not compressing them at all...).
  3. salt, app_num, key_size, block_size, load_factor and the hashing algorithm must be fixed and not derived from properties of the current system. Sane defaults would probably be an easily recognizable number plus the current shard number as the salt, 1 for the app_num, 32 bytes for the key size (as is the current standard anyway), 4 KiB for the block_size, 0.5 for the load_factor and xxhash for the hashing algorithm.
  4. A deterministic shard database file can only be written once the last (highest) ledger sequence of that shard has been closed, and only if all nodes of the 16384 ledgers it contains are available.
  5. To write such a database file, all nodes contained in the shard must be written single-threaded in sorted order (ascending or descending, whichever is easier to implement) to a NuDB file. I haven't checked yet whether there are other caches or optimizations involved, but hopefully that should already be enough to guarantee a deterministic result. If plenty of RAM is available, it might be possible to load the whole shard into memory and sort it there; otherwise it could mean hitting the node store twice (once to get all the keys, and again after sorting them to get the values). See the sketch after this list.
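
To make points 3 and 5 concrete, here is a minimal Python sketch of the intended write order. This is not rippled or NuDB code: `fetch_node`, `insert` and `SALT_BASE` are hypothetical placeholders for the real node-store lookup and NuDB insertion, and the constants are just the defaults suggested in point 3.

```python
# Minimal sketch only: fixed parameters (point 3) and single-threaded,
# sorted, two-pass insertion (point 5). fetch_node() and insert() are
# hypothetical placeholders, not real rippled/NuDB APIs.
from typing import Callable, Iterable

APP_NUM = 1              # fixed app_num
KEY_SIZE = 32            # 256-bit node hashes, as used today
BLOCK_SIZE = 4096        # 4 KiB
LOAD_FACTOR = 0.5
SALT_BASE = 0xDEADBEEF   # "easily recognizable number"; salt = SALT_BASE + shard index

def write_deterministic_shard(
    shard_index: int,
    node_keys: Iterable[bytes],               # pass 1: every 32-byte key in the shard
    fetch_node: Callable[[bytes], bytes],     # pass 2: key -> node blob
    insert: Callable[[bytes, bytes], None],   # stand-in for the NuDB insert call
) -> None:
    """Write all nodes of a closed shard in ascending key order."""
    salt = SALT_BASE + shard_index
    # salt, APP_NUM, KEY_SIZE, BLOCK_SIZE, LOAD_FACTOR and xxhash would be
    # passed to the NuDB create call here; omitted since this is only a sketch.
    for key in sorted(node_keys):             # deterministic, single-threaded order
        insert(key, fetch_node(key))          # values already deterministically (un)compressed
```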

Once such a file has been generated (ideally also with a canonical name...), it is then possible to compare hashes with others, immediately help seed torrents or IPFS swarms, or use this knowledge to implement a trustless way of quickly getting a lot of history without hitting your peers' node stores or shard databases. If you want to get very fancy, it would even be possible to add IPFS or BitTorrent info hashes to the ledger itself, if validators are willing (and able) to download and verify their contents (e.g. by extending rippled with a historic-headers-only mode and then asserting that a shard database contains all nodes necessary to walk all SHAmaps in the specified range with an unbroken header chain up to the latest one).

An alternative to forcing determinism upon NuDB would be to not use a database file format at all and instead specify a deterministic export/import format for shards. An example would be a CSV file containing sorted key-value pairs in simple hex or base64 encoding (a rough sketch follows below). If these files are independently verifiable, that might already be enough. As far as I understand the shard import code though, it more or less expects to be able to randomly query the import file like a database.
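
For illustration, a tiny Python sketch of what such an export could look like; the function name and file layout are made up, not an existing rippled format. Sorted hex key/value pairs, one per line, so two independent exports of the same shard are byte-identical.

```python
import csv

def export_shard_csv(nodes: dict[bytes, bytes], path: str) -> None:
    """Hypothetical export format: sorted hex key/value pairs, one per line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for key in sorted(nodes):              # fixed order => deterministic file
            writer.writerow([key.hex(), nodes[key].hex()])
```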

What do you think? Which approach is better: getting NuDB to do something it wasn't really designed for, or overhauling/refactoring something in rippled that was only just added in the latest release? I started some experiments with the "deterministic NuDB" approach, but I'd be more motivated to work on something presentable once I know which approach you prefer.

Tagging/pinging @miguelportilla and @nbougalis for sharding and overall design/feature roadmap expertise.

@nbougalis added the Feature Request label Nov 1, 2018
@MarkusTeufelberger
Collaborator Author

Alright, I finally followed through and finished my re-implementation of NuDB fetching and ledger node decoding in Python, and could run some tests.

An interesting observation so far:

I write uncompressed nodes to the data store for now (it is really easy to write a dat file that way).
The dat file for shard 2 (ledgers [32769:49152], 82757 objects in total) is 167056294 bytes long (~160 MiB).
Re-keyed with the nudb tool for now (I haven't implemented that part yet) and turned into a tar.lz4 file via `tar cvf - 2 | lz4 -9 - 2.tar.lz4`, it ends up at about 78% of the original size (Compressed 170055680 bytes into 132427450 bytes ==> 77.87%).
BUT:
Compared to the NuDB files from #2561 this is horribly inefficient: they contain exactly the same data, only in a slightly different storage format, leading to a 155628595 byte dat file (~149 MiB), yet those compress down to under 7 MiB. That is also a much better approximation of the information actually contained in the file: these early ledgers are relatively small and there are fewer than a hundred transactions in this shard.

Interestingly, zstd does not show these limitations, at least when using its own algorithm; when producing lz4 output it still creates files that are too large.
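
Something like the following reproduces that kind of size comparison (using the third-party `lz4` and `zstandard` Python packages; the file name is just a placeholder):

```python
# Compress the same dat file with lz4 and zstd and compare sizes.
import lz4.frame
import zstandard

with open("2.dat", "rb") as f:     # placeholder path to the shard's dat file
    data = f.read()

lz4_size = len(lz4.frame.compress(data, compression_level=9))
zstd_size = len(zstandard.ZstdCompressor(level=19).compress(data))
print(f"original: {len(data)}  lz4: {lz4_size}  zstd: {zstd_size}")
```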

@MarkusTeufelberger
Collaborator Author

Alright, I managed to create a tar.lz4 file with deterministic contents that gets accepted by rippled and added to the shard store!

🎉

Still to do: create the tar archive deterministically (tar archives contain all sorts of timestamps...) and then make sure that lz4 also operates deterministically (see the sketch below)...
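
A rough sketch of how that step could look in Python, using the standard tarfile module and the third-party lz4 package. The normalization choices (zero mtime, fixed owner, sorted member order, fixed tar format) are my own assumptions about what needs to be pinned, not something rippled requires.

```python
import io
import os
import tarfile
import lz4.frame

def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    # Strip metadata that varies between machines and runs.
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info

def deterministic_tar_lz4(shard_dir: str, out_path: str) -> None:
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for root, dirs, files in os.walk(shard_dir):
            dirs.sort()                        # walk subdirectories in a fixed order
            for name in sorted(files):         # add members in a fixed order
                path = os.path.join(root, name)
                arcname = os.path.relpath(path, os.path.dirname(shard_dir))
                tar.add(path, arcname=arcname, filter=_normalize)
    with open(out_path, "wb") as f:
        f.write(lz4.frame.compress(buf.getvalue(), compression_level=9))
```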

There is a bigger downside though: upon import of shard tar.lz4 files, rippled just seems to unpack them; it does NOT recompress or otherwise process the contents on import (implementing compression schemes other than the InnerNodes one does not sound very stable to me...).

manojsdoshi pushed a commit to manojsdoshi/rippled that referenced this issue Jun 24, 2020
This commit, if merged, adds support to allow multiple independent nodes to
produce a binary identical shard for a given range of ledgers. The advantage
is that servers can use content-addressable storage, and can more efficiently
retrieve shards by downloading from multiple peers at once and then verifying
the integrity of a shard by cross-checking its checksum with the checksum
other servers report.
manojsdoshi added a commit to manojsdoshi/rippled that referenced this issue Jul 22, 2020
manojsdoshi added a commit to manojsdoshi/rippled that referenced this issue Aug 5, 2020
miguelportilla pushed a commit to miguelportilla/rippled that referenced this issue Dec 3, 2020
Add support to allow multiple independent nodes to produce a binary identical
shard for a given range of ledgers. The advantage is that servers can use
content-addressable storage, and can more efficiently retrieve shards by
downloading from multiple peers at once and then verifying the integrity of
a shard by cross-checking its checksum with the checksum other servers report.
manojsdoshi pushed a commit to manojsdoshi/rippled that referenced this issue Mar 11, 2021