This repository has been archived by the owner on Jul 18, 2023. It is now read-only.

Discussion: Storage engine options and defaults #37

Closed
patrickmn opened this issue Jun 7, 2017 · 10 comments

Comments

@patrickmn
Contributor

Constellation currently uses BerkeleyDB for all storage. It includes code for LevelDB and SQLite, but neither can currently be selected. The simple reason BerkeleyDB is the default is that it was faster than the other options in our testing.

Constellation was specifically designed to use stateless crypto (XSalsa20 allows for randomly generated nonces) in order to support hosting the same key pair on multiple Constellation nodes and using a shared underlying datastore like S3 without contentious nonce management; the only restriction is that the data store must have read-after-creation consistency. With S3 and similar services, thinking about redundancy and backups becomes a lot simpler, and since the stored payloads are encrypted, keeping them with a cloud provider doesn't involve much risk.

Ideally, the --storage option will work as follows:

  • constellation-node --storage=data -- use default engine (BerkeleyDB) in the folder 'data'
  • constellation-node --storage=bdb:data -- explicitly use BerkeleyDB in the folder 'data'
  • constellation-node --storage=s3:constellationstore -- use the 'constellationstore' bucket on S3 (credentials fetched from ~/.aws/credentials or env vars on startup)
  • ...
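The proposed `--storage` syntax amounts to an optional engine prefix followed by a location. A minimal sketch of how such a value could be parsed (Python used purely for illustration; the function name and the `"bdb"` default mirror the examples above, not Constellation's actual implementation):

```python
def parse_storage(spec, default_engine="bdb"):
    """Split an 'engine:location' spec into its parts.

    A bare value like 'data' has no engine prefix, so it falls back to
    the default engine; 'bdb:data' and 's3:constellationstore' name the
    engine explicitly.
    """
    engine, sep, location = spec.partition(":")
    if not sep:
        # No ':' present -- the whole spec is the location.
        return default_engine, spec
    return engine, location

print(parse_storage("data"))                   # ('bdb', 'data')
print(parse_storage("s3:constellationstore"))  # ('s3', 'constellationstore')
```

A real implementation would also validate the engine name against the set of compiled-in backends before opening the store.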

Now:

  • What should the out-of-the-box default be? Should it continue to be BerkeleyDB? Other options include BoltDB, LevelDB, RocksDB, ...
  • What other options should be supported? For example:
    • S3
    • Google Cloud DataStore (10MiB object limit) or Google Blobstore
    • Azure Blob Storage
    • Redis
    • Tahoe-LAFS
    • seaweedfs
    • ...

(These would be the options supported out of the box in the standalone version, but you would still be able to import Constellation as a library and supply anything that satisfies the Storage datatype for exotic requirements.)
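The pluggable-backend idea can be illustrated with a minimal interface sketch. Constellation itself defines a Haskell Storage datatype; the Python below is only an analogue, and the method names are assumptions:

```python
from abc import ABC, abstractmethod

class Storage(ABC):
    """Illustrative analogue of a pluggable storage backend: anything
    providing put/get with read-after-creation consistency could back
    a node."""

    @abstractmethod
    def put(self, key: bytes, payload: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> bytes: ...

class MemoryStorage(Storage):
    """Trivial in-memory backend, useful only for tests."""

    def __init__(self):
        self._data = {}

    def put(self, key, payload):
        self._data[key] = payload

    def get(self, key):
        return self._data[key]
```

An S3, Redis, or BerkeleyDB backend would be one more subclass, which is what lets the standalone binary and library users share the same machinery.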

@patrickmn
Contributor Author

After writing this, I realized a useful escape hatch is a directory storage engine that simply creates a file for each payload. That way, you can use any FUSE adapter you want (as long as it has read-after-create consistency).
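A file-per-payload engine can get read-after-create consistency on a local filesystem by writing to a temporary file and renaming it into place, so a reader never observes a partially written payload. A sketch of that maildir-style approach (naming files by content hash is an assumption, not Constellation's scheme):

```python
import hashlib
import os
import tempfile

def store_payload(directory, payload):
    """Write one file per payload, atomically.

    The payload is written to a temp file in the same directory and then
    renamed over its final name; os.replace is atomic on POSIX, so the
    file either exists complete or not at all.
    """
    name = hashlib.sha256(payload).hexdigest()
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    os.replace(tmp_path, os.path.join(directory, name))
    return name

def load_payload(directory, name):
    """Read a previously stored payload back by name."""
    with open(os.path.join(directory, name), "rb") as f:
        return f.read()
```

Whether a given FUSE adapter preserves this atomic-rename behavior would need to be checked per backend.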

PR here: #38

@zookozcash

Coincidentally, in the Zcash project we're in the process of migrating off of BerkeleyDB (which we suspect of being unreliable) to some alternative, potentially sqlite. Here is our discussion of that: zcash/zcash#2221

If you could give us numbers from "BerkeleyDB was faster than the other options in our testing", that would be useful.

@patrickmn
Contributor Author

Funny coincidence! SQLite does seem like a very solid choice/default. It's crazy how reliable it is.

Unfortunately I don't have the numbers anymore, but I'll rerun some benchmarks on the different ones we have implemented (maildir-style, leveldb, sqlite, bdb.)

The main caveat with SQLite IIRC is concurrent write contention. Constellation is typically (at least as used in Quorum) around 50/50 read/write.

@patrickmn
Contributor Author

@zookozcash also, we are having issues with the most recent bdb API/symbols changing: #28

@camswx

camswx commented Jun 13, 2017

@patrickmn If you create a file for each payload, would something like IPFS/Ethereum Swarm/Sia/StorJ become an option for distributed storage?

@tjayrush

Hi. My name is Jay Rush. I was asked by Andy Tudhope in the Consensys topic-dev-practices slack to check this conversation out because Andy had seen my work with QuickBlocks (http://quickblocks.io -- the website is seriously out of date).

The goal of QuickBlocks is twofold: first, fast delivery of EVM data; second, fast delivery of better-than-EVM data. QuickBlocks parses the EVM data and returns it in the 'language of the originating smart contract,' and then it caches that data for significantly faster delivery than we see from the RPC interface, for example.

I'm not exactly sure how I can help, but I'm very interested in what you're discussing. I'll probably mostly lurk and listen, but I will speak up if I see somewhere where I can add to the conversation.

@conor10
Contributor

conor10 commented Jun 17, 2017

@patrickmn - some initial thoughts:

A default of SQLite sounds sensible if the performance is up to scratch. Reading the linked issue and potential loss of funds caused by BDB is scary stuff. Can’t afford to see that happen in Constellation either.

There’s definitely value in supporting AWS S3/Azure Blobstore/Google Cloud Datastore/Redis, however, is anyone requesting them at this point? You’re probably fine with the directory approach for now (will review code shortly). Further integrations could end up being time-consuming to support.

Likewise, I don’t think it’s necessary to support additional in-memory stores at this stage (beyond what’s already there). Chances are if someone is deploying Quorum in an HA environment they’re going to want to throw one of the cloud data stores into the mix, rather than need further in-memory store options.

The storage option semantics make sense. It would be good to allow users to make use of the implementations that have already been written in Constellation 😃.

@tylobban

(@tjayrush btw I reached out via mail but may have gone to spam..)

@tjayrush

tjayrush commented Jun 21, 2017 via email

@cartazio
Contributor

Closing this issue for now, because the solution is going to be getting SQLite support online, as tracked in another ticket.
