-
Notifications
You must be signed in to change notification settings - Fork 771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trie: cache CheckpointTrie in memory #1035
Conversation
trie: seperate base DB and checkpoint DB
Codecov Report
Flags with carried forward coverage won't be shown. Click here to find out more. |
packages/trie/src/checkpointDb.ts
Outdated
async put(key: Buffer, val: Buffer): Promise<void> { | ||
if (this.isCheckpoint) { | ||
// put value in cache | ||
this.checkpoints[this.checkpoints.length - 1].keyValueMap.set(key.toString('hex'), val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a weird bug where, if you would set
the value to just the raw key.toString()
, then if you'd use Buffer.from(key)
you'd get a different buffer. So I have settled with hex encoding here, but it should be faster/better if we can just use the raw buffer so we don't have to convert it (maybe has to do with some encoding issue?). Very weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think hex encoding is the right option here.
I found in the Buffer docs that it says:
If encoding is 'utf8' and a byte sequence in the input is not valid UTF-8, then each invalid byte is replaced with the replacement character U+FFFD.
so I wonder if that could be related to the bug you encountered.
Another idea I had is that base64 encoding would actually output a shorter string for the key, but I'm unsure if that would make a significant difference in the performance of the map. How large could these checkpoint maps conceivably get?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These checkpoint maps can get huge, since we checkpoint the trie before the block and then flush it when the block is done. The only thing I'm pretty scary of is when we hit an extremely bad block which hits a lot of trie items and caches these all, this could probably run into some memory issues... 🤔
The problem I have with this hex encoding is that we actually convert these raw bytes to strings. So if you'd use one byte (let's say 0xff
) then internally it will be represented as "ff"
which is 2 bytes, so this doubles the strain on memory. I'll take a look at this encoding, maybe if I explicitly pass utf8
as encoding it works.
Hmm, weird that it crashes on a random blockchain test while passing thousands of others 🤔 |
trie: save key-values read from disk in cache
Very weird as the Trie changes apparently found some edge case in the blockchain tests which are unrelated to the changes in Trie. Have fixed. |
packages/trie/src/checkpointDb.ts
Outdated
import { LevelUp } from 'levelup' | ||
import { DB, BatchDBOp } from './db' | ||
|
||
export const ENCODING_OPTS = { keyEncoding: 'binary', valueEncoding: 'binary' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ENCODING_OPTS
can also be imported from ./db
packages/trie/src/secure.ts
Outdated
const db = this.db.copy() | ||
const secureTrie = new SecureTrie(db._leveldb, this.root) | ||
if (includeCheckpoints && this.isCheckpoint) { | ||
secureTrie.db.checkpoints = this.db.checkpoints.slice() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this slice
for a (shallow) copy?
I've never seen slice
without any params, in ES6 you could use the spread syntax: [...this.db.checkpoints]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep this copies the array. I took this from the existing implementation of CheckpointTrie.copy
- but I agree it looks a bit confusing. Will improve.
packages/trie/src/checkpointDb.ts
Outdated
keyValueMap.forEach(function (value, key) { | ||
currentKeyValueMap.set(key, value) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more of a nit than anything else, but this could be written shorthand in ES6:
keyValueMap.forEach((value, key) => currentKeyValueMap.set(key, value))
overall this looks really awesome, nice inline comments that provide explanation. just left a few small comments in review, otherwise will approve 👍 |
Have updated the branch here. @jochem-brouwer have you run this PR in the client? How does this behave? Otherwise once you find some time to address the comments from @ryanio I think we can get this merged relatively soon. |
Hi @holgerd77, thanks for the rebase, I just tested on client and it seems to have improved execution speed. @ryanio ready for re-review 😄 |
@@ -73,6 +73,10 @@ export default async function runBlockchainTest(options: any, testData: any, t: | |||
common, | |||
}) | |||
|
|||
// Need to await the init promise: in some tests, we do not run the iterator (which awaits the initPromise) | |||
// If the initPromise does not finish, the `rawHead` of `blockchain.meta()` is still `undefined`. | |||
await blockchain.initPromise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there is any chance users of the blockchain library may run into this?
If so then as you previously suggested we can make an async public factory method to generate the blockchain and make the constructor private. For example libp2p uses .create(opts)
as an async factory method (although in their case they don't restrict the use of the constructor, if a PeerId is supplied then it doesn't need to be async and the constructor can be used)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
oh if you wanted to include an update to the changelog i will re-approve 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will merge here since I would want to rebase further work on top, hope that's ok. Gave this just a brief look, looks great also with this new CheckpointDB
subclass, makes a lot of sense.
I also tested in the client with blocks containing transactions. I also have the impression that this has gotten significantly faster again.
If one is looking at the time diffs from the logs on execution there is also some evidence that this is the case, here is a run from master
:
And here one using the code base from this PR:
Seems to be more execution logs per second on the second run. Just a very rough indicator though of course.
Any, @jochem-brouwer, nice work! 😄
This PR:
BaseTrie
and the extended DB used byCheckpointTrie
(andSecureTrie
) into two DBs; the baseDB
and theCheckpointDB
.