
cmd, core, eth, light, trie: add trie read caching layer #18087

Merged — karalabe merged 1 commit into ethereum:master from trie-read-cacher on Nov 15, 2018

Conversation

@karalabe (Member) commented Nov 12, 2018

This is an alternative approach to #17873.

In short, this PR introduces a read cache into the trie.Database (previously it acted only as a write cache / GC layer). The reason behind this is twofold:

  • Storage tries are currently not cached. If subsequent blocks access the same read-only data, they keep hammering the disk to retrieve it. It's also a griefing vector, since we can create transactions that try to max out storage-trie reads (not dangerous at current gas levels, but certainly annoying).
  • Executing pending transactions loads parts of the account and storage tries into memory, but annoyingly all that information is dumped when a full block arrives. The next full block will, however, most probably share a lot of disk accesses with our pending state, so we might as well reuse anything we've already loaded.

My original attempt in #17873 cached entire storage tries and introduced a fancy way to maximize useful data across transactions and blocks. Unfortunately, we have no means to measure the memory usage of that approach, nor to easily control it. Alas, although it's cleaner and faster, it was not provably immune to DoS attacks.

The approach in this PR adds the caching layer between the database and our internal trie (the same way we did for writes/pruning). This is suboptimal because we can only cache RLP-encoded nodes, requiring them to be reparsed on every access. On the flip side, it avoids disk reads the same way my old PR did, but also guarantees memory caps. As a bonus, the data structure used avoids GC overhead on the millions of read-cached trie nodes, so it may actually be more performant with large enough caches.
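To make the idea concrete, here is a minimal, hypothetical sketch of a read-through cache sitting between the disk database and the trie: it stores raw RLP blobs keyed by node hash, so each entry's memory footprint is exactly measurable and the total can be capped. The type and method names (`nodeReadCacher`, `keyValueReader`, etc.) are made up for illustration; the merged code uses a dedicated off-heap cache rather than a plain Go map.

```go
package trie

// Minimal sketch of a read-through cache for RLP-encoded trie nodes.
// Illustrative only: not the PR's actual implementation.

import "sync"

// keyValueReader abstracts the backing key-value store (e.g. LevelDB).
type keyValueReader interface {
	Get(key []byte) ([]byte, error)
}

// nodeReadCacher wraps a disk database with a bounded map of RLP blobs.
// Keeping the values as raw RLP (instead of expanded node objects) makes
// the memory usage of each entry trivially measurable and cappable.
type nodeReadCacher struct {
	lock     sync.Mutex
	disk     keyValueReader
	cleans   map[string][]byte // node hash -> RLP blob
	size     int               // currently cached bytes
	capacity int               // maximum cached bytes
}

func newNodeReadCacher(disk keyValueReader, capacity int) *nodeReadCacher {
	return &nodeReadCacher{
		disk:     disk,
		cleans:   make(map[string][]byte),
		capacity: capacity,
	}
}

// node returns the RLP blob for the given hash, serving it from the read
// cache if possible and falling back to disk otherwise.
func (c *nodeReadCacher) node(hash []byte) ([]byte, error) {
	c.lock.Lock()
	if blob, ok := c.cleans[string(hash)]; ok {
		c.lock.Unlock()
		return blob, nil
	}
	c.lock.Unlock()

	blob, err := c.disk.Get(hash)
	if err != nil {
		return nil, err
	}
	c.lock.Lock()
	defer c.lock.Unlock()
	// Naive capacity handling: only admit the blob if it still fits. The
	// real cache evicts old entries and avoids Go GC scanning entirely.
	if c.size+len(blob) <= c.capacity {
		c.cleans[string(hash)] = blob
		c.size += len(blob)
	}
	return blob, nil
}
```

The trade-off described above is visible here: the cache only ever holds encoded blobs, so every hit still pays the RLP-decode cost, but never a disk read.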


Benchmark results (purple = PR, blue = master):

Disk writes grow at the same rate as on master. The absolute values here are not relevant because I restarted the code on top of an existing chain, causing quite a few blocks to be reprocessed, a different number on master and on the PR. The charts are from after the system synced. Read-wise, we can see this PR saves about 75% of disk reads compared to master on mainnet (running with --cache=2048).

[Screenshots: disk write and disk read charts, 2018-11-13]

Perhaps a more interesting metric is the propagated block processing time. This PR manages to cut new block times down by about 30%. You can see that the PR doesn't help old blocks much during sync, since there's no preloading (pending transactions are the preloaders); after sync is done, however, new import times are much lower than master's. Our metrics library uses 1024 samples (blocks) for averaging, hence the couple of hours it takes for the speedup to become fully visible on the charts.

[Screenshots: propagated block processing time charts, 2018-11-13]


Full sync benchmarks

i3.2xlarge: 8 vCPU, 61 GiB RAM, 1.9 TB NVMe SSD

--syncmode=full --cache=2048

| build  | time      | disk reads | disk writes |
|--------|-----------|------------|-------------|
| master | 6d 2h 30m | 33.28 TB   | 23.27 TB    |
| cacher | 5d 4h 40m | 30.72 TB   | 21.74 TB    |
| change | -15.27%   | -7.69%     | -6.57%      |

@holiman (Contributor) left a comment

It looks good to me, fwiw, but should be checked by @fjl as well

Review comments (since resolved, some outdated) on: core/state/database.go, les/handler.go, trie/database.go (two threads)
@holiman (Contributor) commented Nov 13, 2018

Comparing another branch with this PR on TestDifficulty/difficultyRopsten.json/DifficultyTest1023 (adding an artificial failure):

        --- FAIL: TestDifficulty/difficultyRopsten.json/DifficultyTest1023 (0.00s)
        	difficulty_test.go:85: parent[time 15025775352 diff 5197261454665822249 unclehash:1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347] child[time 15025775372 number 4300000] diff 5194723729346161203 != expected 5194723729346161203



        --- FAIL: TestDifficulty/difficultyRopsten.json/DifficultyTest1023 (0.00s)
        	difficulty_test.go:85: parent[time 9323269331 diff 7371607815720194386 unclehash:1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347] child[time 9323269351 number 4300000] diff 7368008397841424760 != expected 7368008397841426808

Both the parent time and the child time seem off.

EDIT: I used the wrong output, fixed now

@holiman (Contributor) commented Nov 13, 2018

Do you get the same errors locally?

@holiman (Contributor) commented Nov 13, 2018

Test

[user@work go-ethereum]$ cat ./tests/testdata/BasicTests/difficultyRopsten.json  | grep DifficultyTest1023 -A8
 "DifficultyTest1023" : {
		"parentTimestamp" : "0x037f9b22f8",
		"parentDifficulty" : "0x482061c1ba2fb029",
		"parentUncles" : "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
		"currentTimestamp" : "0x037f9b230c",
		"currentBlockNumber" : "0x419ce0",
		"currentDifficulty" : "0x48175db581f86a33"
	},

0x037f9b22f8 == 15025775352
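For reference, a quick stand-alone check (not part of the PR) that this hex timestamp from the test fixture is indeed the parent time printed in the failure output above:

```go
package main

// Convert the fixture's parentTimestamp from hex to decimal and compare it
// with the "parent[time ...]" value in the failing test output.
import (
	"fmt"
	"strconv"
)

func main() {
	ts, err := strconv.ParseUint("037f9b22f8", 16, 64)
	if err != nil {
		panic(err)
	}
	fmt.Println(ts) // prints 15025775352, matching the failure output
}
```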

@karalabe added this to the 1.8.19 milestone on Nov 13, 2018
@karalabe (Member, Author) commented:
Yeah, so I accidentally pushed some old version of the consensus test repo into this PR. Fixed it now.

@karalabe force-pushed the trie-read-cacher branch 2 times, most recently from 9c7ef0b to 06de6d5, on November 14, 2018 10:17
@karalabe (Member, Author) commented:
@holiman I've rebased on master, renamed the nodes field to dirties to make it clearer what it is, and also added 4 metrics to measure the hits/misses and data stores/loads of the read cache. These will be particularly handy when comparing a simple node vs. a light server, to see how much strain is put on the caches.
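As a rough illustration of what such counters look like, here is a hedged sketch using go-ethereum's metrics package; the metric names and helper functions below are assumptions for illustration, not necessarily the ones added in this PR.

```go
package trie

// Sketch of read-cache metrics wiring. The metric names and the two helper
// functions are hypothetical; only the metrics package calls are real.
import "github.com/ethereum/go-ethereum/metrics"

var (
	cleanHitMeter   = metrics.NewRegisteredMeter("trie/cleancache/hit", nil)
	cleanMissMeter  = metrics.NewRegisteredMeter("trie/cleancache/miss", nil)
	cleanReadMeter  = metrics.NewRegisteredMeter("trie/cleancache/read", nil)
	cleanWriteMeter = metrics.NewRegisteredMeter("trie/cleancache/write", nil)
)

// recordHit would be called when a node blob is served from the read cache.
func recordHit(blob []byte) {
	cleanHitMeter.Mark(1)
	cleanReadMeter.Mark(int64(len(blob)))
}

// recordMiss would be called when a node has to be fetched from disk and is
// then inserted into the read cache.
func recordMiss(blob []byte) {
	cleanMissMeter.Mark(1)
	cleanWriteMeter.Mark(int64(len(blob)))
}
```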

@karalabe (Member, Author) commented:
@fjl I've reworked the API so that passing 0 is not mandatory; instead there's a second set of WithCache methods. PTAL
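For clarity, a minimal sketch of the constructor split being described: a plain constructor for callers that don't care about the read cache, plus a WithCache variant taking an explicit size. The types below are stand-ins, not the real trie.Database, and the exact signatures in the merged code may differ.

```go
// Illustrative only: rough shape of the split constructors discussed above.
package trie

// diskDB is a stand-in for the backing key-value store.
type diskDB interface {
	Get(key []byte) ([]byte, error)
}

// Database is a stand-in for the real trie.Database.
type Database struct {
	disk  diskDB
	cache int // read-cache size in MB; 0 disables the read cache
}

// NewDatabase creates a trie database without a read cache, so callers that
// don't care about caching never have to pass an explicit zero size.
func NewDatabase(disk diskDB) *Database {
	return NewDatabaseWithCache(disk, 0)
}

// NewDatabaseWithCache creates a trie database with a read cache of the
// given size (in MB).
func NewDatabaseWithCache(disk diskDB, cache int) *Database {
	return &Database{disk: disk, cache: cache}
}
```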

@fjl (Contributor) commented Nov 15, 2018

Changes look much cleaner now. Let's investigate if this change also allows disabling the generation-based caching.

@karalabe (Member, Author) commented:
Sure, but let's please do that in a follow-up PR. Removing the cache generations is a non-negligible amount of change, after which at least a full sync is needed to validate it, which is +1 week. If this change is OK and works well, I don't think we should postpone it just to add more to it.

@karalabe merged commit 17d67c5 into ethereum:master on Nov 15, 2018
@crackcomm (Contributor) commented:
@karalabe I would consider it unsafe in some circumstances, but that's just being paranoid, isn't it?

This is why I submitted #18118.
