
cmd, core, eth, light, trie: add trie read caching layer #18087

Merged — karalabe merged 1 commit into ethereum:master from trie-read-cacher on Nov 15, 2018

Conversation

@karalabe (Member) commented Nov 12, 2018

This is an alternative approach to #17873.

In short, this PR introduces a read cache into the trie.Database (previously it acted only as a write cache / GC layer). The reason behind this is twofold:

  • Storage tries are currently not cached. If subsequent blocks access the same read-only data, they keep hammering the disk to retrieve it. It's also a griefing vector, since we can create transactions that try to max out storage-trie reads (not dangerous at current gas levels, but certainly annoying).
  • Executing pending transactions loads parts of the account and storage tries into memory, but annoyingly all that information is dumped when a full block arrives. The next full block will, however, most probably share a lot of disk accesses with our pending state, so we might as well reuse anything we've already loaded.

My original attempt in #17873 cached entire storage tries and introduced a fancy way to maximize useful data across transactions and blocks. Unfortunately, we have no means to measure the memory usage of that approach, nor to easily control it. Alas, although it's cleaner and faster, it was not provably immune to DoS attacks.

The approach in this PR adds the caching layer between the database and our internal trie (the same way we did for writes/pruning). This is suboptimal because we can only cache RLP-encoded nodes, requiring them to be reparsed on every access. On the flip side, it avoids disk reads the same way my old PR did, but also guarantees memory caps. As a bonus, the data structure used avoids GC overhead on the millions of read-cached trie nodes, so it may actually be more performant with large enough caches.
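To make the idea concrete, here is a minimal, hypothetical sketch of a read-through cache sitting between the disk database and the trie: it stores raw RLP blobs keyed by node hash, so each entry's memory footprint is exactly measurable and the total can be capped. The type and method names (`nodeReadCacher`, `keyValueReader`, etc.) are made up for illustration; the merged code uses a dedicated off-heap cache rather than a plain Go map.

```go
package trie

// Minimal sketch of a read-through cache for RLP-encoded trie nodes.
// Illustrative only: not the PR's actual implementation.

import "sync"

// keyValueReader abstracts the backing key-value store (e.g. LevelDB).
type keyValueReader interface {
	Get(key []byte) ([]byte, error)
}

// nodeReadCacher wraps a disk database with a bounded map of RLP blobs.
// Keeping the values as raw RLP (instead of expanded node objects) makes
// the memory usage of each entry trivially measurable and cappable.
type nodeReadCacher struct {
	lock     sync.Mutex
	disk     keyValueReader
	cleans   map[string][]byte // node hash -> RLP blob
	size     int               // currently cached bytes
	capacity int               // maximum cached bytes
}

func newNodeReadCacher(disk keyValueReader, capacity int) *nodeReadCacher {
	return &nodeReadCacher{
		disk:     disk,
		cleans:   make(map[string][]byte),
		capacity: capacity,
	}
}

// node returns the RLP blob for the given hash, serving it from the read
// cache if possible and falling back to disk otherwise.
func (c *nodeReadCacher) node(hash []byte) ([]byte, error) {
	c.lock.Lock()
	if blob, ok := c.cleans[string(hash)]; ok {
		c.lock.Unlock()
		return blob, nil
	}
	c.lock.Unlock()

	blob, err := c.disk.Get(hash)
	if err != nil {
		return nil, err
	}
	c.lock.Lock()
	defer c.lock.Unlock()
	// Naive capacity handling: only admit the blob if it still fits. The
	// real cache evicts old entries and avoids Go GC scanning entirely.
	if c.size+len(blob) <= c.capacity {
		c.cleans[string(hash)] = blob
		c.size += len(blob)
	}
	return blob, nil
}
```

The trade-off described above is visible here: the cache only ever holds encoded blobs, so every hit still pays the RLP-decode cost, but never a disk read.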


Benchmark results (purple = PR, blue = master):

Disk writes grow at the same rate as on master. The absolute values here are not relevant because I restarted the code on top of an existing chain, causing quite a few blocks to be reprocessed, a different number on master and on the PR. The charts are from after the system synced. Read-wise, we can see this PR saves about 75% of disk reads compared to master on mainnet (running with --cache=2048).

[Screenshots: disk write and disk read charts, 2018-11-13]

Perhaps a more interesting metric is the propagated block processing time. This PR manages to cut new block times down by about 30%. You can see that the PR doesn't help old blocks much during sync, since there's no preloading (pending transactions are the preloaders); after sync is done, however, new import times are much lower than master's. Our metrics library uses 1024 samples (blocks) for averaging, hence the couple of hours it takes for the speedup to become fully visible on the charts.

[Screenshots: propagated block processing time charts, 2018-11-13]


Full sync benchmarks

i3.2xlarge: 8 vCPU, 61 GiB RAM, 1.9 TB NVMe SSD

--syncmode=full --cache=2048

| build  | time      | disk reads | disk writes |
|--------|-----------|------------|-------------|
| master | 6d 2h 30m | 33.28 TB   | 23.27 TB    |
| cacher | 5d 4h 40m | 30.72 TB   | 21.74 TB    |
| change | -15.27%   | -7.69%     | -6.57%      |

@holiman (Contributor) left a comment

It looks good to me, fwiw, but should be checked by @fjl as well

Review comments (since resolved, some outdated) on: core/state/database.go, les/handler.go, trie/database.go (two threads)
@holiman (Contributor) commented Nov 13, 2018

Comparing another branch with this PR on TestDifficulty/difficultyRopsten.json/DifficultyTest1023 (adding an artificial failure):

        --- FAIL: TestDifficulty/difficultyRopsten.json/DifficultyTest1023 (0.00s)
        	difficulty_test.go:85: parent[time 15025775352 diff 5197261454665822249 unclehash:1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347] child[time 15025775372 number 4300000] diff 5194723729346161203 != expected 5194723729346161203



        --- FAIL: TestDifficulty/difficultyRopsten.json/DifficultyTest1023 (0.00s)
        	difficulty_test.go:85: parent[time 9323269331 diff 7371607815720194386 unclehash:1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347] child[time 9323269351 number 4300000] diff 7368008397841424760 != expected 7368008397841426808

Both the parent time and the child time seem off.

EDIT: I used the wrong output, fixed now

@holiman (Contributor) commented Nov 13, 2018

Do you get the same errors locally?

@holiman (Contributor) commented Nov 13, 2018

Test

[user@work go-ethereum]$ cat ./tests/testdata/BasicTests/difficultyRopsten.json  | grep DifficultyTest1023 -A8
 "DifficultyTest1023" : {
		"parentTimestamp" : "0x037f9b22f8",
		"parentDifficulty" : "0x482061c1ba2fb029",
		"parentUncles" : "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
		"currentTimestamp" : "0x037f9b230c",
		"currentBlockNumber" : "0x419ce0",
		"currentDifficulty" : "0x48175db581f86a33"
	},

0x037f9b22f8 == 15025775352
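For reference, a quick stand-alone check (not part of the PR) that this hex timestamp from the test fixture is indeed the parent time printed in the failure output above:

```go
package main

// Convert the fixture's parentTimestamp from hex to decimal and compare it
// with the "parent[time ...]" value in the failing test output.
import (
	"fmt"
	"strconv"
)

func main() {
	ts, err := strconv.ParseUint("037f9b22f8", 16, 64)
	if err != nil {
		panic(err)
	}
	fmt.Println(ts) // prints 15025775352, matching the failure output
}
```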

@karalabe added this to the 1.8.19 milestone on Nov 13, 2018
@karalabe (Member, Author) commented:
Yeah, so I accidentally pushed some old version of the consensus test repo into this PR. Fixed it now.

@karalabe force-pushed the trie-read-cacher branch 2 times, most recently from 9c7ef0b to 06de6d5, on November 14, 2018 10:17
@karalabe (Member, Author) commented:
@holiman I've rebased on master, renamed the nodes field to dirties to make it clearer what it is, and also added 4 metrics to measure the hits/misses and data stores/loads of the read cache. These will be particularly handy when comparing a simple node vs. a light server, to see how much strain is put on the caches.
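As a rough illustration of what such counters look like, here is a hedged sketch using go-ethereum's metrics package; the metric names and helper functions below are assumptions for illustration, not necessarily the ones added in this PR.

```go
package trie

// Sketch of read-cache metrics wiring. The metric names and the two helper
// functions are hypothetical; only the metrics package calls are real.
import "github.com/ethereum/go-ethereum/metrics"

var (
	cleanHitMeter   = metrics.NewRegisteredMeter("trie/cleancache/hit", nil)
	cleanMissMeter  = metrics.NewRegisteredMeter("trie/cleancache/miss", nil)
	cleanReadMeter  = metrics.NewRegisteredMeter("trie/cleancache/read", nil)
	cleanWriteMeter = metrics.NewRegisteredMeter("trie/cleancache/write", nil)
)

// recordHit would be called when a node blob is served from the read cache.
func recordHit(blob []byte) {
	cleanHitMeter.Mark(1)
	cleanReadMeter.Mark(int64(len(blob)))
}

// recordMiss would be called when a node has to be fetched from disk and is
// then inserted into the read cache.
func recordMiss(blob []byte) {
	cleanMissMeter.Mark(1)
	cleanWriteMeter.Mark(int64(len(blob)))
}
```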

@karalabe (Member, Author) commented:
@fjl I've reworked the API so that passing 0 is not mandatory; instead there's a second set of WithCache methods. PTAL
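For clarity, a minimal sketch of the constructor split being described: a plain constructor for callers that don't care about the read cache, plus a WithCache variant taking an explicit size. The types below are stand-ins, not the real trie.Database, and the exact signatures in the merged code may differ.

```go
// Illustrative only: rough shape of the split constructors discussed above.
package trie

// diskDB is a stand-in for the backing key-value store.
type diskDB interface {
	Get(key []byte) ([]byte, error)
}

// Database is a stand-in for the real trie.Database.
type Database struct {
	disk  diskDB
	cache int // read-cache size in MB; 0 disables the read cache
}

// NewDatabase creates a trie database without a read cache, so callers that
// don't care about caching never have to pass an explicit zero size.
func NewDatabase(disk diskDB) *Database {
	return NewDatabaseWithCache(disk, 0)
}

// NewDatabaseWithCache creates a trie database with a read cache of the
// given size (in MB).
func NewDatabaseWithCache(disk diskDB, cache int) *Database {
	return &Database{disk: disk, cache: cache}
}
```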

@fjl (Contributor) commented Nov 15, 2018

Changes look much cleaner now. Let's investigate if this change also allows disabling the generation-based caching.

@karalabe (Member, Author) commented:
Sure, but let's please do that in a follow-up PR. Removing the cache generations is a non-negligible amount of change, after which at least a full sync is needed to validate it, which is +1 week. If this change is OK and works well, I don't think we should postpone it just to add more to it.

@karalabe merged commit 17d67c5 into ethereum:master on Nov 15, 2018
@crackcomm (Contributor) commented:
@karalabe I would consider it unsafe in some circumstances, but that's just being paranoid, isn't it?

This is why I submitted #18118.
