RocksDB opens too many files on osx Mojave #18373

Closed
kzaher opened this issue Dec 29, 2018 · 5 comments

@kzaher

kzaher commented Dec 29, 2018

System information

Geth version:
Version: 1.8.20-stable
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.11.4
Operating System: darwin
GOPATH=/Users/kzaher/go
GOROOT=/usr/local/Cellar/go/1.11.4/libexec

OS & Version: OSX Mojave 10.14.2 (18C54)
Commit hash : v1.8.20 24d727b

Expected behaviour

For it to sync properly.

Actual behaviour

It fails with "too many open files".

Steps to reproduce the behaviour

  1. run:
    geth --cache=4096 --maxpeers=100 --syncmode "fast" --rpcapi personal,web3,eth,net --datadir /Volumes/evo/ethereum/chains/main

  2. wait for 11 hours.

Backtrace

This is the command output; it should be clear enough.

INFO [12-29|12:15:14.184] Imported new block headers               count=1    elapsed=4.398ms   number=6973582 hash=1f96be…af99f3
INFO [12-29|12:15:14.216] Imported new state entries               count=934  elapsed=7.075ms   processed=162791880 pending=82093  retry=0   duplicate=4482 unexpected=12643
INFO [12-29|12:15:14.335] Imported new state entries               count=1273 elapsed=6.678ms   processed=162793153 pending=83594  retry=0   duplicate=4482 unexpected=12643
INFO [12-29|12:15:14.452] Imported new state entries               count=1284 elapsed=7.463ms   processed=162794437 pending=84692  retry=0   duplicate=4482 unexpected=12643
INFO [12-29|12:15:14.562] Imported new state entries               count=1120 elapsed=5.119ms   processed=162795557 pending=86125  retry=0   duplicate=4482 unexpected=12643
INFO [12-29|12:15:14.655] Imported new state entries               count=1321 elapsed=7.131ms   processed=162796878 pending=86894  retry=0   duplicate=4482 unexpected=12643
INFO [12-29|12:15:14.728] Imported new state entries               count=1152 elapsed=8.927ms   processed=162798030 pending=87199  retry=0   duplicate=4482 unexpected=12643
INFO [12-29|12:15:14.811] Imported new state entries               count=941  elapsed=5.444ms   processed=162798971 pending=87836  retry=29  duplicate=4482 unexpected=12643
INFO [12-29|12:15:14.907] Imported new state entries               count=1152 elapsed=6.657ms   processed=162800123 pending=89210  retry=0   duplicate=4482 unexpected=12643
WARN [12-29|12:15:14.954] Rolled back headers                      count=5    header=6973582->6973577 fast=6973514->6973514 block=0->0
WARN [12-29|12:15:14.954] Synchronisation failed, retrying         err="DB write error: open /Volumes/evo/ethereum/chains/main/geth/chaindata: too many open files"
WARN [12-29|12:15:18.644] Ancestor below allowance                 peer=e6f0d2b93d0d107f number=6883514 hash=000000…000000 allowance=6883514
WARN [12-29|12:15:18.644] Synchronisation failed, dropping peer    peer=e6f0d2b93d0d107f err="retrieved ancestor is invalid"
WARN [12-29|12:15:25.708] Ancestor below allowance                 peer=b90884cd400ea40b number=6883514 hash=000000…000000 allowance=6883514
WARN [12-29|12:15:25.708] Synchronisation failed, dropping peer    peer=b90884cd400ea40b err="retrieved ancestor is invalid"

Level DB is holding a bunch of files open.

Running lsof -n -c geth displays that rocks db is holding a bunch of files open.

geth    32897 kzaher *787r     REG               50,3   2136771     286750 /Volumes/evo/ethereum/chains/main/geth/chaindata/617605.ldb
geth    32897 kzaher *788r     REG               50,3   2135242     286734 /Volumes/evo/ethereum/chains/main/geth/chaindata/617589.ldb
geth    32897 kzaher *789r     REG               50,3   2138953     286687 /Volumes/evo/ethereum/chains/main/geth/chaindata/617542.ldb
geth    32897 kzaher *790r     REG               50,3   2148590     176194 /Volumes/evo/ethereum/chains/main/geth/chaindata/421286.ldb

lsof -n -c geth | wc -l reports 24392 open files.

My disk is constantly writing at ~100 MB/s. There is probably additional write amplification at the filesystem level, not to mention rocks db compaction. This is wearing out my SSD significantly.

Is rocks db really the optimal database for this use case? Isn't there anything more efficient? I can increase the file limit temporarily with sudo sysctl -w kern.maxfilesperproc=70000, but this is tearing my computer apart.
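
For reference, here is a minimal Go sketch (my own illustration, not geth code) that prints the file-descriptor limits the running process actually sees; the kern.maxfilesperproc value above is the per-process cap that bounds the hard limit reported here:

    // fdcheck.go -- standalone sketch: print the soft and hard RLIMIT_NOFILE
    // values the current process is allowed to use (Unix-like systems only).
    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        var lim syscall.Rlimit
        if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
            panic(err)
        }
        fmt.Printf("soft fd limit: %d, hard fd limit: %d\n", lim.Cur, lim.Max)
    }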

@AyushyaChitransh
Contributor

Am I reading it correctly that you are referring to RocksDB here? Or was it a typo and you intended to say LevelDB?

@kzaher
Author

kzaher commented Apr 30, 2019

@AyushyaChitransh RocksDB is based on LevelDB as far as I can tell. I'm not sure which flavor geth is using right now. Why does it matter?

@holiman
Contributor

holiman commented May 3, 2019

Go-ethereum uses LevelDB, not RocksDB. Geth queries the operating system for the file-handle allowance and makes sure to stay below it. Decreasing the number of files geth allows itself could be done, but it would only mean more processing time, since files would need to be closed and reopened more often.
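
To illustrate the mechanism (a hedged sketch of the idea, not the actual geth fdlimit code; the 2048 target and the 50% split below are assumptions): raise the soft RLIMIT_NOFILE toward the hard limit the OS grants, then give the database only a portion of that allowance.

    // Sketch: raise this process's file-descriptor allowance and derive a
    // database handle budget from it. Illustrative only, not geth's real code.
    package main

    import (
        "fmt"
        "syscall"
    )

    // raiseFdAllowance lifts the soft limit toward the hard limit (never above
    // it) and returns the allowance the kernel actually granted.
    func raiseFdAllowance(target uint64) (uint64, error) {
        var lim syscall.Rlimit
        if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
            return 0, err
        }
        if lim.Cur < target {
            lim.Cur = target
            if lim.Cur > lim.Max {
                lim.Cur = lim.Max
            }
            // macOS rejects values above kern.maxfilesperproc (and RLIM_INFINITY),
            // so a modest target is the safe choice here.
            if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
                return 0, err
            }
        }
        if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
            return 0, err
        }
        return lim.Cur, nil
    }

    func main() {
        allowance, err := raiseFdAllowance(2048) // illustrative target, not geth's
        if err != nil {
            panic(err)
        }
        // Keep headroom for sockets, log files etc.; the rest can go to the
        // database's open-file cache. The 50% split is an assumption.
        dbHandles := allowance / 2
        fmt.Println("fd allowance:", allowance, "db handles:", dbHandles)
    }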

So, this is not a 'bug' -- it would be one if geth allocated more file handles than it was actually given and then threw errors when running out of them.

If we actually wanted to lower the number of files used by leveldb, we could increase the leveldb table file size. However, every attempt we have made at that has degraded performance, since doing so increases the compaction overhead.
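
For context, the knobs involved live in goleveldb's opt.Options; here is a hedged sketch with made-up values (these are not geth's settings) showing the trade-off between table size and file count:

    // Illustrative only: open a LevelDB with a larger table size so fewer .ldb
    // files are created, at the cost of more work per compaction.
    package main

    import (
        "github.com/syndtr/goleveldb/leveldb"
        "github.com/syndtr/goleveldb/leveldb/opt"
    )

    func main() {
        db, err := leveldb.OpenFile("/tmp/example-chaindata", &opt.Options{
            OpenFilesCacheCapacity: 500,          // max cached open .ldb file handles
            CompactionTableSize:    8 * opt.MiB,  // library default is 2 MiB; bigger tables mean fewer files
            WriteBuffer:            16 * opt.MiB, // memtable size before it is flushed to a table file
        })
        if err != nil {
            panic(err)
        }
        defer db.Close()
    }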

It might be that certain filesystems have built-in overhead for a large number of files; specifically, Windows users on NTFS have sometimes reported that.
In 1.9, we will switch to using an ancient store for old data (blocks, headers, receipts etc.) that will cut the leveldb data size roughly in half, which should greatly improve the situation on all platforms.

I'll close this for now, since it's not really a 'bug' and not really actionable. Please reopen if you have something more concrete.

@holiman holiman closed this as completed May 3, 2019
@holiman
Contributor

holiman commented May 3, 2019

Edit to add: during the fast-sync phase, where headers, bodies, receipts and state are all downloaded, geth is extremely write-intensive. The write load will go down once the data is downloaded.

@kzaher
Author

kzaher commented May 3, 2019

I really appreciate your work on geth, and please don't take this as trolling, but for some reason geth behaves much worse for me than the Ethereum Parity client. Parity's initial fast-sync time is almost an order of magnitude lower and its hardware utilization is much better.

Again, I'm not saying this to troll you guys; I would really like to use your client, since it's the de facto standard, its compatibility with the ecosystem is better, and if something goes south it will probably be the reference behaviour.

I appreciate the 1.9.0 release and all of the improvements that went into it, but for some reason it still seems excruciatingly slow in comparison and almost kills my hardware.

This issue was reported almost half a year ago, so excuse me for not remembering the exact details, but after keeping the client closed for a day it took almost half an hour to catch up on that single day, and it frequently hit the disk at a couple of hundred MB/s.

I've tried this on multiple computers, multiple SSDs, still the same.

Are there at least some settings where I can trade off memory for speed and disk utilization?
