geth --fast stalls before crossing finish line #15001

Dirksterson · 2017-08-18T11:46:26Z

System information

Geth version: geth version 1.5.9-stable, Go1.7.4
OS & Version: OSX 10.12.6 MacMini 4GB RAM (latest MacMini doesn't support field RAM upgrade anymore) VDSL connection with an average of 20-40Mbit throughput. Ethereum Wallet 0.9.0
Commit hash : (if develop)

Expected behaviour

fast sync to current latest block followed by auto disabling

Actual behaviour

Sync stalls anywhere from a few thousand down to a few hundred blocks short of the latest block. It tries to catch up, but new blocks arrive faster than fast sync imports them. Fast sync mode never auto-disables.

Steps to reproduce the behaviour

Run geth removedb, then geth --fast --cache=1024. Tried 5 times on that machine over the last few weeks.

Fast sync is already my workaround: starting a fresh fast sync from scratch. Before that, I was unsuccessful on that machine trying to sync with existing blockchain data; that was also a lost race to catch up with the latest block. The workaround worked until now.

Today even the workaround in fast sync mode (--cache=1024) no longer loads the blockchain completely. It gets to within a few hundred blocks of the latest block and stalls for hours. By the time it catches up a few hundred blocks, the latest block has moved ahead again. The closer geth gets to the latest block (4173161 at the time of writing), the slower it gets. It does not catch up anymore. I have tried 5 times over the last few weeks, giving up after around 4-5 days each time.

Does the machine no longer meet today's minimum hardware requirements, or is this a major bug?

Backtrace

latest block 13 hours ago (!)

I0818 00:15:26.444933 core/blockchain.go:805] imported 148 receipts in 2.775s. #4169952 [e3f556fc… / 36f4d3c9…]

...

latest header chain 50 minutes ago

I0818 12:47:45.107445 core/headerchain.go:342] imported 1 headers in 4.954ms. #4173009 [350d1426… / 350d1426…]

...

currently importing nothing but state entries

I0818 13:36:41.103101 eth/downloader/downloader.go:966] imported 172 state entries in 10.009s: processed 10010213, pending at least 129361
I0818 13:36:41.103131 eth/downloader/downloader.go:966] imported 384 state entries in 783.519ms: processed 10010597, pending at least 129361
I0818 13:36:41.103154 eth/downloader/downloader.go:966] imported 381 state entries in 6.963s: processed 10010978, pending at least 129361
I0818 13:36:41.103167 eth/downloader/downloader.go:966] imported 25 state entries in 87.654ms: processed 10011003, pending at least 129360
I0818 13:36:46.014244 eth/downloader/downloader.go:966] imported 384 state entries in 2.482s: processed 10011387, pending at least 127584
I0818 13:36:49.074483 eth/downloader/downloader.go:966] imported 381 state entries in 7.082s: processed 10011768, pending at least 127105
I0818 13:36:49.074553 eth/downloader/downloader.go:966] imported 384 state entries in 7.971s: processed 10012152, pending at least 127105
I0818 13:36:49.074574 eth/downloader/downloader.go:966] imported 384 state entries in 3.772s: processed 10012536, pending at least 127105
I0818 13:36:49.074603 eth/downloader/downloader.go:966] imported 162 state entries in 5.822s: processed 10012698, pending at least 127105
I0818 13:36:49.074622 eth/downloader/downloader.go:966] imported 25 state entries in 4.050s: processed 10012723, pending at least 127105
I0818 13:36:49.074639 eth/downloader/downloader.go:966] imported 381 state entries in 3.060s: processed 10013104, pending at least 127105
I0818 13:36:49.074742 eth/downloader/downloader.go:966] imported 85 state entries in 7.117s: processed 10013189, pending at least 127105
I0818 13:36:49.074765 eth/downloader/downloader.go:966] imported 375 state entries in 2.219s: processed 10013564, pending at least 127105
I0818 13:36:49.074782 eth/downloader/downloader.go:966] imported 87 state entries in 3.915s: processed 10013651, pending at least 127105
I0818 13:36:49.074795 eth/downloader/downloader.go:966] imported 23 state entries in 271.734ms: processed 10013674, pending at least 127104

kevingentile · 2017-08-20T20:43:44Z

I have been having a similar issue recently. Ubuntu 16.04. Stalling on the last ~100-200 blocks. Restarting the geth client has allowed for some of those missing blocks to be processed but it does not keep up with the highest block. The only fluctuation I see in eth.syncing is the number of knownStates and pulledStates.

Shem-Tov · 2017-08-20T22:44:57Z

I am having the exact same issue as Laughing Cabbage has described, also on Ubuntu 16.04, and also stuck on the last few hundred blocks.
I am running geth1.6.7, at the moment. I have also tried versions 1.7.0, 1.6.6 and 1.6.5, with the same issue. I have tried applying --fast, and have tried without it.
When I restart geth, it usually gets a few more blocks in, and starts "downloading" the chain structure from 0. "Downloading" is in quotation marks because, when I check the folder into which it should be downloading, the folder's access date and time do not change, nor can I find any other folder to which it saves the chain structure.
Leaving it overnight, will get a chain structure in the millions, but the blocks will still not sync.
Searching the web, I have seen this problem exists for many people, for a very long time, across every platform, and with every version of geth, and no one has come up with any kind of solution. And since I am at best an amateur programmer, I have given up with geth.
I will try parity.io now; hopefully they have allowed people with little or no programming skills to connect to Ethereum. If not, then my solution is to give up on Ethereum altogether. That will solve the headache this issue is starting to create :-)
I'll check back with geth when it reaches version 2.

kevingentile · 2017-08-21T04:24:20Z

If any current devs think they might have a lead as to where a good starting point might be for tracking this issue I'm happy to do some bug hunting, please let me know.

tomtom87 · 2017-08-21T05:04:37Z

@Dirksterson @laughingcabbage I have had exactly the same issues for the past week, and so have many of my colleagues.

After the latest advice to run --fast --cache=1024 I now get the following:

WARN [08-21|11:48:26] Stalling state sync, dropping peer       peer=655c0278c317a012
WARN [08-21|11:48:26] Stalling state sync, dropping peer       peer=f26dce0aea871dc8
WARN [08-21|11:48:26] Stalling state sync, dropping peer       peer=0fb49536fda319d3
WARN [08-21|11:48:27] Stalling state sync, dropping peer       peer=ae8de9feee4df4e6
WARN [08-21|11:48:27] Stalling state sync, dropping peer       peer=e7a69c447cb83857
WARN [08-21|11:48:30] Stalling state sync, dropping peer       peer=8e8edc9627fedc6b
WARN [08-21|11:48:32] Stalling state sync, dropping peer       peer=606587b48a16fd10
WARN [08-21|11:48:32] Node data write error                    err="state node 638deb…cf0f09 failed with all peers (4 tries, 4 peers)"

csillag · 2017-08-21T10:18:21Z

Same here. I am also on v1.6.7.

Current status, after running it for more than a week:

Downloading block 4,179,697 of 4,179,911,
Downloading chain structure 8,242,414 of 8,246,476

csillag · 2017-08-21T10:24:04Z

Isn't this a duplicate of #14988 and also #14995?

darksh1ne · 2017-08-21T11:39:53Z

A similar issue here. On Aug 16th I had an almost fully synced blockchain, just 10-20 hours behind the current block. I then started geth as:

$ geth --syncmode=fast --cache=$(( 1024 + 512 ))

Geth has been behind the current block the whole time. Currently (Aug 21st) its state is:

> eth.syncing
{
  currentBlock: 4181084,
  highestBlock: 4182536,
  knownStates: 0,
  pulledStates: 0,
  startingBlock: 4179967
}

whereas etherscan.io shows 4185672 as the last block.

There are no errors in geth's output. It is in its normal state of slowly importing new segments and using the HDD at 5-10 MB/s (both reading and writing). No high CPU usage.

INFO [08-21|14:27:00] Imported new chain segment               blocks=1 txs=60  mgas=6.645  elapsed=26.766s   mgasps=0.248  number=4181082 hash=036737…8ef0ce
INFO [08-21|14:27:16] Imported new chain segment               blocks=1 txs=77  mgas=1.748  elapsed=16.123s   mgasps=0.108  number=4181083 hash=d498b7…8c64a9
INFO [08-21|14:28:44] Imported new chain segment               blocks=1 txs=137 mgas=6.699  elapsed=1m28.060s mgasps=0.076  number=4181084 hash=b8153c…a3bcbf
INFO [08-21|14:30:44] Imported new chain segment               blocks=1 txs=62  mgas=6.691  elapsed=1m59.831s mgasps=0.056  number=4181085 hash=4e7b58…7f71d5
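
As a rough illustration of why a node importing at this pace cannot close the gap, the four import times in the log above can be averaged and compared with the network's block interval. This is only a back-of-the-envelope sketch: the 25-second network interval is an assumed figure (not taken from this thread), and `catchUpSeconds` is a hypothetical helper, not part of geth.

```go
package main

import "fmt"

// catchUpSeconds estimates how long a node needs to close a gap of lagBlocks
// when it imports one block every importSec seconds while the network mines
// one block every networkSec seconds. ok is false if the gap never closes.
func catchUpSeconds(lagBlocks, importSec, networkSec float64) (secs float64, ok bool) {
	closingRate := 1/importSec - 1/networkSec // blocks gained per second
	if closingRate <= 0 {
		return 0, false
	}
	return lagBlocks / closingRate, true
}

func main() {
	// Elapsed times of the four "Imported new chain segment" lines above.
	elapsed := []float64{26.766, 16.123, 88.060, 119.831}
	var total float64
	for _, e := range elapsed {
		total += e
	}
	avgImport := total / float64(len(elapsed))

	// Gap between the etherscan.io head and currentBlock quoted above.
	lag := float64(4185672 - 4181084)

	const assumedNetworkSec = 25.0 // assumption, not a measured value
	if secs, ok := catchUpSeconds(lag, avgImport, assumedNetworkSec); ok {
		fmt.Printf("avg import %.1fs/block, catch-up in about %.0fs\n", avgImport, secs)
	} else {
		fmt.Printf("avg import %.1fs/block is slower than the network: the gap keeps growing\n", avgImport)
	}
}
```

With these inputs the average import time is a little over a minute per block, so any network interval shorter than that means the node falls further behind rather than catching up.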

My geth is:

$ geth attach
Welcome to the Geth JavaScript console!

instance: Geth/v1.6.7-stable/linux-amd64/go1.8
coinbase: <hidden>
at block: 4166508 (Wed, 16 Aug 2017 23:59:48 EEST)
 datadir: <hidden>
 modules: admin:1.0 debug:1.0 eth:1.0 miner:1.0 net:1.0 personal:1.0 rpc:1.0 txpool:1.0 web3:1.0

$ geth version
Geth
Version: 1.6.7-stable
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.8
Operating System: linux
GOPATH=
GOROOT=/usr/lib/go

tomtom87 · 2017-08-22T06:16:14Z

Same issue here, started around the same time. Looks like this is affecting everyone, and now Parity users as well.

gdassori · 2017-08-28T14:11:11Z

Hello, Ubuntu 16.04 here and same issue: got stuck on the last ~2000 blocks.

tomtom87 · 2017-08-29T01:27:21Z

If you don't have an SSD you're never going to get them. If you do have an SSD, just keep restarting the Docker container and your client, and pray. Eventually, after several days, it works. You must be persistent: it will crash randomly, and when you reopen it, it will be syncing. It really took me 2 weeks to do this.

MrHash · 2017-09-01T13:01:37Z

Same problem. Can't sync last ~100 blocks on 1.6.7. Restarting gets close but lots of Stalling state sync, dropping peer messages. SSD and fibre connection.

tomtom87 · 2017-09-01T13:25:17Z

Try using an SSD and the Docker image. This is working for me; expect at least 4 hours to sync up. Sundays, when volume is low, are a good time to try.

wtfiwtz · 2017-09-02T22:22:13Z

Is this related? 0042f13 and #14460

alfkors · 2017-09-03T02:39:26Z

@wtfiwtz I don't really know enough about the whole process, but I would say yeah probably... for what it's worth...

wtfiwtz · 2017-09-06T08:28:02Z

I was able to get it to successfully sync up, after switching from fast sync to normal sync and giving it a day or two to catch up on the last 35,000 or so blocks, using a 2012-era MacBook Pro with an SSD. It was necessary to be on the latest block to be able to successfully submit a transaction with the Ethereum "Mist" wallet (otherwise you get an error about insufficient gas).

Not sure if the light mode would make any difference, but I think you need to do it with a brand new wallet, not an existing blockchain download.

wtfiwtz · 2017-09-23T13:26:21Z

Ok I had to restart the sync from the beginning and have hit this problem again...

This is what I have found... Blocks are getting discarded from peers because the chain height is incorrectly set to 0.

wtfiwtz@f34b775

INFO [09-23|23:16:30] Loaded most recent local full block      number=0       hash=d4e567…cb8fa3 td=17179869184
INFO [09-23|23:16:30] Loaded most recent local fast block      number=4304570 hash=657bf3…912f25 td=1006522706491316931004

INFO [09-23|23:21:06] Peer discarded announcement              peer=d9c3012a7a0dfb3f number=4304681 hash=7e9da0…8db154 distance=4304681
INFO [09-23|23:21:06] ** Block number                          num=4304681
INFO [09-23|23:21:06] ** Chain height                          num=0
WARN [09-23|23:21:06] Discarded propagated block, too far away peer=d9c3012a7a0dfb3f number=4304681 hash=7e9da0…8db154 distance=4304681
INFO [09-23|23:21:06] Peer discarded announcement              peer=a8aafc6f4437be4f number=4304681 hash=7e9da0…8db154 distance=4304681
INFO [09-23|23:21:06] Peer discarded announcement              peer=ea4587bcfb02c92d number=4304681 hash=7e9da0…8db154 distance=4304681
INFO [09-23|23:21:06] ** Block number                          num=4304681
INFO [09-23|23:21:06] ** Chain height                          num=0
WARN [09-23|23:21:06] Discarded propagated block, too far away peer=ea4587bcfb02c92d number=4304681 hash=7e9da0…8db154 distance=4304681

The height is retrieved from a callback function such as this:

	heighter := func() uint64 {
		return blockchain.CurrentBlock().NumberU64()
	}

So this is probably an issue with switching between fast and normal sync modes, where the chain height is assumed to be 0 when it should be equal to the fast chain height on initialization.
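
To make the arithmetic behind the "too far away" messages concrete, here is a minimal sketch of a distance check of this kind. The two limits are assumptions for illustration (the actual constants live in geth's fetcher and may differ between releases), and `tooFarAway` is a hypothetical helper, not geth's own function.

```go
package main

import "fmt"

// Assumed limits for this sketch; check the fetcher source of the geth
// release in question before relying on them.
const (
	maxUncleDist = 7  // how far behind the local head a block may be
	maxQueueDist = 32 // how far ahead of the local head a block may be
)

// tooFarAway returns the distance between a propagated block and the local
// chain height, and whether that distance falls outside the accepted window.
func tooFarAway(blockNumber, chainHeight uint64) (int64, bool) {
	dist := int64(blockNumber) - int64(chainHeight)
	return dist, dist < -maxUncleDist || dist > maxQueueDist
}

func main() {
	// Height taken from the full-block head, which the log reports as 0.
	dist, drop := tooFarAway(4304681, 0)
	fmt.Printf("full head: distance=%d discarded=%v\n", dist, drop)

	// Height taken from the fast-block head reported in the same log.
	dist, drop = tooFarAway(4304681, 4304570)
	fmt.Printf("fast head: distance=%d discarded=%v\n", dist, drop)
}
```

Under these assumed limits the block is outside the window in both cases; only the reported distance changes. A block would be accepted only once the height used for the comparison is within a few dozen blocks of the network head.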

Is this an area you are familiar with @karalabe since you did the original fast sync implementation?

wtfiwtz · 2017-09-23T21:33:00Z

If the peer's total difficulty is much lower, does that mean they are only in full sync mode and won't work with a fast sync peer?

INFO [09-24|07:31:28] ** Total difficulty                      ours="{neg:false abs:[13700445755005557100 54]}" theirs=17179869184
INFO [09-24|07:31:28] ** fast sync?                            peer=3000f1cf9e63ce38 enabled=1

I pretty much can't find any peers that don't have a significantly lower total difficulty!

This worries me because of the following comment:

// synchronise will select the peer and use it for synchronising. If an empty string is given
// it will use the best peer possible and synchronize if it's TD is higher than our own. If any of the
// checks fail an error will be returned. This method is synchronous
func (d *Downloader) synchronise(id string, hash common.Hash, td *big.Int, mode SyncMode) error {

wtfiwtz · 2017-09-24T02:19:42Z

Ok I left it running this morning, and at some point, it flipped from fast to full mode when it received just 1 more chain segment or block receipts (it hadn't received any for at least 1.5 hours since I last restarted):

INFO [09-24|09:07:27] Peer discarded announcement              peer=b057fec043b525ed number=4305853 hash=37f333…4a74d9 distance=4305853
INFO [09-24|09:07:27] ** Total difficulty                      ours="{neg:false abs:[14195334218426315772 54]}" theirs=17179869184
INFO [09-24|09:07:27] ** fast sync?                            peer=b057fec043b525ed enabled=1
INFO [09-24|09:07:27] ** Block number                          num=4305854
INFO [09-24|09:07:27] ** Chain height                          num=0
WARN [09-24|09:07:27] Discarded propagated block, too far away peer=b057fec043b525ed number=4305854 hash=2e8a61…ae021f distance=4305854
INFO [09-24|09:07:27] Imported new state entries               count=448  elapsed=1.479ms   processed=2608239 pending=2047  retry=2   duplicate=2846 unexpected=8434
INFO [09-24|09:07:29] Imported new state entries               count=779  elapsed=3.995ms   processed=2609018 pending=2225  retry=22  duplicate=2846 unexpected=8434
INFO [09-24|09:07:29] ** Total difficulty                      ours="{neg:false abs:[14195334218426315772 54]}" theirs=17179869184
INFO [09-24|09:07:29] ** fast sync?                            peer=479032d8362da82d enabled=1
INFO [09-24|09:07:31] Imported new state entries               count=1089 elapsed=10.173ms  processed=2610107 pending=1483  retry=1   duplicate=2846 unexpected=8434
INFO [09-24|09:07:35] Imported new state entries               count=1081 elapsed=14.713ms  processed=2611188 pending=48    retry=0   duplicate=2846 unexpected=8434
INFO [09-24|09:07:35] Imported new state entries               count=35   elapsed=853.5µs   processed=2611223 pending=0     retry=0   duplicate=2846 unexpected=8434
INFO [09-24|09:07:35] Imported new block receipts              count=0    elapsed=3.752ms   bytes=0 number=4305451 hash=ac92d6…397f6c ignored=1
INFO [09-24|09:07:35] Committed new head block                 number=4305451 hash=ac92d6…397f6c
INFO [09-24|09:07:35] Imported new chain segment               blocks=1 txs=17 mgas=0.442 elapsed=28.174ms  mgasps=15.701 number=4305452 hash=4a61da…5f72e4
ERROR[09-24|09:07:35]
########## BAD BLOCK #########
Chain config: {ChainID: 1 Homestead: 1150000 DAO: 1920000 DAOSupport: true EIP150: 2463000 EIP155: 2675000 EIP158: 2675000 Byzantium: 9223372036854775807 Engine: ethash}

Number: 4305453
Hash: 0x6c4471bed33ac85f132153650f4f69230e9ef972ff33cba1e79795fb72130c66


Error: unknown ancestor
##############################

WARN [09-24|09:07:35] Synchronisation failed, dropping peer    peer=cb8ebbf8130355a7 err="retrieved hash chain is invalid"
ERROR[09-24|09:07:35] Fast sync complete, auto disabling
INFO [09-24|09:07:35] Removing p2p peer                        id=cb8ebbf8130355a7 conn=inbound duration=1h32m36.442s peers=24 req=false err="useless peer"
INFO [09-24|09:07:36] Ethereum peer connected                  id=8453dbef52518caf conn=dyndial name=Geth/v1.6.7-stable-ab5646c5/linux-amd64/go1.8.1
INFO [09-24|09:07:36] ** Total difficulty                      ours="{neg:false abs:[14195334218426315772 54]}" theirs=1009137134152556054860
INFO [09-24|09:07:36] ** fast sync?                            peer=479032d8362da82d enabled=0
WARN [09-24|09:07:36] Ethereum handshake failed                id=8453dbef52518caf conn=dyndial err="Genesis block mismatch - 6577484f58748da6 (!= d4e56740f876aef8)"
INFO [09-24|09:07:36] Removing p2p peer                        id=8453dbef52518caf conn=dyndial duration=279.836ms    peers=24 req=false err="Genesis block mismatch - 6577484f58748da6 (!= d4e56740f876aef8)"
INFO [09-24|09:07:37] Peer discarded announcement              peer=ca40c7662d6ac5ed number=4305853 hash=37f333…4a74d9 distance=402
INFO [09-24|09:07:37] Peer discarded announcement              peer=ca40c7662d6ac5ed number=4305854 hash=2e8a61…ae021f distance=403
INFO [09-24|09:07:38] Ethereum peer connected                  id=6949cab8fc6d09bd conn=inbound name=Geth/v1.6.2-unstable-2a41e76b/linux-amd64/go1.8.3

The key log messages here are Committed new head block, Imported new block receipts and Imported new chain segment, which allow the full head blockchain count to update.

So I'm guessing that the network is starved of fast blocks, and they haven't yet reached their intended pivot point... before they flip to full mode.

Also note that you can't force it to use full mode on the command line, it doesn't work.

Is there some way to force this flip from fast to full mode prematurely? Perhaps if we haven't received a new chain segment for over an hour? Or to find a peer that has what we are looking for with a broader peer search?

tomtom87 · 2017-09-24T10:26:46Z

I got this peer syncing problem constantly because I was not on a USA time server but using an Asian one. After changing my NTP time server settings I would get 20 peers connecting; previously it was 1 to 3. The peers connect but I still get the same errors you show in the log. Currently only a few of our machines' wallets will finish syncing; the latest Macs, newer than 2015, find it easiest. My 2011 Mac is the slowest. All have SSDs and all are using 100 Mb fibre connections. Thanks for the support.

wtfiwtz · 2017-09-25T00:34:05Z

You'll probably find it much easier to be on Parity (https://parity.io) - the wallet can do a light-mode sync in around 20-30 minutes... this is a good short-to-medium term solution. However, on Mac you need to be on OS X Sierra (or use the brew install instead)

I think someone needs to re-architect the fast sync in geth as the client needs to reach out to more diverse peers when it gets "stuck" for long periods of time like this. I have a few ideas, but very limited time, and it really needs to be done (or reviewed) by someone who knows what they are doing :P

tomtom87 · 2017-09-25T03:03:17Z

Thanks, yeah, we run Parity but it won't sync, just the same problem. So what we do is run the Docker image, and every time it breaks we 'turn it off and on again'. Say a little prayer and one time out of ten it will work. This is the only way we have found to sync so we can run our business: we have one employee whose job is just to come in and run the sync every day at 8am before we start, then we can TeamViewer into that machine. Ethereum is a labour of love right now... I don't know how that impression affects the newcomers out there. Being so user-unfriendly probably isn't helping adoption.

wtfiwtz · 2018-04-15T01:39:44Z

@Mergathal that is the nature of a blockchain-based approach. Since Bitcoin only targets one block every 10 minutes, the throughput is lower and the number of blocks is lower.

Ethereum generates a new block roughly every 15 seconds, allowing more transactions and faster response times. There will naturally be more data generated due to this approach. The data would need to be pruned somehow to keep it at a reasonable level.

Interestingly, in http://www.freekpaans.nl/2018/04/anatomy-geth-fast-sync/, a completed fast sync took only 77 GB of locally stored blockchain data. I've routinely destroyed fast syncs with much more data than that (... I have limited space on my MacBook Pro). It seems to me that the longer you spend pulling down the state tries, the more data is stored locally. It may also depend on how long you are "full syncing" once the fast sync is complete. I've yet to fully understand why, but it's an interesting observation.

garyng2000 · 2018-04-15T01:48:03Z

We constantly 'refresh' by fast syncing from scratch to keep the size in check. An initial fast sync is only around 60 GB (as of maybe a month ago), then the size grows. After one month we are seeing 140 GB. Not sure if it is because older state needs to be pulled in or what. Does anyone with a 'true' full sync know the current disk size?

wtfiwtz · 2018-04-15T01:56:33Z

@garyng2000 a full sync took 220 GB according to the articles linked above. So it would be approximately 80 GB a month as a "fast sync" switches to a "full sync".

garyng2000 · 2018-04-15T02:04:14Z

@wtfiwtz
That is something that puzzles me: if it is 80 GB a month we are talking terabytes of data soon, so how come a 'true' full sync is only 220 GB? If that is the case, maybe I should do a true full sync (from scratch); that can take a bit of time, but the disk growth rate would be slower? Strange.

wtfiwtz · 2018-04-15T02:08:51Z

@garyng2000 it could be because the accumulated state is bigger when you participate in the immediate verification of the transactions, whereas post-verification there is not as much information to download from peers. However, you would need someone more knowledgeable about Ethereum's inner workings to confirm or deny that.

CryptoKiddies · 2018-04-20T01:57:46Z

I'm on geth v1.8.4 and Ubuntu 16.04. Not only is geth stopping before final sync, but it completely stalls around 30-60 minutes after starting a sync. The CPU usage drops to ~3% of capacity and stays there.

I see continuous error messages for connecting to nodes, and the state and blocks completely stop updating. I have to restart geth (I use systemd restart). This is very concerning because I don't want my node to stall in the middle of serving our dapp.

suspended · 2018-04-20T15:56:15Z

@GeeeCoin you might want to try v1.8.3. I had a similar issue to yours when I moved from .3 to .4.

CryptoKiddies · 2018-04-21T03:01:46Z

@suspended v1.8.6 has the same unresolved issue. Downgrading to geth v1.8.3 worked for about 3 weeks, but I'm now facing the same issues.

mtj151 · 2018-05-13T18:22:01Z

I am also having the same sync problems... dropping peers etc. I am almost synced (about 50-100 blocks behind if I let it run). If I restart geth it catches up until peers start to drop again.

Using Ubuntu 16.04. I have tried different versions of Geth down to 1.8.2. Built the dev version too with no change.

I have lots of experience running a node having done it since the start... but I did re-download the block chain a month or 2 ago.

I use a SATA 500GB SSD but it is encrypted on the drive level and the home directory which is where the blockchain is stored. The encryption means that the read/write abilities are slower and using a disk monitor it shows a high level of activity constantly while geth is running.

I understand storing/using the blockchain on encrypted drive is probably not the best setup (for speed and amount of read writes/life of SSD) so I'm guessing the next thing I should try is a new separate un-encrypted SSD to store the chain... but I have not got round to doing so yet (having another SSD purely for eth blockchain is fairly expensive option). Currently my chaindata folder is 358.8GB

Looks like Ubuntu 16.04 is a consistent part of this thread/problem?

CryptoKiddies · 2018-05-17T22:21:37Z

@mtj151 good observation. I'm not ruling out any factors at this point. Is anyone using AWS by any chance?

mtj151 · 2018-05-21T10:52:00Z

I have also noticed that I am unable to send transactions while I am getting the "Synchronisation failed, retrying err="block download cancelled (requested)"" warnings.

I sent one transaction fine but then the warnings come up and it wouldn't let me send another transaction (even after the messages stopped and syncing started again). I had to completely restart geth to be able to send the transaction.

ghost · 2018-05-25T06:23:15Z

@GeeeCoin I was unable to get a Geth node to stay up to date with chaintip on AWS in any meaningful time without using Provisioned IOPS SSDs on EBS-optimized instances or the i3 storage-optimized instances with 8GB RAM or greater. Even then, I had to write a watchdog to kick geth over every now and then for when it would drop all its peers or lag too much behind the chaintip. Now I just have dedicated boxes for geth nodes running NVMe SSD in the datacenter, and a NUC for LAN dev which has a 1 TB SATA SSD, 8GB RAM and a quad-core processor.

CryptoKiddies · 2018-05-25T19:02:21Z

@10A7 appreciate the data point. If NUC is outperforming a quad core with 8GB in AWS, that's a problem. Amazon may have network latency that hasn't been optimized with the t. class. The i3 looks like an option. We're taking a look at Quarian; thanks for building that out!

mtj151 · 2018-06-07T11:39:18Z

Sounds like 10a7 had the same problem with lagging behind the chain tip... good description of the problem. Did NVMe SSD fix the problem?? I'm looking at getting one in the coming weeks to run geth.

ghost · 2018-06-07T23:21:55Z

@mtj151 NVMe SSD doesn't seem to matter. I have no trouble keeping SATA SSDs and bcache-fronted magnetic arrays intact and synced I/O wise.

If you are synced and "importing new chain segment", it seems to mostly be network issues that cause my nodes to fall behind. Restarting geth often helps to get different peers. Geth sync-after-fast-pivot is also much more reliable for me if I am not behind a NAT, and can forward/open 30303/tcp.

jdowning100 · 2018-06-17T21:23:03Z

FWIW I was able to get geth to fully sync by waiting until eth.blockNumber is near the numbers in eth.syncing and then restarting geth. I was able to do this at ~160m states. After restarting geth, it took about 20 min to catch up to the blockchain and now eth.syncing is false and the only output now is 'imported new chain segment' every time a new block is found.
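
A rough way to watch for that moment is to compare the fields returned by the `eth_syncing` JSON-RPC call. The sketch below assumes the result is either `false` or an object with hex-string fields `currentBlock` and `highestBlock`, plus `pulledStates` and `knownStates` as reported by Geth during fast sync; the `sync_summary` helper and the sample values are made up for illustration.

```python
def sync_summary(result):
    """Summarize an eth_syncing result: False, or a dict of hex-string fields."""
    if result is False:
        return {"syncing": False}
    current = int(result["currentBlock"], 16)
    highest = int(result["highestBlock"], 16)
    summary = {"syncing": True, "blocks_behind": highest - current}
    # State counters are only present while state download is in progress (assumption).
    if "pulledStates" in result and "knownStates" in result:
        pulled = int(result["pulledStates"], 16)
        known = int(result["knownStates"], 16)
        summary["states_pending"] = known - pulled
    return summary


# Made-up sample response, not taken from a real node.
sample = {
    "startingBlock": "0x0",
    "currentBlock": "0x3fae90",
    "highestBlock": "0x3faed0",
    "pulledStates": "0x98968",
    "knownStates": "0x989a0",
}
print(sync_summary(sample))
```

Note that `knownStates` only counts entries discovered so far, so a small `states_pending` value does not by itself mean the state download is nearly done.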

karalabe · 2018-10-04T07:48:24Z

Syncing Ethereum is a pain point for many people, so I'll try to detail what's happening behind the scenes so there might be a bit less confusion.

The current default mode of sync for Geth is called fast sync. Instead of starting from the genesis block and reprocessing all the transactions that ever occurred (which could take weeks), fast sync downloads the blocks and only verifies the associated proofs-of-work. Downloading all the blocks is a straightforward and fast procedure and will relatively quickly reassemble the entire chain.

Many people falsely assume that because they have the blocks, they are in sync. Unfortunately this is not the case: since no transaction was executed, we do not have any account state available (i.e. balances, nonces, smart contract code and data). These need to be downloaded separately and cross-checked with the latest blocks. This phase is called the state trie download and it actually runs concurrently with the block downloads; alas, it takes a lot longer nowadays than downloading the blocks.

So, what's the state trie? In the Ethereum mainnet, there are a ton of accounts already, which track the balance, nonce, etc. of each user/contract. The accounts themselves are however insufficient to run a node; they need to be cryptographically linked to each block so that nodes can actually verify that the accounts have not been tampered with. This cryptographic linking is done by creating a tree data structure above the accounts, each level aggregating the layer below it into an ever smaller layer, until you reach the single root. This gigantic data structure containing all the accounts and the intermediate cryptographic proofs is called the state trie.
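
To make the "each level aggregates the layer below" idea concrete, here is a toy sketch. It is deliberately simplified and is not how Geth stores state: Ethereum uses a 16-way Merkle Patricia trie with Keccak-256, whereas this example assumes a plain binary hash tree with SHA-256, and the account strings are invented.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Hash helper (SHA-256 here; Ethereum itself uses Keccak-256)."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Hash leaves pairwise, layer by layer, until a single root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(layer[-1])  # duplicate the last node on odd-sized layers
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]


# Invented account records, for illustration only.
accounts = [b"alice:balance=10,nonce=0", b"bob:balance=5,nonce=3", b"carol:balance=7,nonce=1"]
root_before = merkle_root(accounts)

accounts[1] = b"bob:balance=500,nonce=3"  # tamper with one balance
root_after = merkle_root(accounts)

print(root_before != root_after)  # the root changes, so tampering is detectable
```

Because the block header commits to the root, a node only needs that single hash to check everything beneath it.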

Ok, so why does this pose a problem? This trie data structure is an intricate interlink of hundreds of millions of tiny cryptographic proofs (trie nodes). To truly have a synchronized node, you need to download all the account data, as well as all the tiny cryptographic proofs to verify that no one in the network is trying to cheat you. This itself is already a crazy number of data items. The part where it gets even messier is that this data is constantly morphing: at every block (15s), about 1000 nodes are deleted from this trie and about 2000 new ones are added. This means your node needs to synchronize a dataset that is changing 200 times per second. The worst part is that while you are synchronizing, the network is moving forward, and state that you began to download might disappear while you're downloading, so your node needs to constantly follow the network while trying to gather all the recent data. But until you actually do gather all the data, your local node is not usable since it cannot cryptographically prove anything about any accounts.
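
The 200 changes per second figure follows directly from the numbers above. The snippet below just spells out that arithmetic, and additionally derives the net daily growth those same rough figures would imply; the constant names are made up for the example.

```python
BLOCK_TIME_SECONDS = 15
NODES_DELETED_PER_BLOCK = 1000
NODES_ADDED_PER_BLOCK = 2000

# Every deletion and every addition is a change the syncing node has to track.
changes_per_block = NODES_DELETED_PER_BLOCK + NODES_ADDED_PER_BLOCK
changes_per_second = changes_per_block / BLOCK_TIME_SECONDS

# Net growth: 1000 more nodes added than deleted in each block.
blocks_per_day = 24 * 60 * 60 // BLOCK_TIME_SECONDS
net_growth_per_day = (NODES_ADDED_PER_BLOCK - NODES_DELETED_PER_BLOCK) * blocks_per_day

print(changes_per_second)   # 200.0
print(net_growth_per_day)   # 5760000
```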

If you see that you are 64 blocks behind mainnet, you aren't yet synchronized, not even close. You are just done with the block download phase and still running the state downloads. You can see this yourself via the seemingly endless Imported state entries [...] stream of logs. You'll need to wait that out too before your node comes truly online.

Q: The node just hangs on importing state entries?!

A: The node doesn't hang, it just doesn't know how large the state trie is in advance so it keeps on going and going and going until it discovers and downloads the entire thing.

The reason is that a block in Ethereum only contains the state root, a single hash of the root node. When the node begins synchronizing, it knows about exactly 1 node and tries to download it. That node can refer to up to 16 new nodes, so in the next step, we'll know about 16 new nodes and try to download those. As we go along the download, most of the nodes will reference new ones that we didn't know about until then. This is why you might be tempted to think it's stuck on the same numbers. It is not, rather it's discovering and downloading the trie as it goes along.
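
The effect can be reproduced with a toy model. The sketch below is not Geth's downloader; it assumes a hypothetical in-memory trie (a dict from node name to child names) and walks it breadth-first, recording how many nodes have been downloaded versus how many are known about after each step.

```python
from collections import deque


def discover(trie, root):
    """Walk a trie from its root, learning about children only after
    'downloading' their parent. Returns (downloaded, known) after each download."""
    known = {root}
    queue = deque([root])
    downloaded = 0
    progress = []
    while queue:
        node = queue.popleft()
        downloaded += 1
        for child in trie.get(node, []):
            if child not in known:
                known.add(child)
                queue.append(child)
        progress.append((downloaded, len(known)))
    return progress


# Hypothetical toy trie: a root with 16 children, each of which has 16 leaves.
toy = {"root": [f"n{i}" for i in range(16)]}
for i in range(16):
    toy[f"n{i}"] = [f"n{i}-{j}" for j in range(16)]

progress = discover(toy, "root")
print(progress[0])   # (1, 17): one node downloaded, 17 known
print(progress[-1])  # (273, 273): the total is only known at the very end
```

The "known" count keeps running ahead of the "downloaded" count until the last layer is reached, which is why the totals give no usable estimate of remaining work.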

Q: I'm stuck at 64 blocks behind mainnet?!

A: As explained above, you are not stuck, just finished with the block download phase, waiting for the state download phase to complete too. This latter phase nowadays takes a lot longer than just getting the blocks.

Q: Why does downloading the state take so long, I have good bandwidth?

A: State sync is mostly limited by disk IO, not bandwidth.

The state trie in Ethereum contains hundreds of millions of nodes, most of which take the form of a single hash referencing up to 16 other hashes. This is a horrible way to store data on a disk, because there's almost no structure in it, just random numbers referencing even more random numbers. This makes any underlying database weep, as it cannot optimize storing and looking up the data in any meaningful way.
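
A small experiment illustrates the lack of structure. It assumes a store that keeps entries ordered by key, and uses SHA-256 over made-up account names in place of the Keccak-256 node hashes Ethereum actually uses; the point is only that entries created together end up far apart once they are keyed by hash.

```python
import hashlib

# 1000 made-up account identifiers, created one after another.
accounts = [f"account-{i}".encode() for i in range(1000)]

# A hash-keyed, key-sorted store orders entries by hash, not by creation order.
by_hash = sorted(accounts, key=lambda a: hashlib.sha256(a).digest())
position = {a: i for i, a in enumerate(by_hash)}

# How far apart do accounts that were created back-to-back end up?
gaps = [
    abs(position[accounts[i]] - position[accounts[i + 1]])
    for i in range(len(accounts) - 1)
]
average_gap = sum(gaps) / len(gaps)
print(average_gap)
```

For an effectively random ordering one would expect the average gap to be on the order of a few hundred positions out of 1000, meaning neighbouring writes land all over the key space and the database gets little benefit from sequential I/O.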

Not only is storing the data very suboptimal, but due to the 200 modifications / second and pruning of past data, we cannot even download it in a properly pre-processed way to make it import faster without the underlying database shuffling it around too much. The end result is that even a fast sync nowadays incurs a huge disk IO cost, which is too much for a mechanical hard drive.

Q: Wait, so I can't run a full node on an HDD?

A: Unfortunately not. Doing a fast sync on an HDD will take more time than you're willing to wait with the current data schema. Even if you do wait it out, an HDD will not be able to keep up with the read/write requirements of transaction processing on mainnet.

You however should be able to run a light client on an HDD with minimal impact on system resources. If you wish to run a full node however, an SSD is your only option.

CryptoKiddies · 2018-10-04T16:58:50Z

@karalabe Thanks for breaking this down again. We knew most of this about Geth/Eth already, but I'm really surprised as to how suboptimal the state trie system is at being stored to disk; I thought the whole point of building ethereum this way (with modified patricia trees etc.) was to minimize footprint/disk mods, but looks like innovation in storage structures is still needed.

hustnn · 2018-10-11T01:16:33Z

@karalabe Nice introduction. I understand the fast sync internals better now.

quietnan · 2018-10-21T23:19:24Z

@karalabe So is there any way of knowing how close you are to being finished syncing? None of the metrics from the eth_syncing call seems to carry meaningful information about this.

nyetwurk · 2020-01-27T02:20:07Z

> @karalabe So is there any way of knowing how close you are to being finished syncing? None of the metrics from the eth_syncing call seems to carry meaningful information about this.

#16558
https://eips.ethereum.org/EIPS/eip-2029

If those are actually implemented, you'll at least be able to scrape the number of states from an external reference.

wtfiwtz mentioned this issue Sep 25, 2017

New Geth 1.7.0 is the worst yet ethereum/mist#3055

Closed

bartoszbetka mentioned this issue Aug 29, 2018

Test deployment with full mainnet blockchain golemfactory/concent-deployment#202

Open

adamschmideg added this to the 1.8.17 milestone Oct 3, 2018

adamschmideg added the high-priority label Oct 4, 2018

adamschmideg removed this from the 1.8.17 milestone Oct 4, 2018

karalabe closed this as completed Oct 4, 2018

adamschmideg added status:triage and removed status:triage labels Dec 14, 2018

0xKrishna mentioned this issue Apr 18, 2022

Bor cannot connect to any node maticnetwork/bor#190

Closed
