
Global critical problem. Slow sync! #545

Closed · TehnobitSystems opened this issue Nov 14, 2021 · 61 comments
Labels: help wanted (Extra attention is needed)

Comments

@TehnobitSystems commented Nov 14, 2021

Hi everyone!
A full BSC node cannot fully sync. Never. The same problem is affecting many people here.
Many posts have already been created on GitHub and elsewhere, but no solution can be found anywhere. Most recommendations do not solve the problem.
This is already a global problem, but the developers are not responding.
How do we get a reaction from the BSC engineering team?
If you know someone from the BSC team or others who can help, let's mention the engineers here so that they see this.

@AwesomeMylaugh

+1, I have the same problem and have tried every fix I can find; it still doesn't work!
Developers, please help us ASAP!

@elnem0 commented Nov 14, 2021

Confirming. Same issue. Tried the snapshot, syncing from genesis, all syncmodes, and all the tips from threads on GitHub.

@civa commented Nov 15, 2021

Same here.

@FeurJak commented Nov 15, 2021

Same.

@FeurJak commented Nov 15, 2021

Running i3en.6xlarge on AWS, node gets out of sync too frequently.

@holiman (Contributor) commented Nov 15, 2021

@TehnobitSystems out of the devs that you pinged, not a single one has anything to do with BSC. We are/were all developers of the Ethereum Geth client. BSC has reused the codebase and still imports commits from the upstream go-ethereum project, but that does not mean we are affiliated with BSC.
Please close this to avoid spamming the drive-by-mentioned Ethereum devs, and consider opening a new issue where you instead ping the BSC devs.

@xpkore commented Nov 15, 2021

Running i3en.6xlarge on AWS, node gets out of sync too frequently.

INFO [11-15|22:17:49.357] Imported new chain segment               blocks=1  txs=410   mgas=58.632   elapsed=215.999ms   mgasps=271.446 number=12,667,979 hash=a6765e..d83fee dirty=1.54GiB
INFO [11-15|22:17:51.562] Imported new chain segment               blocks=1  txs=430   mgas=70.805   elapsed=570.162ms   mgasps=124.184 number=12,667,980 hash=98425b..098070 dirty=1.55GiB
INFO [11-15|22:17:51.563] Unindexed transactions                   blocks=1  txs=144   tail=10,317,981 elapsed="936.038µs"
INFO [11-15|22:17:52.737] Imported new chain segment               blocks=1  txs=432   mgas=72.417   elapsed=214.234ms   mgasps=338.028 number=12,667,980 hash=95a0e9..eb2341 dirty=1.55GiB
INFO [11-15|22:17:55.723] Imported new chain segment               blocks=1  txs=501   mgas=84.467   elapsed=717.816ms   mgasps=117.672 number=12,667,981 hash=14c271..330563 dirty=1.56GiB
INFO [11-15|22:17:55.725] Unindexed transactions                   blocks=1  txs=155   tail=10,317,982 elapsed=1.860ms
INFO [11-15|22:17:55.990] Imported new chain segment               blocks=1  txs=460   mgas=79.286   elapsed=267.143ms   mgasps=296.790 number=12,667,981 hash=a5089d..fbbafa dirty=1.55GiB
INFO [11-15|22:17:58.867] Imported new chain segment               blocks=1  txs=468   mgas=68.842   elapsed=684.311ms   mgasps=100.601 number=12,667,982 hash=3fa955..c8a142 dirty=1.56GiB
INFO [11-15|22:17:58.868] Unindexed transactions                   blocks=1  txs=166   tail=10,317,983 elapsed=1.137ms
INFO [11-15|22:18:00.393] Imported new chain segment               blocks=1  txs=579   mgas=62.841   elapsed=550.767ms   mgasps=114.098 number=12,667,983 hash=ac9ccc..a8f940 dirty=1.55GiB
INFO [11-15|22:18:00.393] Unindexed transactions                   blocks=1  txs=82    tail=10,317,984 elapsed="586.839µs"
INFO [11-15|22:18:01.860] Imported new chain segment               blocks=1  txs=573   mgas=60.998   elapsed=212.207ms   mgasps=287.446 number=12,667,983 hash=ea4f37..4fb347 dirty=1.55GiB
INFO [11-15|22:18:04.175] Imported new chain segment               blocks=1  txs=458   mgas=64.906   elapsed=693.264ms   mgasps=93.623  number=12,667,984 hash=1fe456..6579ae dirty=1.56GiB
INFO [11-15|22:18:04.176] Unindexed transactions                   blocks=1  txs=166   tail=10,317,985 elapsed=1.310ms
INFO [11-15|22:18:06.401] Imported new chain segment               blocks=1  txs=381   mgas=48.030   elapsed=498.179ms   mgasps=96.411  number=12,667,985 hash=df6df2..a6380e dirty=1.56GiB
INFO [11-15|22:18:06.402] Unindexed transactions                   blocks=1  txs=85    tail=10,317,986 elapsed="773.789µs"
INFO [11-15|22:18:09.775] Imported new chain segment               blocks=1  txs=488   mgas=69.494   elapsed=645.502ms   mgasps=107.658 number=12,667,986 hash=e71a88..3486d1 dirty=1.56GiB
INFO [11-15|22:18:09.776] Unindexed transactions                   blocks=1  txs=131   tail=10,317,987 elapsed=1.114ms
INFO [11-15|22:18:10.191] Deep froze chain segment                 blocks=20 elapsed=55.777ms    number=12,577,986 hash=294251..212829
INFO [11-15|22:18:14.534] Imported new chain segment               blocks=1  txs=544   mgas=70.510   elapsed=1.812s      mgasps=38.906  number=12,667,987 hash=056cb6..50cd92 dirty=1.56GiB
INFO [11-15|22:18:14.542] Unindexed transactions                   blocks=1  txs=123   tail=10,317,988 elapsed=3.547ms
INFO [11-15|22:18:15.230] Imported new chain segment               blocks=1  txs=408   mgas=57.024   elapsed=694.916ms   mgasps=82.059  number=12,667,988 hash=09059d..942ace dirty=1.56GiB
INFO [11-15|22:18:15.230] Unindexed transactions                   blocks=1  txs=117   tail=10,317,989 elapsed="772.14µs"
INFO [11-15|22:18:16.907] Imported new chain segment               blocks=1  txs=439   mgas=58.774   elapsed=527.387ms   mgasps=111.444 number=12,667,989 hash=16005e..20009e dirty=1.56GiB
INFO [11-15|22:18:16.909] Unindexed transactions                   blocks=1  txs=194   tail=10,317,990 elapsed=1.286ms

i3en.2xlarge USEAST
Synced from the 11/11 snapshot
Looks okay but slower than I remember from a few weeks ago
30 peers

@izidorit

Running i3en.6xlarge on AWS, node gets out of sync too frequently.

[…]

i3en.2xlarge USEAST Synced from the 11/11 snapshot Looks okay but slower than I remember from a few weeks ago 30 peers

Where, and with what cache size, did you set things to reach "dirty=1.56GiB"? I use:

[Eth]
DatabaseCache = 102400

But the used cache size is:
Imported new chain segment blocks=1 txs=470 mgas=76.762 elapsed=1.188s mgasps=64.594 number=12,668,158 hash=f75603..8df542 dirty=1.01GiB

@kwkr commented Nov 15, 2021

@philzeh could you share your config? The .toml plus the command you use to run the node? Also, did you do anything special when setting up the EC2 instance? I tried the setup you mention but couldn't sync anyway.

@chevoisiatesalvati

Hi guys, I've been trying to get a full node synced for days without success.
First I tried from a snapshot (using VirtualBox with 12 cores, 35 GB RAM, Samsung 980 PRO NVMe SSD), but it went slower than the blockchain itself, so I deleted everything.
Then I tried running it from scratch, and it downloaded all the blocks pretty quickly (around a day), but once it did, it started to "import new entries" seemingly forever (I got to more than 500M known states).
So I decided to do everything again using VMware (I also had some trouble with VirtualBox, which is why), from a new snapshot (11/11/2021), but again it's too slow. This time it seems to be faster than the blockchain itself, but I gain 1 minute in about 20 seconds. So again, to catch up 4 days of blockchain, I need almost 2 days at this speed.
It seems too strange to me, doesn't it?

@hbtj123 commented Nov 15, 2021

same

@botfi-finance

same prob.

@voron (Contributor) commented Nov 15, 2021

Let me describe our experience of how to get a synced node without issues.
The key point is:

  • Use low-latency disks (or terabytes of RAM for cache), 1.5TB+. This is really important: a generic cloud SSD will not do the job. See below for details.

Generic requirements:

  • Use at least 8 CPU cores; 16 may suit better, especially when you need to serve RPC besides node sync-up
  • Use at least 32GB of RAM; 48-64GB may suit a bit better
  • Use the latest bsc (v1.1.4 at the time of writing), and pay attention to the following settings (combined into an example config after this list):
    • enable diffsync via the --diffsync CLI argument and
    [Eth]
    ...
    DisablePeerTxBroadcast = true
    
    It will speed up syncing during the last ~1h40m of lag only (2048 blocks) in bsc v1.1.4. A larger syncing lag is not affected by diffsync.
    • disable the snapshot with --snapshot=false, as the snapshot may kill IO performance, and that is key for a synced node
    • You may (or may not) need 300-500 peers to keep your node synced up during BSC hard times, when there are a lot of stale peers
    [Node.P2P]
    MaxPeers = 300
    
    A lot of peers requires significantly more CPU and more network bandwidth; watch your cloud bill
  • Use the latest pruned snapshot as a bootstrap https://github.com/binance-chain/bsc-snapshots; syncing from scratch is almost impossible these days on generic HW
  • Expose the UDP p2p port and ensure it's opened in the firewall. It may help you discover more peers.
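Putting those config bits together, the relevant config.toml excerpt is just the following (values as recommended above; the node is then started with --diffsync --snapshot=false on the command line):

[Eth]
# stop rebroadcasting peers' transactions; reduces load while catching up
DisablePeerTxBroadcast = true

[Node.P2P]
# more peers helps when many peers are stale, at the cost of CPU and bandwidth
MaxPeers = 300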

Sync-up speed is around 2x chain generation speed; at 2x you gain roughly one hour of chain time per hour of wall time, so if your node lags 10 hours, you'll need around 10 hours to sync up, assuming your p2p peers are fine the whole time.

Disk details:

  • BSC, as a blockchain node, uses storage in a sequential fashion in an [almost] single thread during sync.
    That means, e.g., that to get 10k IOPS in a single thread, we need at most 0.1ms disk latency (1 second / 10,000 IOPS = 0.1ms). Most cloud disks (AWS EBS, e.g.) are actually network disks, and such low latencies usually cannot be achieved over a network as of today. Hundreds of thousands of cloud SSD IOPS will not help here, as they can only be utilized with a bunch of IO threads.
  • I would say 0.1ms disk latency is a good start; less is better. You may use fio with iodepth=1 to measure it (example after this list).
  • We use local ephemeral SSDs with GCP (plus RAID/LVM to get the required capacity, not to increase speed). Local ephemeral NVMe disks at AWS should work fine too. io2 Block Express from AWS may work; they declare sub-millisecond latency, but we haven't tested it. We tried GCP extreme disks too, but had to switch to local ephemeral SSDs.
  • Bare-metal servers with server-grade NVMe disks should work too; Intel Optane suits best here due to its lowest latency, but Optane is not a hard requirement for sure. SATA SSD may sometimes work too; check the latency.
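A minimal fio run for that measurement might look like this (file path and size are placeholders; point it at the volume that will hold the BSC datadir, and read the average "clat" completion-latency values in the output):

# single-threaded 4k random reads, queue depth 1, direct IO (bypasses page cache)
fio --name=latency-test --filename=/datadir/fio.test --size=1G \
    --rw=randread --bs=4k --iodepth=1 --direct=1 \
    --ioengine=psync --runtime=30 --time_based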

@FeurJak commented Nov 15, 2021

Let me describe our experience of how to get a synced node without issues. The key point is:

  • Use low-latency disks (or terabytes of RAM for cache), 1.5TB+. […]

Thanks for that!
What's the instance type that you are using on GCP? I might migrate over to GCP from AWS. The AWS i3en.6xlarge instance doesn't seem to cut it with AWS EBS storage in terms of IOPS.

@voron (Contributor) commented Nov 15, 2021

@FeurJak
Pick any [8+ CPU, 32GB+ RAM] instance on GCP or AWS with ephemeral local NVMe disks and place the BSC datadir on the local NVMe. You may attach additional disks to the instance to perform backups, as ephemeral NVMes are ephemeral :). We use n2-standard-16 on GCP.

The AWS i3en.6xlarge instance doesn't seem to cut it with AWS EBS storage in terms of IOPS.

You should use the instance storage there, not an AWS EBS volume. Pay attention to backups where required, as ephemeral storage is not what you want to use for, e.g., crypto wallets.
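A quick way to tell the two apart on a recent (Nitro-based) AWS instance is the MODEL column of lsblk; the model strings below are how AWS exposes the two volume types, but double-check on your instance:

# EBS volumes report "Amazon Elastic Block Store" in MODEL;
# instance-store (ephemeral) NVMe reports "Amazon EC2 NVMe Instance Storage"
lsblk -o NAME,SIZE,MODEL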

@izidorit

@FeurJak how can I verify how much cache I'm using?

@FeurJak commented Nov 15, 2021

@FeurJak how can I verify how much cache I'm using?

Not too sure how you can verify it, but I set the cache size in my command:

./build/bin/geth --config ./config.toml --datadir ./node --rpc.allow-unprotected-txs --txlookuplimit 0 --http.api web3,eth,miner,net,txpool,debug --rpc --rpcaddr 0.0.0.0 --rpcport 8545 --rpcapi web3,eth,personal,miner,net,txpool,debug console --ipcpath geth.ipc --syncmode fast --gcmode full --snapshot false --cache.preimages --cache 128000 --diffsync

@FeurJak commented Nov 15, 2021

Pick any [8+ CPU, 32GB+ RAM] instance on GCP or AWS with ephemeral local NVMe disks and place the BSC datadir on the local NVMe. […] You should use the instance storage there, not an AWS EBS volume. […]

Yea sorry, just realised my AWS i3en instance does use ephemeral storage... anyhow, giving GCP a go with an N2D compute-optimized instance: 64 cores, 64 GB memory, 9TB local SSD.

@chevoisiatesalvati

Let me describe our experience of how to get a synced node without issues. The key point is:

  • Use low-latency disks (or terabytes of RAM for cache), 1.5TB+. […]

I was syncing slowly, recovering about 1 minute of blockchain time every 20 seconds of real time (3x blockchain speed). Then I read your post, thinking I was going slow because of the settings you mentioned. Stopped the node, changed the settings as you said, restarted the node.
Now I'm getting blocks at blockchain speed, stuck at 2d21h50m behind. LOL
What now? I don't know whether to roll back to the settings I had before, since the problem could be peers... maybe?
I'm using a VM with Ubuntu, 12 cores, 35 GB RAM, Samsung 980 PRO NVMe SSD.

@berktaylan

Does anyone know how many state entries there are in total currently? lol

@xpkore commented Nov 15, 2021

Does anyone know how many state entries there are in total currently? lol

eth.syncing
{
  currentBlock: 12681283,
  highestBlock: 12681284,
  knownStates: 297473485,
  pulledStates: 297473485,
  startingBlock: 12681281
}
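(That output is from the geth JavaScript console; with a running node you can also get it non-interactively, e.g. — the IPC path is a placeholder, use whatever your datadir has:)

# attach to the running node and print sync progress
geth attach /path/to/datadir/geth.ipc --exec 'eth.syncing'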

@berktaylan

Does anyone know how many state entries there are in total currently? lol

eth.syncing { currentBlock: 12681283, highestBlock: 12681284, knownStates: 297473485, pulledStates: 297473485, startingBlock: 12681281 }

How can this be possible? { currentBlock: 12681302, highestBlock: 12681387, knownStates: 706039145, pulledStates: 705885395, startingBlock: 12679086 }

@voron (Contributor) commented Nov 15, 2021

@chevoisiatesalvati

Now I'm getting blocks at blockchain speed, stuck at 2d21h50m behind. LOL

diffsync improves sync performance only from a ~1h40m lag downwards. 2 days is far too large a lag for diffsync to provide a speed boost, so I don't think the diffsync-related settings change affects you.

What now? I don't know whether to roll back to the settings I had before, since the problem could be peers... maybe?

It's p2p; you cannot just force it to sync ASAP. An exposed p2p UDP port with an external IP may speed up peer discovery.
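For example, with ufw on Ubuntu (30311 assumes the default P2P port from config.toml; adjust if yours differs):

# open the p2p port for both TCP and UDP (discovery runs over UDP)
sudo ufw allow 30311/tcp
sudo ufw allow 30311/udp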

@xpkore commented Nov 15, 2021

How can this be possible? { currentBlock: 12681302, highestBlock: 12681387, knownStates: 706039145, pulledStates: 705885395, startingBlock: 12679086 }

My node was run from a pruned snapshot, so maybe that's the difference.

@billyadelphia commented Nov 16, 2021

Maybe, just maybe, in its current state the BSC network requires more capable hardware to sync properly. The minimum requirements won't work anymore.
Because I have an AMD Ryzen 9 3900 (12-core CPU), 128 GB DDR4 ECC memory, 2 x 1.92 TB SSD in RAID 0, 1Gbps internet, and have never had any issue with syncing; always fully synced.

@NullQubit

Maybe, just maybe, in its current state the BSC network requires more capable hardware to sync properly. The minimum requirements won't work anymore. Because I have an AMD Ryzen 9 3900 (12-core CPU), 128 GB DDR4 ECC memory, 2 x 1.92 TB SSD in RAID 0, 1Gbps internet, and have never had any issue with syncing; always fully synced.

People have syncing problems on much (MUCH) more powerful hardware than what you have. I'm using the most powerful hardware available on Azure, which costs thousands, and I'm lagging behind. Popular RPC providers are lagging behind. The Azure team performed a performance diagnosis and assured me that the hardware is not the bottleneck. Hardware is not the issue, or if it is, then it's much more complicated than "requires more capable hardware".

@billyadelphia commented Nov 16, 2021

People have syncing problems on much (MUCH) more powerful hardware than what you have. […]

Wow, that's crazy!
3 months ago I just rented a server.
I downloaded the snapshot and started syncing; it's been working fine ever since.
I also disabled the entire firewall to make sure no connections were blocked (since I'm bad at configuring firewalls, haha).

@NullQubit

People have syncing problems on much (MUCH) more powerful hardware than what you have. […]

Wow, that's crazy! 3 months ago I just rented a Hetzner server here https://www.hetzner.com/dedicated-rootserver/ax61-nvme , costing 84 euro per month. I downloaded the snapshot and started syncing; it's been working fine up to today. I also disabled the entire firewall to make sure no connections were blocked (since I'm bad at configuring firewalls, haha).

The only ports I have open are 30311 for P2P and 8545/8546 for RPC. Do you know if any other port is required?

@billyadelphia

The only ports I have open are 30311 for P2P and 8545/8546 for RPC. Do you know if any other port is required?

That's why I disabled the firewall: I don't know the ports, and I'm too lazy to check.

@uniftyitadmin

Hello. I am having the same problem. My HW is a Hetzner dedicated server (8-core AMD Ryzen 3700X, 48GB RAM, SSD disks, ext4 fs). There are no performance peaks on this HW, since there was already a BSC node on the same server with literally the same settings; I just reinstalled it (full-disk problem). It has already been syncing for 8-10 days. I am getting this output from eth.isSyncing:
{
currentBlock: 12686414,
highestBlock: 12687209,
knownStates: 1451780580,
pulledStates: 1451770187,
startingBlock: 12684550
}

I just added --diffsync (and raised max peers from 250 to 300), so I will wait and see if that solves the problem.

Also, I/O performance is good, I think.
t=2021-11-16T04:54:37+0100 lvl=info msg="Imported new state entries" count=384 elapsed="40.246µs" processed=1,451,152,720 pending=75350 trieretry=0 coderetry=0 duplicate=0 unexpected=0
t=2021-11-16T04:54:38+0100 lvl=info msg="Imported new block headers" count=1 elapsed="331.456µs" number=12,686,820 hash=0x12b55c1b1b8c3811cd6107153318666c066b1d8b6d49d09111f63d663cb357a6
t=2021-11-16T04:54:38+0100 lvl=info msg="Imported new state entries" count=258 elapsed="3.026µs" processed=1,451,152,978 pending=75628 trieretry=135 coderetry=0 duplicate=0 unexpected=0

@JohnsonCaii

Same here, even with the diffsync option.

@uniftyitadmin

Still same for me too.

@xpkore commented Nov 16, 2021

ax61-nvme syncs well
thanks @billyadelphia

@ib0b commented Nov 16, 2021

ax61-nvme syncs well, thanks @billyadelphia

Have you synced yet?
Did you get lucky and get a Gen4 NVMe drive? You can test with

lsblk -o NAME,FSTYPE,LABEL,MOUNTPOINT,SIZE,MODEL

the Gen3 has the following model: SAMSUNG MZQLB1T9HAJR-00007

@xpkore commented Nov 16, 2021

nvme0n1 1.8T SAMSUNG MZQL21T9HCJR-00A07

feelsgoodman

@ib0b commented Nov 16, 2021

you got lucky 😂😂

@hbtj123 commented Nov 16, 2021

Sync is good until the first hyped launch comes and the blocks can't handle a couple of thousand txs.
We are still facing the issues no matter how we install or configure the nodes.

@xpkore commented Nov 16, 2021

Sync is good until the first hyped launch comes and the blocks can't handle a couple of thousand txs. We are still facing the issues no matter how we install or configure the nodes.

They never could handle thousands of txs. Blocks usually fill up at around 600 txs. It would just be a bunch of full blocks for a while, which we currently get anyway.

@DryDragon10

Maybe, just maybe, in its current state the BSC network requires more capable hardware to sync properly. The minimum requirements won't work anymore. Because I have an AMD Ryzen 9 3900 (12-core CPU), 128 GB DDR4 ECC memory, 2 x 1.92 TB SSD in RAID 0, 1Gbps internet, and have never had any issue with syncing; always fully synced.

It's not about the hardware. I have exactly the same server from the same provider, and my node can't sync. You think you're always fully synced? Try rebuilding your system and see if you can still sync.

@NullQubit

@NullQubit

People have syncing problems on much (MUCH) more powerful hardware than what you have. I'm using the most powerful hardware available on Azure, which costs thousands, and I'm lagging behind

Are you using a local/temporary SSD as the BSC datadir? For example, the Dadsv5 series in Azure with 1800GB+ temp storage. Use any Azure VM that meets the minimum requirements and has a temporary SSD, and use that as the BSC datadir. Pay attention to backups, as it's a temporary SSD.

I've tried both: using D96ds_v5 with the temp storage disk for my node, and an attached Ultra Disk LRS (4072 GiB, 160k IOPS, 4000 MB/s max throughput), in two different locations (US Central and France Central).

@ib0b commented Nov 17, 2021

I finally synced. It seems the two biggest bottlenecks are CPU and IOPS (actual sync time: about 20 hrs).
Hardware used:

  • Hetzner AX61 (128 GB RAM, AMD Ryzen™ 9 3900 12-core)
  • 2 x 1.92 TB in RAID 0
  • Got unlucky with a Gen3 NVMe, but it still synced

Process (the whole sequence is also consolidated into a script at the end of this comment):

  • download geth 1.1.5 and make it executable; optionally move it to /usr/local/bin/geth
  • download mainnet.zip and unzip it
  • initialize from genesis using the command below; this also creates a mainnet folder for the blockchain data
    ./geth_linux --datadir mainnet init genesis.json
  • download the 14 Nov 2021 snapshot
  • extract the snapshot
  • move the snapshot data into the mainnet folder
rm -rf mainnet/geth/chaindata
rm -rf mainnet/geth/triecache
mv server/data-seed/geth/chaindata mainnet/geth/chaindata
mv server/data-seed/geth/triecache mainnet/geth/triecache

Actual sync process:

[Optional] Open config.toml and delete the Node Log section; this is just useful for getting the logs straight on the terminal. Or just use tail to look at the logs.

[Optional] Create a service, or use screen to run the command below, so it doesn't stop when you are connected over SSH.
I used screen.

Run
screen
then press Enter. (Any time you lose the SSH connection, run screen -r to get back to the screen session where geth was running.)

Geth Command

geth --config ./config.toml --datadir ./mainnet --cache 100000 --rpc.allow-unprotected-txs --txlookuplimit 0 --http --maxpeers 100 --ws --syncmode=full --snapshot=false --diffsync

Might be very important:

  • keep maxpeers at around 100
  • syncmode full; I used fast on other servers and they never synced
  • snapshot false means you won't be providing snapshots to other people; you can change it to true once you have synced and have a good server
  • diffsync; not sure if this helped, but it probably did
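Pulling the steps above together, a rough script version looks like this (binary and path names follow the steps above — geth_linux, genesis.json, the snapshot extracted under server/data-seed — so adjust them to your layout; treat it as a sketch, not a tested installer):

#!/usr/bin/env bash
set -e

# 1. init the datadir from genesis (creates ./mainnet)
./geth_linux --datadir mainnet init genesis.json

# 2. swap the snapshot's chaindata/triecache into the datadir
rm -rf mainnet/geth/chaindata mainnet/geth/triecache
mv server/data-seed/geth/chaindata mainnet/geth/chaindata
mv server/data-seed/geth/triecache mainnet/geth/triecache

# 3. run geth inside a named screen session so an SSH drop doesn't kill it
screen -S bsc ./geth_linux --config ./config.toml --datadir ./mainnet \
    --cache 100000 --rpc.allow-unprotected-txs --txlookuplimit 0 --http \
    --maxpeers 100 --ws --syncmode=full --snapshot=false --diffsync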

@chevoisiatesalvati

I finally synced. It seems the two biggest bottlenecks are CPU and IOPS (actual sync time: about 20 hrs). […]

Following the instructions literally, except for the cache size, since I only have 32 GB RAM. But I have a better SSD (WD Black SN850 in RAID 0), and I'm still going as slowly as always: I gained 2 hours in 5. I'm about two and a half days behind, so at this speed I would need about 5-6 days to sync lol.
How could you have done it in 20 hrs? Is it the RAM? I don't know...

@ib0b commented Nov 19, 2021

I finally synced. It seems the two biggest bottlenecks are CPU and IOPS (actual sync time: about 20 hrs). […]

Following the instructions literally, except for the cache size, since I only have 32 GB RAM. […] How could you have done it in 20 hrs? Is it the RAM? I don't know...

Hmm, I am not entirely sure what it could be:
it might be CPU or RAM.
You can run atop -d to see if your IOPS are somehow bottlenecking, which I doubt since you have an NVMe Gen4.
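If the atop screen is hard to read, iostat from the sysstat package gives a simpler per-device view; %util and the r_await/w_await latency columns are the ones to watch:

# extended per-device stats, refreshed every 2 seconds
iostat -x 2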

@NullQubit

I finally synced. It seems the two biggest bottlenecks are CPU and IOPS (actual sync time: about 20 hrs). […]

Following the instructions literally, except for the cache size, since I only have 32 GB RAM. […] How could you have done it in 20 hrs? Is it the RAM? I don't know...

How many peers are connected? Make sure traffic on the P2P port (30311 by default) is allowed.

@chevoisiatesalvati

I finally synced. It seems the two biggest bottlenecks are CPU and IOPS (actual sync time: about 20 hrs). […]

Following the instructions literally, except for the cache size, since I only have 32 GB RAM. […]

Hmm, I am not entirely sure what it could be: it might be CPU or RAM. You can run atop -d to see if your IOPS are somehow bottlenecking, which I doubt since you have an NVMe Gen4.

I don't think the problem is the CPU, since I have a Ryzen 9 5950X.
I don't think it's the RAM, since I gave it more than 32 GB and there was no difference.
By the way, I ran atop -d but I'm not able to read the values. I mean, I've never used it and I don't understand more than the basics. Also, I don't know whether these are normal values or not. I got a couple of reds, but maybe they're normal, I don't know. I'll paste it here in case you can read it.

[screenshot of atop output]

I have 34 peers at the moment (with the max at 100); that should be pretty normal, right?

@ib0b commented Nov 19, 2021

The important value is the DSK read: 29744. When I was syncing on mine I got 30-40K, but this should still be plenty.
You should have more peers; you can try checking that port 30311 is open with your hosting service... though I am not 100% sure this is necessary.
Also try increasing maxpeers to a higher limit.
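A couple of quick checks for the port (replace the placeholder IP with your server's; 30311 assumes the default P2P port):

# on the node: is geth actually listening on 30311?
sudo ss -tulpn | grep 30311

# from another machine: is the TCP side reachable from outside?
nc -vz <server-ip> 30311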

@chevoisiatesalvati commented Nov 19, 2021

The important value is the DSK read: 29744. […] Also try increasing maxpeers to a higher limit.

Do you suggest restarting the node with more maxpeers? I don't know if it will change anything, but I can try.
The thing I don't understand is how it's possible that, with 2 NVMe Gen4 drives in RAID 0, they show 95-99% busy at such low read values. I'm actually around 20-25k reads; sometimes it goes higher and sometimes lower. But with this hardware it shouldn't be so "busy" at these values, right?

@chevoisiatesalvati commented Nov 19, 2021

I restarted with maxpeers 250; now I have 82 peers but I'm slower than the actual chain lol. In 5 minutes of running I lost 1 minute instead of gaining it ahaha.
I'm desperate!

Anyway, I was wondering: could my performance be due to the fact that I'm running it in a virtual machine?
Is anyone running the node in a VM?
And how could I improve performance there? I can't run Linux on the bare hardware, since I use it with Windows for other stuff.

@ib0b commented Nov 19, 2021

The thing I don't understand is how it's possible that, with 2 NVMe Gen4 drives in RAID 0, they show 95-99% busy at such low read values. […]

At this point I am not entirely sure what would be causing the slow sync, sorry

@chevoisiatesalvati

Since I restarted the node, I'm not moving forward at all... stuck at 2d11h27m behind. Same settings as before and the same peers. Can't believe it.

@caiiiyua

I am using the snapshot from 16 Nov and got the full node synced in around 24 hours, including the snapshot download.
[screenshot]

It takes about 1T after the full sync.
nohup geth --config ./config.toml --datadir ./data-seed --cache 36000 --rpc.allow-unprotected-txs --txlookuplimit 0 &

peerCount sits at 100, as configured.

mgasps is around 100+, but depends on the txns in the block.

elapsed time is around 500ms for a block with 500 txns.

Will see how it goes and whether the elapsed time can be optimized further.
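(Started with nohup as above and no redirect, geth's output lands in nohup.out by default, so progress can be watched with:)

# follow the node's log output
tail -f nohup.out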

@chevoisiatesalvati commented Nov 19, 2021

[screenshot]

This is mine. LOL
What hardware are you using? Also, are you on a VM or a physical machine? And if you're on a VM, what are your settings?

Also, I'm noticing that the size of my chaindata folder is growing really too fast (I think the ancient folder is growing a lot). I started from the snapshot of 16/11 (which was around 700 GB, right?), and I'm already at 1.3 TB and still 2+ days behind. I don't think I'll be able to sync with this amount of space, since I only have 1.8 TB in total (2 x 1 TB WD Black SN850 SSDs in RAID 0).
What's the total size of your chaindata folder now that you're synced? Do you have an ancient folder inside chaindata?
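For anyone checking the same thing, the ancient (freezer) store lives inside chaindata by default, so its share is easy to see (the path follows the datadir layout from ib0b's steps above; adjust to yours):

# total chaindata size vs. the ancient/freezer portion
du -sh mainnet/geth/chaindata mainnet/geth/chaindata/ancient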

@keefel added the help wanted label on Dec 3, 2021
@duoxehyon commented Dec 20, 2021

I've been syncing for a week now and it is still importing state entries.

[screenshot]

My hardware:

16 vCPU
64 GB RAM
2.3 TB NVMe, 15000 IOPS, 400 MB/s read/write speed

Why is it taking forever?
Please, can someone help?

@forcodedancing (Contributor)
@NeoMitashi Hello, can you try the suggestions in #338 and #502 to see whether they help or not?

@noXi89 commented Jan 3, 2022

We gave up on BSC after 2,600,000,000 synced states, if someone needs a number.

@bnb-tw closed this as completed May 19, 2023
@alex-vg commented May 19, 2023

so this is fixed? @bnb-tw
