Instability syncing rate (archive node) #199

Closed
koen84 opened this issue May 8, 2021 · 8 comments

Comments

@koen84

koen84 commented May 8, 2021

System information

Geth version: 1.0.7
OS & Version: Ubuntu 18.04 LTS

Enterprise-grade server with:
AMD EPYC 7502P 32-Core Processor
128 GB RAM
4x 4TB NVMe in RAID 10

Archive node with max peers 200 & cache 64000.

Expected behaviour

Block syncing rate is constant at around 20 blocks per minute (within roughly 10% variance).
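The expectation above can be expressed as a small check. This is only a sketch: the sample format (Unix seconds paired with the last imported block number) and the function names are assumptions for illustration, not anything the node exposes in this form.

```python
from typing import List, Tuple

EXPECTED_RATE = 20.0  # blocks per minute, taken from the expectation above
TOLERANCE = 0.10      # roughly 10% variance, taken from the expectation above


def blocks_per_minute(samples: List[Tuple[float, int]]) -> List[float]:
    """Rate between consecutive (unix_seconds, block_number) samples."""
    rates = []
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        minutes = (t1 - t0) / 60.0
        if minutes <= 0:
            continue  # skip duplicate or out-of-order samples
        rates.append((b1 - b0) / minutes)
    return rates


def out_of_band(rates: List[float],
                expected: float = EXPECTED_RATE,
                tolerance: float = TOLERANCE) -> List[float]:
    """Return the rates that fall outside expected +/- tolerance."""
    low, high = expected * (1 - tolerance), expected * (1 + tolerance)
    return [r for r in rates if not (low <= r <= high)]
```

Feeding it one sample per minute, such as `[(0, 100), (60, 120), (120, 125)]`, would flag the second interval (5 blocks per minute) while accepting the first (20 blocks per minute).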

Actual behaviour

[Image: BSC_ChainRate_210508_2d]

Instead I get wild fluctuations. When I check the logs, I sometimes notice that syncing is behind the chain head as well.
System stats are all green (e.g. disk R/W ~10 MB/s and IOPS R/W 250).

I see similar things on my 2nd archive node.

Steps to reproduce the behaviour

Run an archive node and observe behaviour over time.

@zcrypt0

zcrypt0 commented May 8, 2021

I've had better performance keeping the chain synced to the head with the WIP branch:
#152

@koen84
Author

koen84 commented May 8, 2021

Interesting. A little risky for production nodes imho, especially since it's actively being worked on.

@zcrypt0

zcrypt0 commented May 9, 2021

Indeed, but working is working. I wasn't able to maintain a sync with the 1.0.7 binary at all. You can pin the pull to a specific commit to reduce the risk that a push breaks the branch.

@koen84
Author

koen84 commented May 10, 2021

I've got another BSC server I'm preparing; I'll try the 1.0.7-hX releases or the WIP branch on that one to compare the difference. The annoying thing is that the metrics don't seem to include the "network chain head" (as opposed to the "synced last block") or the lag (in blocks / time) between them, unless I'm missing it?
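One possible workaround for the missing lag metric is to derive it from the standard `eth_syncing` JSON-RPC call, which returns `currentBlock` and `highestBlock` as hex strings while the node considers itself syncing. The sketch below assumes the node's HTTP RPC is enabled and reachable at `http://127.0.0.1:8545`; that URL and the helper names are illustrative. Note that `highestBlock` is the node's own estimate from its peers, not an authoritative network head.

```python
import json
import urllib.request
from typing import Union


def sync_lag(syncing_result: Union[bool, dict]) -> int:
    """Blocks between the highest known block and the last imported block.

    `eth_syncing` returns False when the node does not consider itself
    syncing; that is reported here as a lag of 0, which does not prove
    the node is actually at the network head.
    """
    if syncing_result is False:
        return 0
    current = int(syncing_result["currentBlock"], 16)
    highest = int(syncing_result["highestBlock"], 16)
    return max(highest - current, 0)


def fetch_syncing(url: str = "http://127.0.0.1:8545") -> Union[bool, dict]:
    """Call eth_syncing on the node at `url` (assumed local RPC endpoint)."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_syncing", "params": []}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)["result"]
```

Calling `sync_lag(fetch_syncing())` on a schedule and exporting the result would give a rough lag series in blocks; multiplying by the chain's block time gives an approximate lag in seconds.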

@DefiDebauchery

@koen84 If you don't mind me asking, what's your actual RAM usage on these machines? I know that your cache setting has a lot to do with it, but is >64GB RAM truly being utilized on the archive node?

@koen84
Author

koen84 commented May 13, 2021

@DefiDebauchery cross-posting from the OOM issue :

On my 1.0.7 archive node BSC consumes 48GB RAM.
On my 1.1.0-beta archive node BSC was consuming 100GB RAM after approx. a day, and 78GB 3h after a restart.

Both nodes run with --cache 64000 argument. The 1.1.0-beta node also runs with --snapshot=false and --txlookuplimit=0 arguments.


Pretty much all remaining RAM in the server is being used in buffer / cache. Those machines run only BSC for internal RPC access.

@koen84
Author

koen84 commented May 13, 2021

For comparison, my experience with openethereum archive nodes is that cache 48000 is about the limit you can pull on a server with 64GB RAM; any more and you'll start swapping (to the point the server becomes unusable).
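To put the two cache settings mentioned in this thread side by side, here is a quick bit of arithmetic. It assumes both cache values are in megabytes and treats 1 GB as 1024 MB; the function name is made up for illustration.

```python
def cache_fraction(cache_mb: int, ram_gb: int) -> float:
    """Fraction of physical RAM claimed by the cache setting.

    Assumes the cache value is in MB and 1 GB = 1024 MB.
    """
    return cache_mb / (ram_gb * 1024)


# openethereum archive node: cache 48000 on a 64GB server
print(f"openethereum, 64GB: {cache_fraction(48000, 64):.0%}")
# BSC archive node from this issue: cache 64000 on a 128GB server
print(f"BSC, 128GB: {cache_fraction(64000, 128):.0%}")
```

Under those assumptions the openethereum limit corresponds to roughly 73% of RAM, while the BSC setting here is roughly 49%.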

@j75689
Contributor

j75689 commented Jul 30, 2021

Hi all.
Thank you for your report.
We have received many reports of a sync issue.
You can try the latest version. If you have any problems, please give feedback in #338.
We will keep watching issue #338 for a long time, and if there are any updates, we will explain them in this issue.

Thanks.

j75689 closed this as completed Jul 30, 2021