All BSC nodes are OFF SYNC #189

Closed
ghost opened this issue May 6, 2021 · 160 comments
Comments

@ghost

ghost commented May 6, 2021

Well, I have tried to sync my own node and failed. It has been syncing for a week already. OK, so I decided to buy access to a node on the internet.

I have tried ankr, getblock and quiknode so far, and they ALL are OFF SYNC!!!

Please don't tell me that my hardware is weak or that I did something wrong.
Just figure out what is going on, and fix it. A month ago everything was alright.

@DefiDebauchery

Not sure what there is to fix. The block size and TPS have both increased (exponentially for TPS), and hardware that was sufficient a month ago is no longer able to keep up. Push these other services to improve their resources.

I had sync/lag issues with SSDs on multiple machines. I built a new node using NVMe (PCIe, not SATA mode) and have not had a single hiccup for the several days it's been running. I won't claim that there aren't optimizations that could be done, but the blockchain is IOPS-heavy, and you need hardware to support it.
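(If you're unsure whether a given drive is actually PCIe NVMe rather than SATA, a quick check on a Linux host -- the flags shown are standard lsblk options:)

lsblk -d -o NAME,TRAN,ROTA,MODEL
# TRAN shows the transport: "nvme" for PCIe NVMe, "sata" for SATA drives
# ROTA 0 means the device is non-rotational (an SSD)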

@ghost
Author

ghost commented May 6, 2021

Well, if you need PCIe NVMe instead of an ordinary SSD, it should at least be reflected in the user manual. I have seen two different user manuals on the official site, and neither of them said anything about NVMe. And I have already bought 3 SSD servers.

@DefiDebauchery

In their defense, the manual was written long (long) before IOPS became a limiting factor. But I definitely agree that the docs are a little stagnant as a whole.

@sjors-lemniscap

sjors-lemniscap commented May 7, 2021

After experimenting for the last week or so I can confirm that:

  • NVMe is required due to the insane number of txs and blocks. For memory / CPU, stick with the recommended specs from the docs
  • Using the chaindata export provided by Binance, the node has an extremely hard time catching up to the latest block. It goes way faster when doing a fast sync from scratch
  • The BSC geth fork contains a lot of inefficiencies compared to the Ethereum geth client. This might need some fixing by the devs in order to improve sync times and stability going forward
  • Increasing maxpeers to 2000 in the config.toml helps, as does updating the bootstrapnodes and staticnodes. In addition, you can set the --cache flag when starting geth to use the maximum amount of your system's memory

Hope this helps, syncing mainnet took here <48 hours and testnet <4 hours. Server is located in the EU.

EDIT: Scrolled through some other issues and people are curious how large a fast mainnet sync is on disk. Approx 191GB with the new v1.0.7-hf.1 geth version.

@stakecube

Can confirm above, same results for us so far. Thanks for the summary @sjors-lemniscap
Could you share how many states you have near sync atm?

@sjors-lemniscap

Can confirm above, same results for us so far. Thanks for the summary @sjors-lemniscap
Could you share how many states you have near sync atm?

eth.syncing is showing false since my node has been fully synced. Is there a way to show the PulledStates / KnownStates once a node is synced? Happy to show the output but I don't know the right command to retrieve this info.

@stakecube

Ah okay. I don't think there is any command to check after full sync. Maybe someone else knows it.
But I know it's "normal" to show false once fully synced.

This is our output/amount of states right now:
{ currentBlock: 7214868, highestBlock: 7214942, knownStates: 89457658, pulledStates: 89443519, startingBlock: 7214379 }
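(For anyone wanting to reproduce this output: it comes from eth.syncing in the geth console. A minimal sketch, assuming the default IPC path inside your datadir:)

geth attach ./node/geth.ipc
> eth.syncing
# returns false once fully synced, otherwise an object with
# currentBlock / highestBlock / knownStates / pulledStates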

@ChuckCaplan

I'm at 317 million known states with a node size of 187.5 GB syncing in fast mode. Hopefully I will be done soon.

@a04512

a04512 commented May 8, 2021

Can confirm above, same results for us so far. Thanks for the summary @sjors-lemniscap
Could you share how many states you have near sync atm?

mine: 562,191,528
then: 592,176,122
now: 610,547,008

@gituser

gituser commented May 8, 2021

After experimenting for the last week or so I can confirm that:

  • NVMe is required due to the insane number of txs and blocks. For memory / CPU, stick with the recommended specs from the docs
  • Using the chaindata export provided by Binance, the node has an extremely hard time catching up to the latest block. It goes way faster when doing a fast sync from scratch
  • The BSC geth fork contains a lot of inefficiencies compared to the Ethereum geth client. This might need some fixing by the devs in order to improve sync times and stability going forward
  • Increasing maxpeers to 2000 in the config.toml helps, as does updating the bootstrapnodes and staticnodes. In addition, you can set the --cache flag when starting geth to use the maximum amount of your system's memory

Hope this helps, syncing mainnet took here <48 hours and testnet <4 hours. Server is located in the EU.

EDIT: Scrolled through some other issues and people are curious how large a fast mainnet sync is on disk. Approx 191GB with the new v1.0.7-hf.1 geth version.

This is indeed the correct way to sync at the moment (don't use snapshots!). If your node is stuck syncing from the snapshot, stop the node, remove the node db, and then sync from scratch with fast sync.

Also, try the upgrade_1.10.2 branch: there is a newer version, 1.1.0 (based on the newer geth 1.10), and it seems to be working fine. You just need to comment out a few things in the config.toml, like GraphQLPort.

My bsc v1.1.0 instance with a 12GB cache, on NVMe, got synced this way in about ~9 hrs. The whole chaindata now occupies only 188 GB, instead of the 720 GB I had before, when the sync was always stuck about 5-7K blocks behind the chain.

@ghost
Author

ghost commented May 8, 2021

May I ask how exactly the --cache flag should look if I want to give my node 16GB of cache?
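(A side note on the flag itself, assuming the standard geth CLI: --cache takes a value in megabytes, so 16GB of cache would look roughly like this -- the config and datadir paths are just examples:)

geth --config ./config.toml --datadir ./node --cache 16384
# --cache is the total MB allocated to internal caching; 16384 MB ≈ 16 GB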

@gituser

gituser commented May 8, 2021

@edgeofthegame

here is the config.toml I've used for the node that got synced in ~ 9hrs:

config.toml:
[Eth]
NetworkId = 56
SyncMode = "fast"
NoPruning = false
NoPrefetch = false
LightPeers = 1
UltraLightFraction = 75
TrieCleanCache = 256
TrieDirtyCache = 256
TrieTimeout = 500000000000
#TrieTimeout = 3600000000000
EnablePreimageRecording = false
EWASMInterpreter = ""
EVMInterpreter = ""
DatabaseCache = 12000

[Eth.Miner]
GasFloor = 30000000
GasCeil = 40000000
GasPrice = 1000000000
Recommit = 10000000000
Noverify = false

[Eth.TxPool]
Locals = []
NoLocals = true
Journal = "transactions.rlp"
Rejournal = 3600000000000
PriceLimit = 1000000000
PriceBump = 10
AccountSlots = 512
GlobalSlots = 10000
AccountQueue = 256
GlobalQueue = 5000
Lifetime = 10800000000000

#[Eth.GPO]
#Blocks = 20
#Percentile = 60

[Node]
IPCPath = "geth.ipc"
HTTPHost = "127.0.0.1"
NoUSB = true
InsecureUnlockAllowed = false
HTTPPort = 8575
HTTPVirtualHosts = ["localhost"]
HTTPModules = ["eth", "net", "web3", "txpool", "parlia"]
WSPort = 8546
WSModules = ["net", "web3", "eth"]
#GraphQLPort = 8557
GraphQLVirtualHosts = ["*"]

[Node.P2P]
MaxPeers = 1000
NoDiscovery = false
BootstrapNodes = ["enode://1cc4534b14cfe351ab740a1418ab944a234ca2f702915eadb7e558a02010cb7c5a8c295a3b56bcefa7701c07752acd5539cb13df2aab8ae2d98934d712611443@52.71.43.172:30311","enode://28b1d16562dac280dacaaf45d54516b85bc6c994252a9825c5cc4e080d3e53446d05f63ba495ea7d44d6c316b54cd92b245c5c328c37da24605c4a93a0d099c4@34.246.65.14:30311","enode://5a7b996048d1b0a07683a949662c87c09b55247ce774aeee10bb886892e586e3c604564393292e38ef43c023ee9981e1f8b335766ec4f0f256e57f8640b079d5@35.73.137.11:30311"]
StaticNodes = ["enode://f3cfd69f2808ef64838abd8786342c0b22fdd28268703c8d6812e26e109f9a7cb2b37bd49724ebb46c233289f22da82991c87345eb9a2dadeddb8f37eeb259ac@18.180.28.21:30311","enode://ae74385270d4afeb953561603fcedc4a0e755a241ffdea31c3f751dc8be5bf29c03bf46e3051d1c8d997c45479a92632020c9a84b96dcb63b2259ec09b4fde38@54.178.30.104:30311","enode://d1cabe083d5fc1da9b510889188f06dab891935294e4569df759fc2c4d684b3b4982051b84a9a078512202ad947f9240adc5b6abea5320fb9a736d2f6751c52e@54.238.28.14:30311","enode://f420209bac5324326c116d38d83edfa2256c4101a27cd3e7f9b8287dc8526900f4137e915df6806986b28bc79b1e66679b544a1c515a95ede86f4d809bd65dab@54.178.62.117:30311","enode://c0e8d1abd27c3c13ca879e16f34c12ffee936a7e5d7b7fb6f1af5cc75c6fad704e5667c7bbf7826fcb200d22b9bf86395271b0f76c21e63ad9a388ed548d4c90@54.65.247.12:30311","enode://f1b49b1cf536e36f9a56730f7a0ece899e5efb344eec2fdca3a335465bc4f619b98121f4a5032a1218fa8b69a5488d1ec48afe2abda073280beec296b104db31@13.114.199.41:30311","enode://4924583cfb262b6e333969c86eab8da009b3f7d165cc9ad326914f576c575741e71dc6e64a830e833c25e8c45b906364e58e70cdf043651fd583082ea7db5e3b@18.180.17.171:30311","enode://4d041250eb4f05ab55af184a01aed1a71d241a94a03a5b86f4e32659e1ab1e144be919890682d4afb5e7afd837146ce584d61a38837553d95a7de1f28ea4513a@54.178.99.222:30311","enode://b5772a14fdaeebf4c1924e73c923bdf11c35240a6da7b9e5ec0e6cbb95e78327690b90e8ab0ea5270debc8834454b98eca34cc2a19817f5972498648a6959a3a@54.170.158.102:30311","enode://f329176b187cec87b327f82e78b6ece3102a0f7c89b92a5312e1674062c6e89f785f55fb1b167e369d71c66b0548994c6035c6d85849eccb434d4d9e0c489cdd@34.253.94.130:30311","enode://cbfd1219940d4e312ad94108e7fa3bc34c4c22081d6f334a2e7b36bb28928b56879924cf0353ad85fa5b2f3d5033bbe8ad5371feae9c2088214184be301ed658@54.75.11.3:30311","enode://c64b0a0c619c03c220ea0d7cac754931f967665f9e148b92d2e46761ad9180f5eb5aaef48dfc230d8db8f8c16d2265a3d5407b06bedcd5f0f5a22c2f51c2e69f@54.216.208.163:30311","enode://352a361a9240d4d23bb6fab19cc6dc5a5fc6921abf19de65afe13f1802780aecd67c8c09d8c89043ff86947f171d98ab06906ef616d58e718067e02abea0dda9@79.125.105.65:30311","enode://bb683ef5d03db7d945d6f84b88e5b98920b70aecc22abed8c00d6db621f784e4280e5813d12694c7a091543064456ad9789980766f3f1feb38906cf7255c33d6@54.195.127.237:30311","enode://11dc6fea50630b68a9289055d6b0fb0e22fb5048a3f4e4efd741a7ab09dd79e78d383efc052089e516f0a0f3eacdd5d3ffbe5279b36ecc42ad7cd1f2767fdbdb@46.137.182.25:30311","enode://21530e423b42aed17d7eef67882ebb23357db4f8b10c94d4c71191f52955d97dc13eec03cfeff0fe3a1c89c955e81a6970c09689d21ecbec2142b26b7e759c45@54.216.119.18:30311","enode://d61a31410c365e7fcd50e24d56a77d2d9741d4a57b295cc5070189ad90d0ec749d113b4b0432c6d795eb36597efce88d12ca45e645ec51b3a2144e1c1c41b66a@34.204.129.242:30311","enode://bb91215b1d77c892897048dd58f709f02aacb5355aa8f50f00b67c879c3dffd7eef5b5a152ac46cdfb255295bec4d06701a8032456703c6b604a4686d388ea8f@75.101.197.198:30311","enode://786acbdf5a3cf91b99047a0fd8305e11e54d96ea3a72b1527050d3d6f8c9fc0278ff9ef56f3e56b3b70a283d97c309065506ea2fc3eb9b62477fd014a3ec1a96@107.23.90.162:30311","enode://4653bc7c235c3480968e5e81d91123bc67626f35c207ae4acab89347db675a627784c5982431300c02f547a7d33558718f7795e848d547a327abb111eac73636@54.144.170.236:30311","enode://c6ffd994c4ef130f90f8ee2fc08c1b0f02a6e9b12152092bf5a03dd7af9fd33597d4b2e2000a271cc0648d5e55242aeadd6d5061bb2e596372655ba0722cc704@54.147.151.108:30311","enode://99b07e9dc5f204263b87243146743399b2bd60c98f68d1239a3461d09087e6c417e40f1106fa606ccf54159feabdddb4e7f367559b349a6511e66e525de4906e@54.81.225.170:30311","enode://1479af5ea7bda822e8747d0b967309bc
ed22cad5083b93bc6f4e1d7da7be067cd8495dc4c5a71579f2da8d9068f0c43ad6933d2b335a545b4ae49a846122b261@52.7.247.132:30311"]
ListenAddr = ":30311"
EnableMsgEvents = false

[Node.HTTPTimeouts]
ReadTimeout = 30000000000
WriteTimeout = 30000000000
IdleTimeout = 120000000000

[Node.LogConfig]
FilePath = "bsc.log"
MaxBytesSize = 104857600
Level = "info"
FileRoot = ""

The config directive for cache is:
DatabaseCache = 12000

NOTE: geth actually takes a bit more memory than what you specify in --cache or DatabaseCache; e.g. it takes about 17GB right now with that setting for me, so make sure your VM has more memory or there is additional swap.
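(Given that note, a minimal sketch of adding swap headroom on a Linux box so the node isn't killed when geth overshoots the cache setting -- the 8G size is just an example:)

sudo fallocate -l 8G /swapfile        # reserve an 8 GiB swap file
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots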

@a04512

a04512 commented May 8, 2021

[quotes @gituser's config.toml and notes from the comment above]

Fast sync mode, and fully synced in 9 hours?

@koen84

koen84 commented May 8, 2021

It would indeed be good for the overall health of the network if the docs got updated, and in particular clarified the demands on storage. If there's a significant number of nodes with subpar specs, they could affect the nodes they peer with.

@gituser

gituser commented May 8, 2021

@a04512 yes, fast sync from scratch in 9 hours, fully synced.

Here are my HW specs: i9-9900K, 2x NVMe 1TB in RAID1, 24GB RAM.

Although there is some other stuff running on the same machine, bsc generates most of the I/O- and CPU-intensive load.

@zcrypt0

zcrypt0 commented May 8, 2021

CPU load seems to be a limiter too; i3.xlarge is the smallest AWS instance I've had success with.

Another thing, I noticed much better peering when setting up the AWS time sync service. One of my nodes went from no peers to enough to do a sync:

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html
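(For reference, on Ubuntu this usually means pointing chrony at the link-local Amazon Time Sync endpoint described on that page -- a rough sketch:)

sudo apt install chrony
# add this line near the top of /etc/chrony/chrony.conf:
#   server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
sudo systemctl restart chrony
chronyc sources -v    # verify 169.254.169.123 shows up and is selected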

@a04512

a04512 commented May 9, 2021

@gituser how many states are needed to be fully synced? I have 620M now and don't know how many there really are.

@afanasy
Contributor

afanasy commented May 9, 2021

@a04512 The number of state entries differs depending on whether you restarted the node or not. @holiman explained why here: ethereum/go-ethereum#14647 (comment).

@bellsovery

I'm using an i3.xlarge with a 1TB NVMe SSD. It's already been 7 days, but it stays 50~100 blocks behind.

Please give me some advice.

@afanasy
Contributor

afanasy commented May 9, 2021

@bellsovery You need more CPU. AWS vCPUs are not real CPUs; they are threads on multicore CPUs. xlarge = 4 vCPU = 4 threads = 2 CPU cores. It worked for me on an i3en.2xlarge - synced in about 10 hours (that was a week ago).

@bellsovery

@afanasy Thanks. I will try on i3en.2xlarge

@zcrypt0

zcrypt0 commented May 9, 2021

@bellsovery I have an i3.xlarge and i3.2xlarge synced to the tip.
i3.2xlarge is faster and stays more consistently in sync, but the xlarge is working.

Make sure you are not using the ext4 filesystem for the NVMe mount. After I switched to xfs I had better performance.
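(A minimal sketch of that setup, assuming the NVMe device is /dev/nvme1n1 and the chain data will live under /data -- adjust the names to your machine, and note mkfs destroys whatever is on the device:)

sudo mkfs.xfs -f /dev/nvme1n1          # WARNING: wipes the device
sudo mkdir -p /data
sudo mount -o noatime /dev/nvme1n1 /data
echo '/dev/nvme1n1 /data xfs noatime 0 0' | sudo tee -a /etc/fstab   # persist the mount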

@bellsovery

@zcrypt0 Oh, really? I was using the ext4 filesystem. Thanks for your advice!

@gobiyoga

What peer counts are you guys getting? I can't seem to get my node above 18, but it seems to be syncing fast (full chain in 3.5 hours using fast sync) onto an NVMe drive.

@easeev

easeev commented May 10, 2021

What peer counts are you guys getting? I can't seem to get my node above 18, but it seems to be syncing fast (full chain in 3.5 hours using fast sync) onto an NVMe drive.

From 17 to 64 on different nodes with --maxpeers 200

@bellsovery

What peer counts are you guys getting? I can't seem to get my node above 18, but it seems to be syncing fast (full chain in 3.5 hours using fast sync) onto an NVMe drive.

My node has 618 peers

@a04512

a04512 commented May 10, 2021

@afanasy But it seems to never end on BSC. Mine is nearly 700M; Ethereum has ~12M blocks with ~800M states to be fully synced.

@afanasy
Contributor

afanasy commented May 10, 2021

@a04512 It means your node is too slow and can't catch up, so it just keeps downloading state entries. You need better hardware (more CPU power, faster storage). On proper hardware a BSC node syncs in fast sync mode (the default mode) in about 10 hours, taking 170GB of storage space and downloading about 300M state entries (in one continuous run, without node restarts). Also make sure you are using bsc geth v1.0.7-hf.1 or higher, because otherwise it will start consuming a lot of space after the sync finishes, see #190.
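(A quick way to confirm which build you're actually running, assuming the bsc geth binary is in the current directory:)

./geth version
# check that the Version line shows 1.0.7-hf.1 or newer before starting a fresh sync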

@afanasy
Contributor

afanasy commented May 10, 2021

@gobiyoga 3.5 hours is a fantastic result. Are you sure it is fully synced, with all state entries downloaded, not just blocks? And with the correct genesis block (a low peer count may indicate a wrong genesis block)?

@john--

john-- commented May 31, 2021

Hey guys. I learned the hard way that my current drive is painfully slow for running a full node in archive mode. I'm going to go out and buy an M.2 SSD for the little server in my basement. Any recommendations? I spoke to a guy a few weeks ago who offered me his AWS snapshot, which was 2.5-3TB at the time, but I notice some people in this topic mention having only 1 and 2TB drives, so how is that possible? Are they not running as archive?

I was thinking of a 4TB PNY NVMe SSD (M280CS2130-4TB-RB). Think that will be enough to get me going? As much as I'd like to future proof myself with 8TB, it's just too expensive for me at the moment.

@zhongfu

zhongfu commented May 31, 2021

@john-- most here are running non-archive full nodes, which only take ~250GiB after a fresh sync

looking at some other issues on the issue tracker (#183), it seems like an archive node would use over 4TiB -- which is around 4.4TB. and that's as of approx. a month ago; disk usage grows by a terabyte every two weeks or so too, apparently

so it seems like a single 4TB SSD would most certainly be insufficient for running a BSC archive node, unfortunately
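(If you want to check what your own node is using, a rough sketch assuming the usual layout where geth keeps chain data under the datadir:)

du -sh ./node/geth/chaindata
# a freshly fast-synced non-archive node was around 250GiB at the time of this thread;
# archive nodes were already several TiB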

@john--

john-- commented May 31, 2021

@john-- most here are running non-archive full nodes, which only take ~250GiB after a fresh sync

looking at some other issues on the issue tracker (#183), it seems like an archive node would use over 4TiB -- which is around 4.4TB. and that's as of approx. a month ago; disk usage grows by a terabyte every two weeks or so too, apparently

so it seems like a single 4TB SSD would most certainly be insufficient for running a BSC archive node, unfortunately

Ah, that explains things.
Sounds like the network is growing pretty fast. That's too bad. Thanks for the info!

@unsphere

@unsphere looks like a VPS of some sort -- are you being guaranteed a baseline level of iops on the storage? is the SSD shared?

might be worth doing some i/o benchmarks to be sure that your storage is up to scratch

also, copy-on-write filesystems might make it hard for your node to catch up, although all i've got so far is anecdotal evidence

Yeah, I compared IOPS against an i3.xlarge instance and the VPS that I got is significantly slower. Thanks for the input, I contacted the provider regarding this. However, eth.syncing is showing false now; a few minutes before it switched to false I got this result:

{
  currentBlock: 7878243,
  highestBlock: 7878249,
  knownStates: 348629847,
  pulledStates: 348629847,
  startingBlock: 7878223
}

I thought knownStates would be at 800M+. Why is it synced, then?

@zhongfu

zhongfu commented May 31, 2021

@unsphere IIRC, a fast sync completes once geth has (a) all the block headers/receipts/etc. up until the pivot/currentBlock, and (b) all the trie nodes that make up a state trie as of currentBlock, at which point geth will switch into full sync mode.

If either of those conditions aren't met, geth will just keep downloading states (and it won't trim stale states!). So for example, if your node has slow I/O and isn't able to pull a complete state trie before the pivot moves, it'll just end up playing catch-up forever, and accumulate a bunch of stale trie nodes in the process.

@koen84

koen84 commented Jun 7, 2021

Hey guys. I learned the hard way that my current drive is painfully slow for running a full node in archive mode. I'm going to go out and buy an M.2 SSD for the little server in my basement. Any recommendations? I spoke to a guy a few weeks ago who offered me his AWS snapshot, which was 2.5-3TB at the time, but I notice some people in this topic mention having only 1 and 2TB drives, so how is that possible? Are they not running as archive?

I was thinking of a 4TB PNY NVMe SSD (M280CS2130-4TB-RB). Think that will be enough to get me going? As much as I'd like to future proof myself with 8TB, it's just too expensive for me at the moment.

An archive node is currently 6.0 TiB. An 8 TB drive gives you 7 TiB of effective storage.

@koen84

koen84 commented Jun 7, 2021

also, copy-on-write filesystems might make it hard for your node to catch up, although all i've got so far is anecdotal evidence

I'm running my archive nodes on BTRFS (and use the snapshot function for backup increments), and it works fine. But I'm definitely using NVMe drives.

@affesq

affesq commented Jun 11, 2021

Has anyone successfully spun up a bsc full node recently on any version of Mac OS? If so, what OS version / hardware specs? I've noticed nearly every post in this thread that specifies both their hardware and OS specs is running Linux.

I'm at about 24 elapsed hours with 420M known states, getting the classic ~100 blocks behind failure to full sync.

2tb nvme, 64GB RAM, 2.7ghz 12 cores, Geth version 1.1.0-beta-b67a129e-20210524
macOS Big Sur v11.4
txlookuplimit=0 ; snapshot=false ; cache 45000 ; rpc.allow-unprotected-txs ; syncmode fast
125mbps net speed
maxpeers set to 1000, console net.peerCount showing 250
edit: additional info: started sync from scratch, no snapshot
node being run on computer at my apartment; not a VPS or inside a datacenter

Am I just being impatient? Given my specs it seems like it should have been done in 10 or 14 hours at most. The common inference seems to be that over 350M states = inadequate hardware? But my hardware seems more than adequate, and Activity Monitor shows it's nowhere near full capacity. Net speed isn't spectacular, but I've seen plenty of users posting full successful syncs with 100mbps.

Any insight appreciated. I can provide more info for diagnostics if needed.
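(For clarity, the flags listed above translate to roughly this launch command -- the binary name and paths are assumptions, adjust to your install:)

./geth --config ./config.toml --datadir ./node \
  --syncmode fast --txlookuplimit 0 --snapshot=false \
  --cache 45000 --maxpeers 1000 --rpc.allow-unprotected-txs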

@saovr

saovr commented Jun 12, 2021

Has anyone successfully spun up a bsc full node recently on any version of Mac OS? If so, what OS version / hardware specs? I've noticed nearly every post in this thread that specifies both their hardware and OS specs is running Linux.

I'm at about 24 elapsed hours with 420M known states, getting the classic ~100 blocks behind failure to full sync.

2tb nvme, 64GB RAM, 2.7ghz 12 cores, Geth version 1.1.0-beta-b67a129e-20210524
macOS Big Sur v11.4
txlookuplimit=0 ; snapshot=false ; cache 45000 ; rpc.allow-unprotected-txs ; syncmode fast
125mbps net speed
maxpeers set to 1000, console net.peerCount showing 250
edit: additional info: started sync from scratch, no snapshot
node being run on computer at my apartment; not a VPS or inside a datacenter

Am I just being impatient? Given my specs it seems like it should have been done in 10 or 14 hours at most. The common inference seems to be that over 350M states = inadequate hardware? But my hardware seems more than adequate, and Activity Monitor shows it's nowhere near full capacity. Net speed isn't spectacular, but I've seen plenty of users posting full successful syncs with 100mbps.

Any insight appreciated. I can provide more info for diagnostics if needed.

Yes, I've done about 7 nodes so far. You can find my guide on multisniperbot.com, and I have a Telegram group where I can answer all your questions if you need. As for a full node sync, the last one I did was 2 days ago and it took 8-9 hours max.

@affesq

affesq commented Jun 12, 2021

I did a removedb and dumpgenesis and started over from scratch. It synced in about 10 hours. Not sure what went wrong the first time.

@muyinliu

muyinliu commented Jul 4, 2021

IMPORTANT NOTE: a native NVMe SSD is required. AWS EBS gp3/gp2/io2/io1 all fail to complete the sync (they just keep logging Pivot became stale and never catch up to the highest block after falling behind about 100 blocks).

For more detail, see my comment in #258

@pl7ofit

pl7ofit commented Jul 6, 2021

Hello all, I completely synced a node on a t3.2xlarge instance and a 2TB gp3 EBS volume with 6000 IOPS; the fs is btrfs with zstd compression. The node could only complete the sync in snap sync mode; in all other modes it was not syncing (I checked all instance and EBS performance metrics).
Before running the node, I unpacked a fresh snapshot on the fly from https://github.com/binance-chain/bsc-snapshots with:
wget -qO - "http://snapshot_url" | dd status=progress | bsdtar -xf-
Start command:
geth --verbosity 5 --config config.toml --syncmode snap --gcmode archive --nousb --cache=26000 --maxpeers=4000 --http --http.addr 0.0.0.0 --http.port 8545 --http.corsdomain "*" --http.vhosts "*" --http.api "rpc, debug, admin, eth, net, web3, personal, txpool" --ws --ws.addr 0.0.0.0 --ws.port 8546 --ws.origins "*" --ws.api "rpc, debug, admin, eth, net, web3, personal, txpool"
It all took about three days.
But I still have a problem: the node makes a lot of I/O requests, and the blockchain directory grows by about +10 GB every day. Is that normal?

@zhongfu

zhongfu commented Jul 6, 2021

@pl7ofit a few issues I can see:

  • you're trying to run an archive node with only 2TB of disk -- you'd probably need >6TiB (or ~6.6TB) for one now, and this will only increase
  • gcmode archive + syncmode snap will not get you an archive node, perhaps unless you find an archive node also keeps state snapshots around (but why?)
  • downloading a chaindata snapshot and extracting it is not a "snap sync"
  • downloading a non-archive chaindata snapshot and starting from that will not get you an archive node
  • on btrfs w/ compression: https://btrfs.wiki.kernel.org/index.php/Compression#Are_there_speed_penalties_when_doing_random_access_to_a_compressed_file.3F i.e. you're probably wrecking random I/O perf
  • EBS io2 gives you "single-digit millisecond" latency, which is absolutely abysmal -- I expect that gp3 will be similar, or worse. with that said, people have managed to do a fast sync from scratch with EBS storage. in any case, consider using instance-local/ephemeral storage

@mmotiy

mmotiy commented Jul 26, 2021

I used NVMe to synchronize successfully, and the number of states after synchronization is:

{
  currentBlock: 9483112,
  highestBlock: 9483196,
  knownStates: 545814561,
  pulledStates: 545814561,
  startingBlock: 9483040
}

@Tronglx

Tronglx commented Jul 26, 2021

Are there any updates on this issue? I'm using a 1TB NVMe SSD and it cannot catch up. It has been about 100 blocks behind for a long time.

> eth.syncing
{
  currentBlock: 9485996,
  highestBlock: 9486091,
  knownStates: 803281678,
  pulledStates: 803064102,
  startingBlock: 9483299
}

@n8twj

n8twj commented Jul 26, 2021 via email

@Tronglx

Tronglx commented Jul 26, 2021

I don't see specs about disk speed in the official docs. Here are the results after running some commands:

> sudo dd if=/dev/nvme1n1 of=/tmp/output bs=8k count=10k
10240+0 records in
10240+0 records out
83886080 bytes (84 MB, 80 MiB) copied, 0.0872977 s, 961 MB/s
> sudo hdparm -Tt /dev/nvme1n1
/dev/nvme1n1:
 Timing cached reads:   39220 MB in  1.99 seconds = 19749.69 MB/sec
 HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
 Timing buffered disk reads: 4844 MB in  3.00 seconds = 1614.36 MB/sec

So, do I need another SSD?

@n8twj

n8twj commented Jul 26, 2021 via email

@Tronglx

Tronglx commented Jul 26, 2021

Sorry for the wrong check. I double checked: the chain data is stored on /dev/md127, and the speed there is even lower than above.

sudo dd if=/dev/md127 of=/tmp/output bs=8k count=10
10+0 records in
10+0 records out
81920 bytes (82 kB, 80 KiB) copied, 0.00115969 s, 70.6 MB/s

I'll probably need a better SSD. Can you help me check the speed requirements? What speeds does your running node get? I have checked the official docs but I can't find it anywhere.
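(Worth noting that dd and hdparm mostly measure sequential throughput, while geth sync is bound by small random reads and writes. A hedged fio sketch for random 4k IOPS, assuming fio is installed and /data is where the chain data lives -- it creates a 4GB test file:)

sudo fio --name=randrw --filename=/data/fio-test --rw=randrw --bs=4k \
  --size=4G --iodepth=32 --ioengine=libaio --direct=1 \
  --runtime=60 --time_based --group_reporting
# a healthy local NVMe typically reports tens of thousands of read/write IOPS here;
# remember to delete /data/fio-test afterwards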

@j75689
Contributor

j75689 commented Jul 30, 2021

Hi all.
Thank you for your reports.
We have received many reports of sync issues.
You can try the latest version. If you have any problems, please give feedback in #338.
We will pay attention to issue #338 for a long time, and if there are any updates, we will explain them in this issue.

Thanks.

@j75689 j75689 closed this as completed Jul 30, 2021
@Patrollia

I've recently purchased a PC with an i9 (10 cores @ 5.2GHz), a 2TB NVMe SSD and 64GB RAM.

System will come packaged with Linux.

Is there anyone who would be interested in swapping some bnb to help me set up a BSC full node?

I'm not a Linux guy :/

Thanks

If you still can't do it, I can help you.

Email : patrollia@gmail.com

@StanK23

StanK23 commented Sep 6, 2021

Hi. I have a very fast NVMe SSD (980 Pro).
When I started the sync it was downloading blocks and states very fast - around 2k states per tick and 8-10 ticks per second, and sometimes "pending" was even at 0. But that was only while it was downloading the old blocks (those with an age).
After it finished importing the old blocks and started importing 1 block at a time, for some reason the state download speed dropped significantly: now it's downloading 384 states per tick at 1-2 ticks per second, with pending always increasing until "pivot stale, moving", then it drops to 4k and starts increasing again. This happened exactly when all the old block headers had been imported. Sometimes it starts importing states at high speed again, but only for a few seconds, when it's near 100k pending.
The strangest thing is that PC usage is all low: it's not using all the internet bandwidth, all cores are near 0%, and only 8GB of memory is used, but I have 16GB.
Can someone please explain to me what is wrong? It's not using all the resources.
I tried downloading the snapshot - same result. (What sync mode should I use with the snapshot?)

@zhongfu

zhongfu commented Sep 6, 2021

@StanK23

Can someone please explain me what is wrong? It's not using all resources.

not sure -- could be just that you can't download the trie nodes quickly enough, or your peers can't serve the trie nodes quickly enough. (this is very i/o intensive, so it's not too surprising)

I tried downloading snapshot - same result (What sync mode should I use for snapshot?)

your node should not be syncing states if you're using a chaindata snapshot. instead, it should be continuing a full sync from the last block in the snapshot (i.e. Imported chain segment in logs)

if you're seeing it download state trie entries, then you probably have done something wrong
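(A quick way to see which mode the node is actually in, assuming logs go to bsc.log as in the config posted earlier in this thread:)

tail -f bsc.log | grep -E "Imported chain segment|Imported new state entries"
# "Imported chain segment"      => executing blocks (full sync on top of the snapshot)
# "Imported new state entries"  => still downloading the state trie (fast sync)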

@StanK23

StanK23 commented Sep 6, 2021

not sure -- could be just that you can't download the trie nodes quickly enough, or your peers can't serve the trie nodes quickly enough. (this is very i/o intensive, so it's not too surprising)

It's not trying to; my SSD has something like 1M IOPS, and I have 500Mbps internet. Where can I find faster peers?

your node should not be syncing states if you're using a chaindata snapshot. instead, it should be continuing a full sync from the last block in the snapshot (i.e. Imported chain segment in logs)

Thanks for that, so must I set syncmode to "full" if I'm using the snapshot?

@zhongfu

zhongfu commented Sep 7, 2021

It's not trying to; my SSD has something like 1M IOPS, and I have 500Mbps internet. Where can I find faster peers?

dunno, it's not like you can easily figure out which nodes have a ridiculously high amount of random IOPS anyway

Thanks for that, so must I set syncmode to "full" if I'm using the snapshot?

no need, geth should recognize that the chaindata you're using was fully synced (ie caught up to chain head) some time in the past, and drop into full sync mode regardless of the syncmode you've configured

@Aziz87

Aziz87 commented Dec 5, 2021

[screenshots]

If you do not see that, you need to remove your node folder and settings and repeat everything again following the docs: https://docs.binance.org/smart-chain/developer/fullnode.html

!!Attention!!
Do not use Sync From Genesis Block (Not Recommended).
Use Sync From Snapshot (Recommended).

If there is not enough space on your nvme (for example, you have 2tb, but the snapshot size is 1.1tb), download an older snapshot with a smaller size (800gb, 700gb, or older... see the github commit history).

Do not edit the default config! Just download mainnet.zip and use it as the default, with the launch command provided in the Binance docs:
./geth_linux --config ./config.toml --datadir ./node --diffsync --cache 8000 --rpc.allow-unprotected-txs --txlookuplimit 0

[screenshot]

@zhongfu

zhongfu commented Dec 5, 2021

If there is not enough space on your nvme (for example, you have 2tb, but the snapshot size is 1.1tb), download an older snapshot with a smaller size (800gb, 700gb, or older... see the github commit history).

(i assume you meant 1TB instead of "2tb")

i would highly recommend that you don't try to run it with an SSD smaller than 1.5 to 2TB -- yes, you could download a smaller snapshot from the past, but:

  • it'll take a really long time for you to catch up from an old snapshot
  • while catching up, the 100GiB or so of free space on your disk will be filled up really quickly
  • you'll also need a decent amount of free space to prune state -- maybe 40GiB to be safe? so you'll keep having to shut your node down for a few hours to prune
  • and even if you do manage to catch up, there will very soon be enough new state such that even a freshly pruned node will take up more space than is available on a 1TB SSD

so you should save yourself the trouble, and just spend the extra $100 or so on a 2TB SSD (or even better, a larger SSD or multiple 2TB SSDs)

in any case -- you'd be better off starting your sync with the most recent snapshot; catching up with and executing 1-2 months of blocks is not going to be fun. you can always prune the chaindata afterwards anyway
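(The pruning mentioned above is geth's offline snapshot prune-state subcommand; a minimal sketch, assuming the node has been stopped first and the datadir is ./node:)

./geth snapshot prune-state --datadir ./node
# requires the node to be fully stopped, needs tens of GB of free space,
# and can take several hours; restart the node once it finishes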

@banshee

banshee commented Mar 13, 2022

Also, it's a bit more complicated right now because the snapshot given on the website (here - https://docs.binance.org/smart-chain/validator/snapshot.html) is broken. Looks like the AWS security settings got messed up.

Connecting to s3.ap-northeast-1.amazonaws.com (s3.ap-northeast-1.amazonaws.com)|52.219.68.136|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2022-03-13 11:30:41 ERROR 403: Forbidden.
