All BSC nodes are OFF SYNC #189
Not sure what there is to fix. The block size and TPS have both increased (exponentially for TPS), and hardware that was sufficient a month ago is no longer able to keep up. Push these other services to improve their resources. I had sync/lag issues with SSDs on multiple machines. I built a new node using NVMe (PCIe, not SATA mode) and have not had a single hiccup for the several days it's been running. I won't claim that there aren't optimizations that could be done, but the blockchain is IOPS-heavy, and you need hardware to support it.
Well, if you need PCIe NVMe instead of a SATA SSD, it should be reflected in the user manual at least. I have seen two different user manuals on the official site, and none of them said anything about NVMe. And I have already bought 3 SSD servers.
In their defense, the manual was written long (long) before IOPS had been a limiting factor. But I definitely agree that the docs are a little stagnant as a whole.
After experimenting for the last week or so I can confirm that:
Hope this helps; syncing mainnet here took <48 hours and testnet <4 hours. The server is located in the EU. EDIT: Scrolled through some other issues and people are curious how large a fast mainnet sync is on disk. Approx 191GB with the new v1.0.7-hf.1 geth version.
Can confirm the above, same results for us so far. Thanks for the summary @sjors-lemniscap
eth.syncing is showing false since my node has been fully synced. Is there a way to show the PulledStates / KnownStates once a node is synced? Happy to show the output but I don't know the right command to retrieve this info.
Ah okay. I don't think there is any command to check after a full sync. Maybe someone else knows it. This is our output/amount of states right now:
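For what it's worth, while a node is still syncing you can poll those counters non-interactively via the geth console (the IPC path below is an example; adjust to your datadir). Once fully synced, eth.syncing simply returns false, so the counters are no longer reported:

```shell
# Query sync progress from a running node; prints an object with
# pulledStates / knownStates while syncing, or `false` once fully synced
geth attach --exec 'eth.syncing' /path/to/datadir/geth.ipc
```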
I'm at 317 million known states with a node size of 187.5 GB, syncing in fast mode. Hopefully I will be done soon.
Mine: 562,191,528
This is indeed the correct way to sync at the moment (don't use snapshots!). If your node is stuck syncing from the snapshot, stop the node, remove the node db, and then sync from scratch with fast syncing. Also try branch My
May I ask how exactly the --cache flag should look if I want to give my node 16GB of cache?
@edgeofthegame here is the config.toml I've used for the node that got synced in ~9 hrs: config.toml:
The config directive for cache is: NOTE: geth actually takes a bit more memory than what you specify in --cache.
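For reference on the flag form (a sketch, not from the official manual): geth's --cache value is given in megabytes, so 16GB of cache would be 16384, e.g.:

```shell
# 16GB of geth cache, specified in MB; actual resident memory will be somewhat higher
geth --config ./config.toml --cache 16384
```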
Fast sync mode, and fully synced in 9 hours?
Would probably be good indeed, for the overall health of the network, if the docs got updated and in particular clarified the demands on storage. If there's a significant number of nodes with subpar specs, they could affect the nodes they peer with.
@a04512 yes, fast sync from scratch in 9 hours, fully synced. Here are my HW specs: i9-9900K, 2x NVMe 1TB in RAID1, mem: 24GB. Although there is some other stuff running on the same machine, bsc is giving it the most I/O- and CPU-intensive load.
CPU load seems to be a limiter too; i3.xlarge is the smallest AWS instance I've had success with. Another thing: I noticed much better peering after setting up the AWS time sync service. One of my nodes went from no peers to enough to do a sync: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/set-time.html
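Per the linked AWS doc, that time sync setup boils down to pointing chrony at Amazon's link-local NTP endpoint. A sketch (config path is for Amazon Linux; Debian/Ubuntu use /etc/chrony/chrony.conf):

```shell
# Add Amazon's link-local time sync endpoint to /etc/chrony.conf as the preferred source:
#   server 169.254.169.123 prefer iburst minpoll 4 maxpoll 4
# then restart chrony and check the clock offset:
sudo systemctl restart chronyd
chronyc tracking
```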
@gituser how many states are needed to be fully synced? I have 620M now and don't know how many there really are.
@a04512 The number of state entries is different depending on whether you restarted the node or not. @holiman explained why that is so here: ethereum/go-ethereum#14647 (comment).
I'm using i3.xlarge with a 1TB NVMe SSD. It's already been 7 days, but it keeps 50~100 blocks behind. Please give me any advice.
@bellsovery You need more CPU. AWS vCPUs are not real CPUs, they are threads on multicore CPUs. xlarge = 4 vCPU = 4 threads = 2 CPU cores. It worked for me on
@afanasy Thanks. I will try on i3en.2xlarge
@bellsovery I have an i3.xlarge and an i3.2xlarge synced to the tip. Make sure you are not using the ext4 filesystem for the NVMe mount. After I switched to xfs I had better perf.
@zcrypt0 Oh, really? I was using the ext4 filesystem. Thanks for your advice!
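A minimal sketch of the xfs setup mentioned above (the device name and mount point are examples, and mkfs erases the drive, so double-check the device first):

```shell
# Format the NVMe drive as xfs and mount it for chain data
sudo mkfs.xfs /dev/nvme0n1                    # destroys any existing data on the device
sudo mkdir -p /data/bsc
sudo mount -o noatime /dev/nvme0n1 /data/bsc  # noatime avoids extra metadata writes
```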
What peer counts are you guys getting? I can't seem to get my node above 18, but it seems to be syncing fast (full chain in 3.5 hours using fast sync) onto an NVMe drive.
From 17 to 64 on different nodes with
My node has 618 peers
@afanasy But it seems to never end on BSC; mine is nearly 700M. Ethereum has 12M blocks with 800M states to be fully synced.
@a04512 It means your node is too slow and can't catch up, so it just keeps downloading state entries. You need better hardware (more CPU power, faster storage). On proper hardware a BSC node syncs in fast sync mode (the default mode) in about 10 hours, taking 170GB of storage space and downloading about 300M state entries (in one continuous run, without node restarts). Also make sure you are using bsc geth v1.0.7-hf.1 or higher, because otherwise it will start consuming a lot of space (after the sync finishes), see #190.
@gobiyoga 3.5 hours is a fantastic result. Are you sure it is fully synced, with all state entries downloaded, not just blocks? And with the correct genesis block (a low peer count may indicate a wrong genesis block)?
Hey guys. I learned the hard way that my current drive is painfully too slow to run a full node in archive mode. I'm going to go out and buy an M.2 SSD for the little server in my basement. Any recommendations? I spoke to a guy a few weeks ago who offered me his AWS snapshot that was 2.5-3TB at the time, but I notice some people in this topic mention only having 1 and 2TB drives, so how is that possible? Are they not running as archive? I was thinking of a 4TB PNY NVMe SSD (M280CS2130-4TB-RB). Think that will be enough to get me going? As much as I'd like to future-proof myself with 8TB, it's just too expensive for me at the moment.
@john-- Most here are running non-archive full nodes, which only take ~250GiB after a fresh sync. Looking at some other issues on the issue tracker (#183), it seems like an archive node would use over 4TiB, which is around 4.4TB. And that's as of approx. a month ago; disk usage apparently grows by a terabyte every two weeks or so, too. So it seems like a single 4TB SSD would most certainly be insufficient for running a BSC archive node, unfortunately.
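To make the unit juggling concrete, a quick back-of-envelope in Python using the rough figures from this thread (treat them as estimates, not measurements):

```python
# TiB (binary) vs TB (decimal, which is what SSDs are sold in)
TIB = 1024 ** 4
TB = 1000 ** 4

archive_size_tib = 4                       # ~4 TiB per #183, as of roughly a month ago
archive_size_tb = archive_size_tib * TIB / TB
print(f"{archive_size_tib} TiB is about {archive_size_tb:.2f} TB")

# At ~1 TB of growth every two weeks, even an 8 TB drive has limited headroom:
headroom_tb = 8 - archive_size_tb
weeks_of_headroom = headroom_tb / 0.5      # 0.5 TB of growth per week
print(f"an 8 TB drive buys roughly {weeks_of_headroom:.0f} weeks of headroom")
```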
Ah, that explains things.
Yeah, I compared IOPS to an i3.xlarge instance and the VPS that I got is significantly slower. Thanks for the input; I contacted the provider regarding this. However, eth.syncing is showing false now, but some minutes before it switched to false I got this result:
I thought knownStates were at 800M+. Why is it synced then?
@unsphere IIRC, a fast sync completes once geth has (a) all the block headers/receipts/etc. up until the pivot/currentBlock, and (b) all the trie nodes that make up a state trie as of currentBlock, at which point geth will switch into full sync mode. If either of those conditions isn't met, geth will just keep downloading states (and it won't trim stale states!). So for example, if your node has slow I/O and isn't able to pull a complete state trie before the pivot moves, it'll just end up playing catch-up forever, and accumulate a bunch of stale trie nodes in the process.
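A toy model (my own sketch, not geth code) of that catch-up failure mode: if the moving pivot introduces new state about as fast as your disk can pull it, the state download never terminates.

```python
def ticks_until_synced(state_size, pull_rate, pivot_rate, max_ticks=10_000):
    """Return the tick at which the state download finishes, or None if the
    node never catches up to the moving pivot."""
    remaining = state_size
    for tick in range(1, max_ticks + 1):
        remaining -= pull_rate    # trie nodes downloaded this tick (I/O bound)
        remaining += pivot_rate   # new state introduced as the pivot advances
        if remaining <= 0:
            return tick
    return None

# Fast storage: downloads outpace the pivot, so the sync completes.
print(ticks_until_synced(1000, pull_rate=50, pivot_rate=10))
# Slow storage: the pivot advances as fast as we pull, so it never finishes.
print(ticks_until_synced(1000, pull_rate=10, pivot_rate=10))
```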
The archive node is currently 6.0 TiB. An 8 TB drive gives you 7 TiB of effective storage.
I'm running my archive nodes on Btrfs (and use the snapshot function for backup increments); it works fine. But I'm definitely using NVMes.
Has anyone successfully spun up a BSC full node recently on any version of macOS? If so, what OS version / hardware specs? I've noticed nearly every post in this thread that specifies both hardware and OS is running Linux.
I'm at about 24 elapsed hours with 420M known states, getting the classic ~100-blocks-behind failure to fully sync. 2TB NVMe, 64GB RAM, 2.7GHz 12 cores, Geth version 1.1.0-beta-b67a129e-20210524.
Am I just being impatient? Given my specs it seems like it should have been done in 10 to 14 hours at most. The common inference seems to be that over 350M states = inadequate hardware? But my hardware seems more than adequate, and Activity Monitor shows it's nowhere near full capacity. Net speed isn't spectacular, but I've seen plenty of users posting full successful syncs on 100Mbps. Any insight appreciated... I can provide more info for diagnostics if needed.
Yes, I did like 7 nodes until now. You can find my guide on multisniperbot.com, and I have a Telegram group where I can answer all your questions if you need. Also, on full node sync: the last one I did was 2 days ago and it took like 8-9 hours max.
I did a removedb and dumpgenesis and started over from scratch. Synced in about 10 hours. Not sure what went wrong the first time.
IMPORTANT NOTE: a native NVMe SSD is required; AWS EBS is not sufficient. For more detail see my comment in #258.
Hello all, I completely synced a node on a t3.2xlarge instance with 2TB gp3 EBS at 6000 IOPS; the fs is btrfs with zstd compression. The node could complete the sync only in snap sync mode; in all other modes the node would not sync (I checked all instance and EBS performance metrics).
@pl7ofit a few issues I can see:
I used NVMe to synchronize successfully, and the number of states after synchronization is
Any updates on this issue? I'm using a 1TB NVMe SSD and it cannot catch up. It's been about 100 blocks behind for a long time.
Then your drive is not a full speed NVMe…get a faster one.
On Mon, Jul 26, 2021 at 01:52, Tronglx wrote:
> eth.syncing
{
currentBlock: 9485996,
highestBlock: 9486091,
knownStates: 803281678,
pulledStates: 803064102,
startingBlock: 9483299
}
I don't see specs about disk speed in the official docs. The result after running some commands:
So, I need another SSD?
nvme1 may not be full speed. I have only had success when using nvme0
On Mon, Jul 26, 2021 at 3:03 AM, Tronglx wrote:
> sudo dd if=/dev/nvme1n1 of=/tmp/output bs=8k count=10k; rm -f /tmp/output
10240+0 records in
10240+0 records out
83886080 bytes (84 MB, 80 MiB) copied, 0.0872977 s, 961 MB/s
> sudo hdparm -Tt /dev/nvme1n1
/dev/nvme1n1:
Timing cached reads: 39220 MB in 1.99 seconds = 19749.69 MB/sec
HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device
Timing buffered disk reads: 4844 MB in 3.00 seconds = 1614.36 MB/sec
Sorry for the wrong check. I have double-checked: the chain data is stored on /dev/md127, and its speed is even lower than the numbers above.
I'll probably need a better SSD. Can you help me check the speed requirements? What speed does your running node get? I have checked the official docs but I can't find it anywhere.
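One caveat when reading numbers like the ones above: dd and hdparm mostly measure sequential throughput, while chain sync is bound by small random reads, so a ~1 GB/s sequential figure can still hide poor random IOPS (fio is the better tool for measuring that). A self-contained sequential sanity check against a scratch file, so it works regardless of device names:

```shell
# Write a 128 MiB scratch file, read it back in 4 KiB blocks, then clean up.
# dd prints throughput stats on stderr; this is sequential only, not an IOPS test.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=128 conv=fsync 2>/dev/null
dd if=/tmp/ddtest of=/dev/null bs=4k
rm -f /tmp/ddtest
```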
If you still can't do it, I can help you. Email: patrollia@gmail.com
Hi. I have a very fast NVMe SSD (980 Pro).
Not sure -- could be just that you can't download the trie nodes quickly enough, or your peers can't serve them quickly enough. (This is very I/O intensive, so it's not too surprising.)
Your node should not be syncing states if you're using a chaindata snapshot. Instead, it should be continuing a full sync from the last block in the snapshot (i.e. if you're seeing it download state trie entries, then you probably have done something wrong).
It's not trying to; my SSD has like 1M IOPS, and 500Mbps internet. Where can I find faster peers?
Thanks for that. So I must set syncmode to "full" if I'm using a snapshot?
Dunno, it's not like you can easily figure out which nodes have a ridiculously high amount of random IOPS anyway.
No need; geth should recognize that the chaindata you're using was fully synced (i.e. caught up to the chain head) some time in the past, and drop into full sync mode regardless of the syncmode you've configured.
If you don't see that, you need to remove your node folder and settings and repeat everything again per the docs:
https://docs.binance.org/smart-chain/developer/fullnode.html
!!Attention!! If there is not enough space on your NVMe (for example, you have 2tb but the snapshot size is 1.1tb), download an older snapshot with a smaller size (800GB, 700GB, or older; see the GitHub commit history). Don't edit the default config! Just download mainnet.zip and use it as default with the launch command provided in the Binance docs.
(I assume you meant 1TB instead of "2tb".) I would highly recommend that you don't try to run it with an SSD smaller than 1.5 to 2TB. Yes, you could download a smaller snapshot from the past, but:
So you should save yourself the trouble and just spend the extra $100 or so on a 2TB SSD (or even better, a larger SSD or multiple 2TB SSDs). In any case, you'd be better off starting your sync with the most recent snapshot; catching up with and executing 1-2 months of blocks is not going to be fun. You can always prune the chaindata afterwards anyway.
Also, it's a bit more complicated right now because the snapshot given on the website (here - https://docs.binance.org/smart-chain/validator/snapshot.html) is broken. Looks like the AWS security settings got messed up.
Well, I have tried to sync my own node and failed. It has been syncing for a week already. OK, so I decided to buy access to a node on the internet.
I have tried ankr, getblock and quiknode so far, and they ALL are OFF SYNC!!!
Please don't tell me anything about my hardware is weak or I did something wrong.
Just figure out what is going on, and fix it. A month ago everything was alright.