Slow handling of a large number of files #3528
Comments
% ./ipfs repo stat
I encounter this too.
Hey @FortisFortuna, yes, this is a common issue with the default flatfs datastore. Try switching to badgerds.
Thank you. I have about 18 GB and 500K files (Everipedia) on the default flatfs. Do these commands convert the blocks from flatfs to badgerds so I don't have to do everything over again?
Yes. However, it may be faster to do it over, as the conversion still has to extract the blocks from flatfs and move them into badgerds.
Yes, and keep in mind the conversion tool will require free space equal to twice the size of the repo being converted.
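For reference, a sketch of the conversion flow being discussed, assuming the standard ipfs config profile apply and ipfs-ds-convert convert subcommands (see #5013 for the authoritative steps):
# stop the daemon first; the repo must not be in use
ipfs config profile apply badgerds    # rewrite the datastore spec in the config
ipfs-ds-convert convert               # move the blocks; needs free space ~2x the repo size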
Also, I'd be interested in how big your datastore gets with badgerds.
I am unable to build the conversion tool. It stalls for me on the dependency fetch.
Looks like you're having trouble fetching the dependencies. Try building from ipfs-inactive/ipfs-ds-convert#11.
Ok, thanks! Your pull request let me build it. I will follow the instructions in this thread and #5013 now and try to convert the db (I backed up the flatfs version just in case). Thanks for the quick reply.
Works, but still the same slow speed.
@FortisFortuna That's strange, I would definitely expect a speed-up when using Badger instead of the flat datastore when adding files. I mean, I can't say that it would be fast, but it should be noticeably faster than your previous setup. Could you, as a test, initialize a new repo with the badgerds profile and try adding files to it?
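A minimal way to run that test without touching the existing repo (paths here are hypothetical):
export IPFS_PATH=/tmp/badger-test         # scratch repo, separate from the real one
ipfs init --profile=badgerds
time ipfs add -r /path/to/sample-files    # compare against the same add on the flatfs repo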
Hm. Actually, this may be pins. Are you adding one large directory or a bunch of individual files? Our pin logic is really unoptimized at the moment, so if you add all the files individually, you'll end up with many pins and performance will be terrible.
Everipedia has around 6 million pages, and I have added about 710K of them to IPFS in the past week on a 32-core, 252 GB RAM machine. Something is bottlenecking, because I am only getting about 5-10 hashes a second. I know for a fact the bottleneck is the ipfs add call.
Specifically, a gzipped html file of average size ~15 kB is being added each loop.
Ah. Yeah, that'd do it. We're trying to redesign how we do pins, but that's currently under discussion. So, the best way to deal with this is to just add the files all at once with ipfs add -r on a directory, or to pass --pin=false when adding files individually.
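In other words (a sketch; file and directory names are hypothetical):
ipfs add -r ./pages/                      # one add, one pin for the whole directory
ipfs add --pin=false page-00001.html.gz   # or: per-file adds without per-file pins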
I will try pin=False. I need to keep track of which files get which hashes, though, so I don't think I can simply pre-generate the html files and then add them, unless you know a way.
Once you've added a directory, you can get the hashes of the files in the directory by running either:
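The exact commands didn't survive this capture; two standard go-ipfs commands from that era that do this (the directory hash is a placeholder):
ipfs ls <directory-hash>              # prints the hash, size, and name of each child
ipfs object links <directory-hash>    # lower level: just the child links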
Note: If you're adding a massive directory, you'll need to enable [directory sharding](https://github.com/ipfs/go-ipfs/blob/master/docs/experimental-features.md#directory-sharding--hamt) (which is an experimental feature).
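At the time, that experimental feature was turned on via the config (followed by a daemon restart):
ipfs config --json Experimental.ShardingEnabled true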
Thanks |
So to clarify, if I put pin=False on each add call, that should fix the slowdown?
You are a god among men @Stebalien. Setting pin=False in the Python script did it! To summarize:
Getting like 25 hashes a second now vs 3-5 before. |
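For per-file adds from a script, the same option exists on the daemon's HTTP API; a sketch with curl, assuming the default port and a hypothetical file name:
curl -s -F "file=@page-00001.html.gz" "http://127.0.0.1:5001/api/v0/add?pin=false"
# the JSON response includes the file's Name and Hash for bookkeeping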
Hey guys. The IPFS server was working fine with the above helper options until I needed to restart. When I did, the daemon tried to initialize but froze. I am attempting to update from 0.4.15 to 0.4.17 to see if that helps, but now it stalls on "applying 6-to-7 repo migration".
I see this in the process list.
Ok, so the migration did eventually finish, but it took a while (~1 hr). Once the update went through, the daemon started fast. It is working now.
So, that migration should have been blazing fast. It may have been that "daemon freeze"; that is, the migration literally uses the 0.4.15 repo code to load the repo. It may also have been the initial repo size computation: we've switched to memoizing the repo size, as it's expensive to compute for large repos, but we have to compute it up-front, so that might have delayed your startup.
I'm running ipfs add -r on 289 GB of data (average file size < 10 MB). Do you mean that --pin=false will speed this up (on go-ipfs v0.4.18)? Is this right?
@hoogw please report a new issue. |
ipfs add pins by default (--pin=true). To turn off pinning to speed things up:
D:\test>ipfs add --pin=false IMG_1427.jpg
Version information:
% ./ipfs version --all
go-ipfs version: 0.4.4-
Repo version: 4
System version: amd64/freebsd
Golang version: go1.7
Type:
./ipfs add becomes very slow when handling a large number of files.
Priority: P1
Description:
./ipfs add became very slow when handling about 45K files (~300 GB); it took about 3+ seconds more after the progress bar finished.
As a workaround, we can run several IPFS instances on the same machine.
About the machine:
CPU: E3-1230V2, RAM: 16 GB, storage: 8 TB with a 240 GB SSD cache on ZFS
Thanks for the amazing project!