cdnjs #35
Comments
cc @lgierth |
On hold until |
Ok well I bit the bullet and added this to IPFS. Here is the result: http://ipfs.io/ipfs/QmZJKsLpebYqHApRLeoLhj2NsXZ2JoXqhSWkxafxaBXYu7 A few observations. Note that for all of the below comments, the daemon was not running, and I was on IPFS version
This means that it took 6 hours to call
I took a look at every single hash to see which ones were duplicated. In total, about 180,000 different hashes were deduplicated at least once in this tree. The most deduplicated hash was a small 1x1 PNG file with 12376 links. The second most deduplicated file is an empty file with 696 links. I've uploaded a file that lists all dedups. I haven't done the analysis to see if it would account for the savings reported by
|
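For anyone who wants to repeat the dedup count above, here is a minimal sketch. It assumes you already have a listing with one hash per line and one line per link (duplicates included); the function name `count_dups` is made up for illustration and is not the script used in the comment above.

```shell
#!/bin/bash
# count_dups: read one hash per line on stdin and print "<count> <hash>"
# for every hash that appears more than once, most-linked first.
count_dups() {
  sort | uniq -c | awk '$1 > 1 { print $1, $2 }' | sort -k1,1nr -k2,2
}

# Hypothetical usage (assumes a local ipfs repo and a ROOT_HASH variable):
#   ipfs refs -r "$ROOT_HASH" | count_dups | head
```

The first line of output is the most-linked object, which in the tree described above would be the 1x1 PNG.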
@eminence awesome, thanks for tackling this and writing up the details :) |
Second day follow up notes:
But IPNS is still too slow. Or rather, it is not consistently fast. I wrote a script that requests the same file several times and records how long each request takes. The data for 60 requests is here. The summary is that most requests are quick (about 0.5 seconds or less). But some requests take 10 seconds, others up to 60 seconds. When your website is loading many different assets (like cdnjs libs), this can result in very noticeable delays.
|
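A rough sketch of that kind of timing script, for anyone who wants to reproduce the measurement. This is not the original script; the function names and the 1-second "slow" threshold are assumptions, and the URL in the usage comment is a placeholder.

```shell
#!/bin/bash
# time_requests URL N: fetch URL N times and print the elapsed time of each
# request in seconds, one per line.
time_requests() {
  local url=$1 n=$2 i
  for ((i = 0; i < n; i++)); do
    curl -s -o /dev/null -w '%{time_total}\n' "$url"
  done
}

# summarize_times LIMIT: read times on stdin and report how many there were,
# the fastest, the slowest, and how many were slower than LIMIT seconds.
summarize_times() {
  awk -v limit="$1" '
    { t = $1 + 0 }
    NR == 1 || t < min { min = t }
    NR == 1 || t > max { max = t }
    t > limit + 0 { slow++ }
    END { printf "n=%d min=%s max=%s slow=%d\n", NR, min, max, slow }'
}

# Hypothetical usage against a local gateway (placeholder URL):
#   time_requests "http://localhost:8080/ipns/<name>/<file>" 60 | summarize_times 1
```

Keeping the measurement and the summary separate means the raw per-request times can also be saved and shared, as was done above.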
I think
Probably, yes. |
|
yes, very much so. |
yes, ipns is still very slow. there's caching (the fast results), but we need to fix this at the dht query level |
After testing with 0.4.0, the experience was much better!
If anyone is running a 0.4.0 node, the result is here: QmRnvPSCNmYHdYQAo6JUWJPW8uVQv7z6D9nSQmw5qbHVWy |
Good news. Though still waaaay too slow for my liking. Adding concurrent add will help here. I believe we have not added this.
|
For me, the gold standard of merkledags is probably git :) So I timed how long it takes to add this directory tree into a new git repo -- 41 minutes! About an order of magnitude faster than IPFS. |
@eminence yeah, we've got a little ways to go still, but keep in mind that git uses a faster hashing algorithm, and doesnt chunk objects. I have one more changeset to apply that should get close to leveling the playing field. Just have to polish it a bit. |
In some cases, e.g. lots of 1MB files instead of 1KB files, ipfs is way faster than git ipfs/kubo#1973 (comment) |
The chunking equivalent in git would be |
we should consider moving the default to https://blake2.net/ it's designed for this. |
That would explain rsync's speed, since it uses blake2. Perhaps it is possible to go ~O(rsync) with this. It would be more effective, though, to get blake2 into 0.4.0 before its release (combining all the incompatible changes into one release?). |
blake2 isnt incompatible. https://github.com/jbenet/multihash :) |
oh i guess it is, because it's not included in the 0.3.x codebase, right. |
yeah it would be nice, but idk if we can land it in time. @whyrusleeping wants to ship 0.4.0 soon. would be nice to add blake2 and ipld support, but not sure if we'll get there in time |
Even though git is super fast here, ipfs isn't unusably slow. As an end user, I have so many more things on my wishlist that are more important to me than speed (edit: but I do indeed understand the desire to put all breaking changes into 1 release) |
@rht What percentage of the total runtime is currently consumed by the hash function? |
...I did instead with testing dev0.4.0+blake2b. There is more cpu consumption and a speedup but not noticeable due to the jitter like in ipfs/kubo#2039 (comment). Perhaps it could be significant once dev0.4.0 goes to ~O(git) or ~O(rsync). Though I could have just checked the runtime percentage of the hashing. Currently, both ipfs and git are slow for adding large things. |
I re-archived it on my own. Available under fs:/ipns/cdnjs.ipfs.ovh Notes: It took about 6h, I had one crash during that time. |
Anyone interested in a deterministic, efficient node script that wraps the command line to do the manual git pull / git diff / ipfs add / ipfs object patch work and applies it to the existing index you last put into the IPFS network? It could also handle the memory issue (restarting the daemon or sets of potentially offending processes when they get near a threshold). About four months ago I did this and I'll need to dig up my work sitting on one of my VPSes; a simple, dumb 'ipfs add -r' every time I git pulled was quite expensive. It might be reusable for other related archive targets (other big git repos that are more archive repositories than single codebases). It just requires 'git', 'ipfs' and 'node' on the path of the indexing machine. Maybe someone has already done this, or perhaps the ipfs internals have improved enough to remedy the need for this? I just found out about this "IPFS Archives" project at the Decentralized Web Summit, pretty exciting. @whyrusleeping is still doing the workshop on IPFS Archives and Versioning, so tagging him. |
@magik6k and I are currently working on adding cdnjs. I added the whole of cdnjs some time ago; it is available under fs:/ipns/cdnjs.ipfs.ovh but I think most of that data is currently gone (due to problems with the local IPFS repo). Adding cdnjs is a nice stress test for IPFS: ipfs/kubo#2823 ipfs/kubo#2828 |
Note about publishing the cdnjs: We shouldn't use |
Updated CDNJS hash: https://ipfs.io/ipfs/QmPJnEf5933cXteZmaMJkphCW1CtpcMMVx7N6rUr8cZAok
#!/bin/bash
# Start from an empty unixfs directory and link in one library at a time.
HASH=$(ipfs object new unixfs-dir)
for FILE in cdnjs/ajax/libs/*; do
  LIB=$(basename "$FILE")
  echo "adding $LIB"
  # -q prints one hash per line; the last line is the root of this library
  LIB_HASH=$(ipfs add -r -H -q "$FILE" | tail -n 1)
  HASH=$(ipfs object patch "$HASH" add-link "$LIB" "$LIB_HASH")
done
echo "final hash: $HASH"
...However, this doesn't handle symlinks, which do exist in CDNJS. I've gotta decide how to deal with those. |
What needs handling exactly (with regard to symlinks)? |
If you click on one right now (like https://ipfs.io/ipfs/QmPJnEf5933cXteZmaMJkphCW1CtpcMMVx7N6rUr8cZAok/zocial) it's broken, whereas https://cdnjs.cloudflare.com/ajax/libs/zocial/1.2.0/css/zocial.css works. |
Ahh. I see, yes. I would just pretend the symlink doesn't exist and just add the contents of the directory. Let IPFS's intrinsic de-dup handle it from there |
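One way to "pretend the symlink doesn't exist" is to make a dereferenced copy of a library directory before adding it. This is only a sketch under that assumption: `deref_copy` is a made-up helper, it needs extra disk space for the copy, and `cp` will fail on dangling symlinks.

```shell
#!/bin/bash
# deref_copy SRC DST: copy SRC to DST, replacing every symlink with a copy of
# the file or directory it points at.
deref_copy() {
  cp -RL "$1" "$2"
}

# Hypothetical usage for one library (paths are placeholders):
#   deref_copy cdnjs/ajax/libs/zocial /tmp/zocial
#   ipfs add -r -H -q /tmp/zocial | tail -n 1
```

Because the copied files have the same bytes as their targets, they hash to the same blocks, so de-dup should keep the extra storage cost on the IPFS side small.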
BTW, I am trying to pin this hash, but I can't download everything. Are you still seeding it, @slang800 ? |
@eminence you are better off, instead of pinning it right away, doing |
Actually, I am already doing that (using |
Or maybe I just can't connect to @slang800's node? When I run
So I'm a little confused about what's actually happening here |
It is possible that you can't connect to @slang800's node directly. Maybe his node failed to penetrate NAT and is only connected to SolarNet nodes, or it can't accept connections. That would explain why things work via ipfs.io but not through your own instance. |
Sorry - I was hosting this on my desktop and turned it off this morning to move my desk to the opposite side of the room. It should be up now. :D |
@slang800 can you share your peerID (result of |
Sure, it's |
Submitted an updated build based on github.com/cdnjs/cdnjs commit 4fabd85c986d57a61e0fbd8504cf15d67f60ada6 here: #82 New hash would be: QmRrnfFUgx81KZR9ibEcxDXgevoj9e5DydB5v168yembnX - https://ipfs.io/ipfs/QmRrnfFUgx81KZR9ibEcxDXgevoj9e5DydB5v168yembnX It's stored at Pollux right now. |
@cdnjs maintainer here, anything I can help here? |
@PeterDaveHello Thanks for checking in here! I think something that would be really useful is adding "ipfs.io" as one of the CDN providers. However, I'm not sure if you have support for adding more; currently it's just Cloudflare there, without the ability to change. If we did that, we would need to set up the updating/adding to be more automatic, right now it's me doing a manual
What do you think? |
I'm afraid that we can't do that, as we update the library and website every 5~10 mins, since the 3k+ libraries update very frequently. This is all automatic, without manual review and merge. I also wonder if it's a good idea to push ipfs when I don't understand this project well enough. If we provide a service officially, we'll be responsible for it, especially when anything goes wrong. So, sorry, that might not be something I can do right now. Maybe I can help update the files from my side if you want; currently, the files look dated on ipfs. |
Yeah, understandable, and making it all automatic is a much better way to go from the get-go, so that makes sense. Something else we can do from our side is run the same interface you run on cdnjs.com, slightly modified to hook up to our version of cdnjs, and deployed on cdnjs.ipfs.io or something like that. We would need to make sure it's always up to date, which will take some effort but won't be super hard.
Yeah, as I mentioned, the process is right now manual but in reality, should be fully automated. Will have some more thoughts about this at a later point. Thanks for jumping in here and sharing your thoughts 👍 |
@victorbjelkholm thanks! Let me know if I can help keep cdnjs on ipfs more up to date and updated more frequently :) |
IPFS stores files much like git does, so updating it 'live' shouldn't really be a problem. This updating could be done using the The best way I can see it done in case of cdnjs is to have a tool that would apply updates based on which files changed in git commits. Only thing I'm not sure about is how IPNS would react to that frequency of updates. |
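To make the "apply updates based on which files changed in git commits" idea concrete, here is a small sketch that turns `git diff --name-status` output into a list of actions. The function name `plan_updates` and the "add"/"rm" action format are made up for illustration; mapping each action onto the matching ipfs commands, and the IPNS republish, are left out on purpose.

```shell
#!/bin/bash
# plan_updates: read `git diff --name-status` output on stdin and print one
# action per line: "add <path>" for added or modified files, "rm <path>" for
# deleted ones. A rename becomes an rm of the old path plus an add of the new.
plan_updates() {
  awk -F '\t' '
    $1 == "A" || $1 == "M" { print "add", $2 }
    $1 == "D"              { print "rm", $2 }
    $1 ~ /^R/              { print "rm", $2; print "add", $3 }'
}

# Hypothetical usage between two commits (OLD and NEW are placeholders):
#   git diff --name-status "$OLD" "$NEW" -- ajax/libs | plan_updates
```

Only the files named in the plan need to be re-added, which avoids re-running a full recursive add on every commit.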
Yeah we can try to integrate that in our buildScript |
Would love to have an IPFS-compatible fork of https://github.com/cdnjs/cdnjs serving files via IPFS. It's a super large repository, but I will try to develop the integration locally.