
Plans to add ls, object.stat, swarm.connect? #3252

Closed
bmann opened this issue Aug 28, 2020 · 29 comments
Labels: kind/discussion (Topical discussion; usually not changes to codebase) · status/in-progress (In progress)

@bmann

bmann commented Aug 28, 2020

We’re planning to use a shared IPFS worker in our webnative apps, so your data is available instantly across all those apps. We’ve added an HTML file and a worker JS file to our auth lobby, so you can load that in using an iframe.

Staging version of that HTML file:
https://round-aquamarine-dinosaur.fission.app/ipfs.html

Ideally you won’t need to think about iframes at all, and the webnative SDK will handle that for you.

For this to work we would need a few extra methods on ipfs-message-port-client:

  • ls, listing directory contents.
  • object.stat, getting the size of directories and files.
  • swarm.connect, connecting to peers.

Are you planning on adding these methods?

Also posted to our forum: https://talk.fission.codes/t/shared-ipfs-worker/996 (cc @icidasset)

@bmann bmann added the need/triage Needs initial labeling and prioritization label Aug 28, 2020
@welcome

welcome bot commented Aug 28, 2020

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review.
In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment.
Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

@achingbrain
Member

Yes, we're planning on fleshing out the API, so thanks for mentioning the methods you use; it helps us prioritise accordingly.

We're trying to retire the object API, have you considered using ipfs.files.stat to get the sizes of directories & files instead?

E.g.

const stats = await ipfs.files.stat('/ipfs/Qmfoo')
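For reference, the returned stats object exposed fields roughly like the following at the time (a sketch based on the js-ipfs docs; exact field availability may vary by version):

console.log(stats.cid)            // CID of the file or directory
console.log(stats.type)           // 'file' or 'directory'
console.log(stats.size)           // size of the node itself, in bytes
console.log(stats.cumulativeSize) // total size of the DAG, including child blocks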

@icidasset
Contributor

@achingbrain Thanks! We'll try out ipfs.files.stat. I didn't know you could use the MFS functions with a /ipfs/ prefix. Or is that only applicable to ipfs.files.stat?

@achingbrain
Member

achingbrain commented Sep 1, 2020

The docs could definitely be improved here (and js-IPFS also supports IPFS paths in a few more places than go-IPFS), but the IPFS namespace is overlaid on top of the MFS commands, so you can use /ipfs/Qmfoo-style strings as paths wherever you're not trying to mutate an IPFS path.

E.g.:

ipfs.files.cp('/ipfs/Qmfoo', '/path/in/local/mfs') // copy a file from the wider IPFS namespace into your MFS
ipfs.files.cp('/ipfs/Qmfoo', '/ipfs/Qmbar')        // ipfs is immutable, this makes no sense
ipfs.files.rm('/ipfs/Qmfoo')                       // ipfs is immutable, this makes no sense
ipfs.files.stat('/ipfs/Qmfoo')                     // get stats on an IPFS path
ipfs.files.ls('/ipfs/Qmfoo')                       // list the contents of an IPFS path
// etc.

@Gozala
Contributor

Gozala commented Sep 2, 2020

We’re planning to use a shared IPFS worker on our webnative apps, so your data is available instantly across all those apps.

Hi @bmann, thanks for reaching out! I'm happy that the Fission team is looking into adopting the shared worker and would love to help make that effort succeed. I'm also happy to jump on a call so we can sync up over a higher-bandwidth channel.

We’ve added an HTML file and a worker JS file to our auth lobby, so you can load that in using an iframe.
Staging version of that HTML file:
https://round-aquamarine-dinosaur.fission.app/ipfs.html

I'm afraid I don't have enough context about the auth lobby or what that iframe is meant for, but I'm guessing that different apps run on different origins and the iframe is meant to provide a shared origin for the shared worker. If my guess is accurate, that is pretty much the plan for a shared IPFS node across origins. There are some details about it here: https://hackmd.io/@gozala/H1LqrvvcI#Cross-Origin
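For readers following along, a minimal sketch of that pattern (file names and the shared-origin URL are hypothetical; the client API follows the ipfs-message-port-client README): the app embeds an iframe on the shared origin, and the iframe spawns the SharedWorker and hands its MessagePort back to the embedding app.

// app.js, on any origin: embed the shared-origin iframe, build a client from the port it sends
import { IPFSClient } from 'ipfs-message-port-client'

const frame = document.createElement('iframe')
frame.src = 'https://shared.example/ipfs.html' // placeholder shared-origin URL
document.body.appendChild(frame)

window.addEventListener('message', ({ data, ports }) => {
  if (data === 'ipfs-port') {
    const ipfs = IPFSClient.from(ports[0]) // talk to the shared node over the received port
  }
})

// ipfs.html, on the shared origin: spawn the worker and transfer its port to the embedder
const worker = new SharedWorker('./worker.js') // worker runs ipfs-message-port-server
parent.postMessage('ipfs-port', '*', [worker.port])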

Ideally you won’t need to think about iframes at all, and the webnative SDK will handle that for you.

Absolutely! That is the plan, but there are quite a few things that need to be hammered out (which the linked document hints at), like:

  • upholding origin isolation
  • storage quota management
  • finding a good way to share the iframe without introducing centralization.

For this to work we would need a few extra methods on ipfs-message-port-client:

  • ls, listing directory contents.
  • object.stat, getting the size of directories and files.
  • swarm.connect, connecting to peers.

Are you planning on adding these methods?

We arrived at the current API subset based on feedback from the teams that were interested in this effort. It would really help to get a better understanding of what the concrete APIs are used for, so we can meet those needs instead of aiming for full API compatibility with JS IPFS. Doing that efficiently is going to be challenging, and so far we have found that what devs need can usually be achieved in much simpler and more efficient ways.

@bmann
Author

bmann commented Sep 2, 2020

@Gozala we’re in Discord chat at https://fission.codes/discord or send me an email at boris @ fission.codes — would love to get on a call.

@jacobheun jacobheun added kind/discussion Topical discussion; usually not changes to codebase and removed need/triage Needs initial labeling and prioritization labels Sep 10, 2020
@icidasset
Contributor

icidasset commented Sep 16, 2020

Hey @Gozala

I'm guessing that different apps run on different origins and the iframe is meant to provide a shared origin for the shared worker.

You guessed correctly ☺️
Let me know if I need to expand on this.

It would really help to get a better understanding of what the concrete APIs are used for, so we can meet those needs instead of aiming for full API compatibility with JS IPFS.

Makes total sense.
(1) We can probably remove the need for object.stat by replacing it with files.stat, as suggested previously.

(2) We want ipfs.ls, because:

  • We need to tell if a link is a directory or not
  • Modification time of files & directories
  • Size of files & directories

We could keep track of all that metadata ourselves, but I guess that would defeat the purpose of ipfs.ls? Correct me if I'm wrong.

(3) And lastly, we want swarm.connect to, well, connect to peers. We're currently using it to ensure our connection to bootstrapped nodes. We sometimes lose the connection to those nodes (a page could be open for days), so we re-establish it using swarm.connect.

@Gozala
Contributor

Gozala commented Sep 22, 2020

Let me know if I need to expand on this.

Thanks for the clarifications, they do help. However, I would still like to understand the use case more generally. The reason is that all the teams I have talked to mostly used the DAG API with the IPFS network for storing and retrieving data, plus a tiny subset of the files API, mostly to refer to files from those data points. There is also a yet-unmet need for a pubsub API. Given that insight we started thinking about https://github.com/ipfs/js-dag-service

If Fission's use cases and requirements are different, it would really help to understand in which way; that can be a little tricky to gather from API requirements alone.

Makes total sense.
(1) We can probably remove the need for object.stat by replacing it with files.stat, as suggested previously.

(2) We want ipfs.ls, because:

  • We need to tell if a link is a directory or not
  • Modification time of files & directories
  • Size of files & directories

All that information can be obtained from ipfs.files.stat; if you do want to list directory entries, that obviously would require ls.

It would also be good to understand whether MFS fits into Fission in some way or is irrelevant from Fission's standpoint.

We could keep track of all that metadata ourselves, but I guess that would defeat the purpose of ipfs.ls? Correct me if I'm wrong.

You definitely should not have to do that. If you do still need ls, please let me know and I'll write a patch to add it.

(3) And lastly, we want swarm.connect to, well, connect to peers. We're currently using it to ensure our connection to bootstrapped nodes. We sometimes lose the connection to those nodes (a page could be open for days), so we re-establish it using swarm.connect.

Having to dial just to reestablish lost connections seems like a workaround for a problem that should not be there in the first place; the connection manager should just do it (and I know a major overhaul is in progress there).

It is also the kind of API I'm a bit hesitant about, because with node sharing across origins (or native support in browsers) it's going to require some extra caution. That said, I would really like to support your effort in adopting the shared worker. I think it would be best to move the logic that ensures connections to the bootstrap nodes to the worker side, where you have a full IPFS node available. If for some reason app logic needs to guide this process (meaning you need input from app state to decide what to connect to), I would really like to understand that better.

@icidasset
Contributor

However, I would still like to understand the use case more generally.

Definitely! The use case is a decentralised file system, with a public part and a private/encrypted part. That file system is shared between applications, which can store data there. So, like other file systems, you have files and directories, and we need to know the creation/modification timestamps and the sizes of those files and directories. Apps have access to specific parts of the file system, and permission is given by the user (in that auth lobby we talked about earlier). If you'd like to know more, we have a whitepaper that explains it in full detail: https://whitepaper.fission.codes/file-system/file-system-basics

Here's our app, Drive, which you can use to browse through your entire file system.

All that information can be obtained from ipfs.files.stat

Oh cool, didn't know that! Also saw in the code that the mtime is provided, which is not mentioned in the documentation.

if you do want to list directory entries, that obviously would require ls

Yeah, we need to list directory entries.
Although, with our file system implementation, we might be able to do without it, but I'm not entirely sure.
That said, we would need to call ipfs.files.stat for each directory entry; I'd think ls is probably more performant?

So yeah, if you could add ls that'd be awesome.
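To illustrate the trade-off being discussed (a sketch; entry field names follow the js-ipfs docs of the time, and /ipfs/Qmfoo plus childPaths are placeholders):

// with ls: one call yields the name, type and size of every entry
for await (const entry of ipfs.ls('/ipfs/Qmfoo')) {
  console.log(entry.name, entry.type, entry.size)
}

// without ls: one stat round trip per child path
const stats = await Promise.all(childPaths.map(p => ipfs.files.stat(p)))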

Having to dial just to reestablish lost connections seems like a workaround for a problem that should not be there in the first place.

For sure 💯 Good to know it's being worked on. In that case we won't need swarm.connect.

I think it would be best to move the logic that ensures connections to the bootstrap nodes to the worker side, where you have a full IPFS node available

Makes sense 👍

@Gozala
Contributor

Gozala commented Oct 9, 2020

Just submitted a pull request for ipfs.ls: #3252

@Gozala
Contributor

Gozala commented Oct 9, 2020

@vasco-santos it would be great to get your input regarding my comment about swarm.connect (posting it inline below for convenience):

(3) And lastly, we want swarm.connect to, well, connect to peers. We're currently using it to ensure our connection to bootstrapped nodes. We sometimes lose the connection to those nodes (a page could be open for days), so we re-establish it using swarm.connect.

Having to dial just to reestablish lost connections seems like a workaround for a problem that should not be there in the first place; the connection manager should just do it (and I know a major overhaul is in progress there).

@vasco-santos
Member

vasco-santos commented Oct 10, 2020

Thanks for the heads up @Gozala

Libp2p does not keep connections alive at the moment. This will be worked on in the Connection Manager overhaul, as @Gozala mentioned. You can track the progress in the milestones table; there is one point for it.

I expect this to land in libp2p@0.30 or libp2p@0.31. Meanwhile, my suggestion is to keep the connection alive in the application layer. You can use libp2p.ping once in a while to avoid connection timeouts.
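A minimal sketch of that application-layer keepalive, assuming js-ipfs's ipfs.ping API (bootstrapPeerId and the interval are placeholders):

// ping the bootstrap peer periodically so the connection isn't closed as idle
setInterval(async () => {
  try {
    for await (const res of ipfs.ping(bootstrapPeerId, { count: 1 })) {
      if (res.time) console.log('peer alive, rtt:', res.time)
    }
  } catch (err) {
    console.warn('keepalive ping failed', err)
  }
}, 30 * 1000)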

@icidasset
Contributor

icidasset commented Nov 3, 2020

@Gozala Thanks for adding ls support! 👏

Is there anything else that changed?
Because it's not working for me anymore.

[screenshot: error thrown by ipfs.dag.get]

Can't seem to do dag.get for some reason; does it not support v1 CIDs?
Neither passing a CID class instance nor a string works.
The error points to https://github.com/ipfs/js-ipfs/blob/master/packages/ipfs-message-port-server/src/dag.js#L70

Details:

  • If I use a CID class instance, the error becomes:
    Error: Failed to execute 'postMessage' on 'MessagePort': ArrayBuffer at index 1 is a duplicate of an earlier ArrayBuffer.
    
  • This is using js-ipfs v0.51.0 on Chrome, macOS Big Sur beta
  • Using the prebuilt cdnjs.com version (can't npm install because of bcrypt install errors)
  • Had to use several hacks in the shared worker to get it to load (window = self and self.RTCPeerConnection = true; see the sketch after this list)
  • There's an iframe in between (as before, with v0.2.2 this did work)
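For context, the workarounds listed above amount to a worker preamble along these lines (a sketch of the reported hacks, not a recommended setup; the CDN URL is a placeholder):

// worker.js: reported hacks to get the prebuilt bundle loading in a SharedWorker
self.window = self            // some dependency expects a global `window`
self.RTCPeerConnection = true // satisfy a feature check; workers have no WebRTC APIs
importScripts('https://cdn.example/ipfs/0.51.0/index.min.js') // placeholder URL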

@Gozala
Contributor

Gozala commented Nov 10, 2020

@icidasset I don't believe there have been any intentional changes, although changes in ipfs-core could have caused some regressions, which is maybe what happened 🤨 If so, it is surprising that no tests have caught this, so it would be good to create a minimal reproducible test case we can add to the suite.

Is there anything else that changed?
Because it's not working for me anymore.

[screenshot: error thrown by ipfs.dag.get]

Can't seem to do dag.get for some reason; does it not support v1 CIDs?
Neither passing a CID class instance nor a string works.
The error points to https://github.com/ipfs/js-ipfs/blob/master/packages/ipfs-message-port-server/src/dag.js#L70

Hmm, so the error in the screenshot seems to come from:

async get (query) {
  const { cid, path, localResolve, timeout, signal } = query
  const { value, remainderPath } = await this.ipfs.dag.get(
    decodeCID(cid),
    {

where decodeCID does:

Object.setPrototypeOf(cid.multihash, Uint8Array.prototype)
Object.setPrototypeOf(cid, CID.prototype)

It would be good to know which of those two lines is throwing, so we know whether it is the cid that doesn't make it across the message port or its multihash property.

Neither passing a CID class instance nor a string works.

We don't support strings in the get method:

async get (cid, options = {}) {
  const { value, remainderPath } = await this.remote.get({
    ...options,
    cid: encodeCID(cid, options.transfer)
  })

Details:

  • If I use a CID class instance, the error becomes:
    Error: Failed to execute 'postMessage' on 'MessagePort': ArrayBuffer at index 1 is a duplicate of an earlier ArrayBuffer.
    

Are you by any chance providing an options.transfer property? Because if you do provide it, the ArrayBuffer backing the CID will be transferred across threads:

const encodeCID = (cid, transfer) => {
  if (transfer) {
    transfer.push(cid.multihash.buffer)
  }
  return cid
}

That would leave your cid instance with a stale (detached) buffer, which is maybe what the error above is trying to say.

Now that I am looking at this, I am realizing that we should not be passing options.transfer to encodeCID on the client side anyway. It is something we do on the server because all the array buffers will be GC-ed after the response is sent anyway, so we can transfer them instead of copying.
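If that reading is right, a client-side fix might be as simple as not threading transfer through encodeCID in the excerpt above (a sketch, not the actual patch):

// client-side get: keep the caller's CID intact by copying rather than transferring it
async get (cid, options = {}) {
  const { value, remainderPath } = await this.remote.get({
    ...options,
    cid: encodeCID(cid) // no transfer list, so the buffer is cloned, not detached
  })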

  • Using the prebuilt cdnjs.com version (can't npm install because bcrypt install errors)

Do we have a bug report on file for the bcrypt install error? If not, can you please create one? Having to resort to workarounds is not the dev experience we'd like to have.

  • Had to use several hacks in the shared worker to get it to load (window = self and self.RTCPeerConnection = true)

Could you please create reports for those too (and cc me)? We should not have to resort to workarounds. For what it's worth, we have one example that demonstrates message-port-server/client with a shared worker and does not require such workarounds:

https://github.com/ipfs/js-ipfs/tree/master/examples/browser-sharing-node-across-tabs

There is also another example under review that demonstrates use with a service worker, and that one also did not require workarounds:
#3374

I'm also curious what RTCPeerConnection = true attempts to do there, because there are no WebRTC APIs in workers.

@icidasset
Contributor

@Gozala It seems to be this line, if I pass a CID as a string to ipfs.dag.get

Object.setPrototypeOf(cid.multihash, Uint8Array.prototype)

If I use a CID class instance, I'm not passing in the transfer option.
I did try that, however; same error as posted previously (i.e. ArrayBuffer at index 1 is a duplicate ...)

Do we have a bug report on file for the bcrypt install error ?

No, there isn't. I looked deeper into this and it isn't an issue with js-ipfs, but rather the new macOS, which shifted a few things around and caused the bcrypt install to fail. Could also be my Nix setup though 🤔 I'll try to figure it out and file an issue if needed.

Had to use several hacks in the shared worker to get it to load

Good news, it looks like this has been fixed in the newer versions 👍

@icidasset
Contributor

Oh, I should also note, regarding this error:

Error: Failed to execute 'postMessage' on 'MessagePort': ArrayBuffer at index 1 is a duplicate of an earlier ArrayBuffer.

(without a transfer option on my end)

That error seems to originate from:

port.postMessage(
  { type: 'result', id, result: { ok: true, value } },
  value.transfer || []
)

@Gozala
Contributor

Gozala commented Nov 16, 2020

@Gozala It seems to be this line, if I pass a CID as a string to ipfs.dag.get

String CIDs aren't supported (see https://github.com/ipfs/js-ipfs/blob/master/packages/ipfs-message-port-client/src/dag.js#L64). My hope is we can migrate loosely defined types into more concrete ones in ipfs-core as well.

If I use a CID class instance, I'm not passing in the transfer option.
I did try that, however; same error as posted previously (i.e. ArrayBuffer at index 1 is a duplicate ...)

However, I have no way to explain why this would happen, unless somehow you end up with a CID that doesn't have a multihash property. Are you by any chance pulling the CID class from here:
https://github.com/multiformats/js-multiformats/blob/master/src/cid.js

That is the only thing I can think of that might explain it.

Either way, I would check that the CID instance passed to dag.get has a multihash property and that it's a typed array. If it isn't, that is why you see the error, and we'd need to figure out how you ended up with such a CID instance.
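For example, a quick sanity check before calling dag.get could look like this (a sketch, assuming the cids package mentioned later in this thread):

const CID = require('cids')

const checkCid = (cid) => {
  console.log(CID.isCID(cid))                      // should be true
  console.log(cid.multihash instanceof Uint8Array) // should be true (a Node Buffer is fine too)
}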

If you are able to add a breaking test to the suite, that would greatly help, as we can then reproduce the issue and fix it. That said, I suspect there is something else going on in your setup, because we have plenty of tests that exercise this.

@Gozala
Contributor

Gozala commented Nov 16, 2020

Oh I should also note, regarding this error

Error: Failed to execute 'postMessage' on 'MessagePort': ArrayBuffer at index 1 is a duplicate of an earlier ArrayBuffer.

(without a transfer option on my end)

That error seems to originate from:

port.postMessage(
  { type: 'result', id, result: { ok: true, value } },
  value.transfer || []
)

OK, that is interesting, but it should be an unrelated issue. It suggests that we are attempting to transfer the same buffer multiple times, and since we transfer everything from the worker back to the main thread, anything returning the same instance twice would exhibit this issue. It would be great to identify when that is the case, however, so it can be addressed.

@Gozala
Contributor

Gozala commented Nov 16, 2020

It just occurred to me that the "duplicate of an earlier ArrayBuffer" problem might be causing the other issue. That is, if that error occurs when we send some CID from the server, the CID probably ends up corrupt on the client. And if you then use that corrupt CID to do a dag.get from the client, it likely fails on the server because the cid used was corrupt and did not have a multihash property.

@Gozala
Contributor

Gozala commented Nov 16, 2020

@icidasset any chance we can narrow this down so it can be reproduced?

@icidasset
Contributor

I'll see if I can narrow it down 👍

The multihash property is definitely there, and multihash.buffer as well.
I'm using https://www.npmjs.com/package/cids to make a new CID from a v1 CID string.

@icidasset
Contributor

Maybe this is the thing that happens here?
https://github.com/nodejs/node/blob/0ff0af534ef150820ac218b6ef3614dc199de823/test/parallel/test-worker-message-port-transfer-duplicate.js#L21-L28

What if I get an EncodedDAGNode back containing multiple copies of the same CID? Is that possible? Should we check whether we're adding something to the transfer list that's already in there? Because my error does say index 1, not zero; could it be something like this, perhaps?

My case might also be a bit different from the tests, because I'm using an iframe in between; I don't know if that matters or not.

@icidasset
Contributor

If I wanted to write a test for this, how would I go about that?
Do I need to look at interface-ipfs-core?

@Gozala
Contributor

Gozala commented Nov 17, 2020

Maybe this is the thing that happens here?
https://github.com/nodejs/node/blob/0ff0af534ef150820ac218b6ef3614dc199de823/test/parallel/test-worker-message-port-transfer-duplicate.js#L21-L28

I think you're right; I was just able to reproduce it:

[screenshot: reproduction of the duplicate ArrayBuffer error]
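Presumably something along these lines (a minimal sketch; listing the same ArrayBuffer twice in a transfer list throws exactly this error in Chrome):

const { port1 } = new MessageChannel()
const buf = new ArrayBuffer(8)

// the same buffer appears twice in the transfer list
port1.postMessage(null, [buf, buf])
// -> DataCloneError: Failed to execute 'postMessage' on 'MessagePort':
//    ArrayBuffer at index 1 is a duplicate of an earlier ArrayBuffer.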

What if I get an EncodedDAGNode back containing multiple copies of the same CID? Is that possible? Should we check whether we're adding something to the transfer list that's already in there? Because my error does say index 1, not zero; could it be something like this, perhaps?

It is clear that something the server sends to the client refers to the same ArrayBuffer twice. Either:

  1. Something contains the same CID instance more than once.
  2. Multiple CIDs (or possibly other things) refer to slices of the same ArrayBuffer.

Both would cause this problem, since both lead to duplicates in the transfer list.

My case might also be a bit different from the tests, because I'm using an iframe in between; I don't know if that matters or not.

My guess is it isn't the iframe; it's something about the dag nodes, probably multiple links to the same node. I'm still surprised we end up with the same CID instance instead of two equal ones, as I can't recall any caching logic that would do that, but it seems like the most likely problem.

Unfortunately that does not at all explain why dag.get(new CID(...)) causes the error. There must be something else going on that we're missing.

@Gozala
Contributor

Gozala commented Nov 17, 2020

If I wanted to write a test for this, how would I go about that?
Do I need to look at interface-ipfs-core?

If you want to write a test that is exercised across ipfs-core, ipfs-http-client and ipfs-message-port-client, it's best to add it to the interface tests, i.e. somewhere here:

https://github.com/ipfs/js-ipfs/blob/master/packages/interface-ipfs-core/src/dag/get.js

If it is specific to a single package, it can be added to that package (e.g. ipfs-message-port-client), more or less the way ipfs-http-client does it here:

https://github.com/ipfs/js-ipfs/blob/master/packages/ipfs-http-client/test/dag.spec.js
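As a rough starting point, such a spec might look like this (a hypothetical mocha/chai skeleton modeled on the existing specs, with `ipfs` assumed to be provided by the test setup):

/* eslint-env mocha */
const { expect } = require('chai')

describe('dag.get over the message port', () => {
  it('round-trips a node referenced by its CID', async () => {
    const cid = await ipfs.dag.put({ hello: 'world' })
    const { value } = await ipfs.dag.get(cid)
    expect(value).to.deep.equal({ hello: 'world' })
  })
})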

@Gozala
Contributor

Gozala commented Nov 17, 2020

@icidasset I have created a separate issue about the duplicate transferred ArrayBuffers that you identified: #3402. Would you be interested and have time to take a stab at it? It should be fairly straightforward I think, mostly just replacing arrays with sets.
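A sketch of the arrays-to-sets idea against the postMessage call shown earlier (hypothetical; the real change belongs in #3402):

// collect transferables in a Set so each ArrayBuffer is listed at most once
const transfer = new Set()
transfer.add(cid.multihash.buffer)
transfer.add(cid.multihash.buffer) // duplicate adds are no-ops

port.postMessage(
  { type: 'result', id, result: { ok: true, value } },
  [...transfer] // postMessage expects an array, so spread the Set
)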

@icidasset
Contributor

I don't have that much time, but I'll see what I can do.

@icidasset
Contributor

I think this can be closed @Gozala
The shared worker setup is working pretty well, very awesome 🎉
Thanks for all the help! ☺️ 👏

The main issues we still have are to do with connectivity to our primary node. We often have to do an ipfs.swarm.disconnect() and then an ipfs.swarm.connect() for it to re-establish the connection to the node. I had to write a script that pings every few minutes and, if the ping doesn't respond within 60 seconds, does the disconnect + connect. Neither ping nor connect on its own works. Hopefully this will improve when libp2p gets keepalive connections?
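For the record, that watchdog amounts to something like this sketch (PRIMARY_ADDR and PRIMARY_PEER are placeholders; js-ipfs core calls accept a timeout option):

const PRIMARY_ADDR = '/dns4/node.example/tcp/443/wss/p2p/QmPrimary' // placeholder multiaddr
const PRIMARY_PEER = 'QmPrimary'                                    // placeholder peer ID

setInterval(async () => {
  try {
    // if the ping doesn't complete within 60s, the timeout option throws
    for await (const res of ipfs.ping(PRIMARY_PEER, { count: 1, timeout: 60 * 1000 })) {}
  } catch {
    await ipfs.swarm.disconnect(PRIMARY_ADDR) // drop the stale connection
    await ipfs.swarm.connect(PRIMARY_ADDR)    // then dial again
  }
}, 5 * 60 * 1000) // every five minutes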

The storage quota is something I still need to look at. Did you have any plans for that?

@Gozala
Contributor

Gozala commented Dec 29, 2020

I think this can be closed @Gozala
The shared worker setup is working pretty well, very awesome 🎉
Thanks for all the help! ☺️ 👏

I'm glad to hear that! And thanks for contributing fixes!

The main issues we still have are to do with connectivity to our primary node. We often have to do an ipfs.swarm.disconnect() and then an ipfs.swarm.connect() for it to re-establish the connection to the node. I had to write a script that pings every few minutes and, if the ping doesn't respond within 60 seconds, does the disconnect + connect. Neither ping nor connect on its own works. Hopefully this will improve when libp2p gets keepalive connections?

Needing this workaround is unfortunate, but let's have a separate issue/thread to figure out the fix there, whether it's keepalive or something else.

In the meantime I would suggest just moving that logic into the worker thread, assuming there is no reason it must be in the main thread. If for some reason it must remain in the main thread, let's discuss why, so we can figure out a better way to go about it.

The storage quota is something I still need to look at. Did you have any plans for that?

I have not really done much beyond thinking about this, which goes roughly as follows:

  • We can't really do much about the global limit, and it's probably best to let the browser make the call there. (Long term, I would love some notion of delegate nodes that browser nodes could be paired with in order to sync across devices & back up. I'm working on a theme proposal to that end for the roadmap.)
  • Overcoming the group limit (one per TLD) might be worth exploring, and there are various options to consider, with different tradeoffs:
    1. Use the Storage Manager API and perform GC (see the sketch after this list). Additionally, we could mimic browsers' use of an LRU policy and evict content when limits are simply impossible to meet. Maybe we could even introduce some form of archiving to prevent data loss (e.g. prompt the user to download evicted content as a bundle in CAR format, or migrate it to secondary TLD storage).
    2. Instead of using a single TLD providing shared access (via a hidden iframe that spawns the shared worker), we could use multiple TLDs, so that data could overflow into the next one. That would introduce quite a bit of complexity that I'm not sure could be justified. It also does not align with the browser's TLD-based limits.
    3. Put an origin-based store in front of the shared one. In this scenario the client could reach for a local DB, falling back to the shared worker (which uses the shared DB), falling back to the network. Assuming the shared worker tracks which content belongs to which origin, it would be able to safely evict an origin on an LRU policy when at capacity, while keeping that content accessible to its source origin (from its local DB).
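A sketch of the first option, assuming the standard Storage Manager API and js-ipfs's repo GC (the 90% threshold is made up):

// check how close we are to the browser's quota and GC the repo when near it
const { usage, quota } = await navigator.storage.estimate()

if (quota && usage / quota > 0.9) {
  for await (const res of ipfs.repo.gc()) {} // reclaim unpinned blocks
  await navigator.storage.persist()          // ask the browser not to evict us (may be denied)
}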
