
ArrayBuffer at index 0 is already detached #3480

Closed
icidasset opened this issue Jan 14, 2021 · 9 comments
Labels
status/ready Ready to be worked

Comments

@icidasset
Contributor

icidasset commented Jan 14, 2021

Severity:

Critical

Description:

[Screenshot attached: 2021-01-14 at 19:18:24]

We've implemented a filesystem on top of IPFS, which is a mix of DAG PB and DAG CBOR nodes. IPFS is loaded using ipfs-message-port-client through an iframe, which then loads ipfs-message-port-server and js-ipfs.

I make a bunch of concurrent changes to the file system, resulting in several dag.put calls, and I suspect the error originates somewhere around there.

Steps to reproduce the error:

Hard to reproduce, feels like a concurrency error 🤔

You can try it at https://nightly.diffuse.sh/
→ Login with Fission
→ Add some content (add music, mark some stuff as favourite, make a playlist)
→ Export
→ Import
→ ... wait a minute or so ...

Moving forward

Ok so, I'm working on a PR to fix this, but I don't know how to proceed exactly. How exactly does the ArrayBuffer become detached? Does it become detached if you put it in the transfer list of a MessagePort? Should we make a copy of the ArrayBuffer, or should we ignore it? 🤔
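
To that last question: yes, listing an ArrayBuffer in the transfer list of a postMessage call detaches it on the sending side. A minimal sketch, separate from the issue's code, of what that looks like:

```js
// Minimal sketch (not from the issue) of how a transfer detaches a buffer:
// once an ArrayBuffer is listed in a postMessage transfer list, the sender's
// copy and every view over it become unusable (length 0), and transferring
// it a second time throws "ArrayBuffer at index 0 is already detached".
const { port1, port2 } = new MessageChannel()
port2.onmessage = (event) => {
  console.log('receiver got', event.data.byteLength, 'bytes') // 8
}

const bytes = new Uint8Array([1, 2, 3, 4, 5, 6, 7, 8])
const buffer = bytes.buffer

port1.postMessage(buffer, [buffer]) // transferred, not copied

console.log(buffer.byteLength) // 0, detached on the sending side
console.log(bytes.length)      // 0, views over it are now empty
```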

@icidasset icidasset added the need/triage Needs initial labeling and prioritization label Jan 14, 2021
icidasset added commits to icidasset/js-ipfs that referenced this issue on Jan 14–15, 2021
@Gozala
Contributor

Gozala commented Jan 15, 2021

@icidasset thanks for tackling this issue. While dropping empty ArrayBuffers was a clever way to avoid these errors, I feel it is not the right approach, because it will not fix the problem but rather push it further down the stack. What I mean is that the corresponding Uint8Array was highly unlikely to be empty; it just ended up empty because it was already transferred. So dropping the corresponding ArrayBuffer would lead to getting an empty Uint8Array instead of what it was meant to hold, and lead to other bugs downstream.

I think we need to get to the bottom of this, meaning understand how we end up passing the same Uint8Array multiple times, and fix that.

I understand it can be really hard to figure out the origin of the issue, but I think there is a viable way of doing so:

  1. Create a WeakMap that maps each transferable to a new Error() (for stack traces).
  2. Have a version of your ensureNonDetachedUniqueBuffers function go over each transferable and check whether the WeakMap already has it; if so, execute a debugger statement (or something along those lines). If it's not there, put the transferable in the map, mapped to a new Error().

I think that would help detect the source of the problem. I'm happy to do a peer session or something if that would help. Otherwise an automated example with source code that I can modify would be very helpful.
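
A rough sketch of that debugging idea (the hook point and helper name below, other than ensureNonDetachedUniqueBuffers, are assumptions rather than code from the PR):

```js
// Sketch of the WeakMap-based tracker described above; where it gets called
// from (e.g. right before postMessage) is an assumption.
const firstTransfer = new WeakMap() // ArrayBuffer -> Error (for its stack trace)

function checkTransferables (transfer) {
  for (const buffer of transfer) {
    const earlier = firstTransfer.get(buffer)
    if (earlier) {
      // The same ArrayBuffer is about to be transferred a second time;
      // the stored Error's stack shows where the first transfer happened.
      console.error('Buffer was already transferred here:\n', earlier.stack)
      debugger // eslint-disable-line no-debugger
    } else {
      firstTransfer.set(buffer, new Error('first transfer of this buffer'))
    }
  }
}

// Assumed usage: call just before handing the list to postMessage.
// checkTransferables(transfer)
// port.postMessage(message, transfer)
```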

@icidasset
Contributor Author

@Gozala Yeah that makes total sense. I'll see if I can implement it like that 👌 Thanks! I guess SharedArrayBuffer is not really applicable here, because then we'd need new instances of Uint8Array? (also it's not really cross-browser yet)

@icidasset
Contributor Author

@Gozala Also, I agree, I would like to figure this out as well.

I guess a buffer only becomes detached by transferring it via postMessage. So that means the same instance of a buffer is passed through a postMessage on multiple occasions. Right?

What I found so far:
Every error seems to point to https://github.com/ipld/js-ipld-dag-pb/blob/master/src/util.js#L49 (we are working with DAG PB nodes), which then calls https://github.com/ipfs/protons/blob/master/src/compile/decode.js#L153. I don't understand how, at that point, the buffer can already be detached 🤔

@Gozala
Contributor

Gozala commented Jan 18, 2021

> I guess a buffer only becomes detached by transferring it via postMessage. So that means the same instance of a buffer is passed through a postMessage on multiple occasions. Right?

That is my understanding as well.

> What I found so far:
> Every error seems to point to https://github.com/ipld/js-ipld-dag-pb/blob/master/src/util.js#L49 (we are working with DAG PB nodes), which then calls https://github.com/ipfs/protons/blob/master/src/compile/decode.js#L153. I don't understand how at that point the buffer can already be detached

The question is where that buffer is coming from, because whatever allocates it seems to retain & reuse it. I imagine there could be a few reasons for this to happen:

  • Buffer.allocUnsafe, I believe, recycles buffers, and if the Node polyfill does that it may potentially explain this.
  • Another reason I can imagine is that a codec might allocate a large buffer and decode multiple blocks, with multiple typed array views pointing at different byteOffset/byteLength ranges but into the same buffer (see the sketch below).
  • There is some code that tries to be smart by caching results and avoiding computing the same thing more than once (I think CIDs do it).

I think figuring out which one it is will allow us to fix it. Maybe storing not just stack traces but also byte offsets and copies will help eliminate or validate the 2nd possibility.
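
To make the 2nd possibility concrete, here is a small sketch (an illustration, not code from any codec) of two blocks decoded as views into one backing buffer, where transferring the first detaches the second:

```js
// Two "blocks" decoded as Uint8Array views into the same backing ArrayBuffer.
const backing = new ArrayBuffer(16)
const blockA = new Uint8Array(backing, 0, 8)  // bytes 0..7
const blockB = new Uint8Array(backing, 8, 8)  // bytes 8..15
console.log(blockA.buffer === blockB.buffer)  // true, shared buffer

const { port1 } = new MessageChannel()
port1.postMessage(blockA, [blockA.buffer])    // detaches `backing`
try {
  port1.postMessage(blockB, [blockB.buffer])  // same, now detached, buffer
} catch (err) {
  console.error(err.name, err.message)        // DataCloneError: ...already detached
}
```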

@Gozala
Contributor

Gozala commented Jan 18, 2021

> • Another reason I can imagine is that a codec might allocate a large buffer and decode multiple blocks, with multiple typed array views pointing at different byteOffset/byteLength ranges but into the same buffer.

This one would actually explain the former issue that you fixed by using sets. However, in this case it appears to span multiple posts, which makes me think the 3rd or 1st case is more likely.

@Gozala
Contributor

Gozala commented Jan 18, 2021

I also think it has something to do with the usage pattern, because I have not encountered this problem while serving full sites in the service worker example.

@icidasset
Contributor Author

The issue always occurs when dag.get-ing a DAG PB node. So I don't think it's something to do with the buffers from CIDs 🤔 Additionally, the code that produces the error seems to use block.data (which is separate from block.cid).

dag.get calls don't do multiple postMessage calls, do they? (i.e. they're not iterables)

More notes:

  • It only gives an error when that query has been run before (which makes sense)
  • Sometimes I can do the query again later and it runs just fine

This last note makes me think the problem is definitely not in js-datastore-level; otherwise I'd imagine we'd always get the detached error. But on the other hand, I traced the error back to here: https://github.com/ipld/js-ipld/blob/10c3777a70a0c8b5ba424174707682e3df9733ec/src/index.js#L138 And that seems to call the block service, which also points to a js-datastore-level thing.

I'm so confused 😅 Anyhow, digging further, learning a lot about js-ipfs internals here 😂

> I also think it has something to do with the usage pattern, because I have not encountered this problem while serving full sites in the service worker example.

Yeah for sure, our filesystem implementation is quite a complex use-case I imagine.

@icidasset
Contributor Author

@Gozala I think I figured it out.
The errors are for blocks that need to be fetched from the network (I checked the bitswap wanted list).
That happens in the bitswap code here:
https://github.com/ipfs/js-ipfs-bitswap/blob/b0d2174af2ae51be9402858e953a23baba132144/src/index.js#L186

I think there might be a "concurrency" issue here: if I call dag.get with the same CID in quick succession, this.notifications.wantBlock(cid, options) could actually return the same block instance. What do you think? Is that possible? 🤔
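
A hypothetical sketch of that suspicion (not actual js-ipfs-bitswap code, just the sharing pattern being described): if concurrent dag.get calls for the same CID are coalesced onto one pending want, both callers receive the same Uint8Array, and transferring it for one response detaches it for the other.

```js
// Hypothetical illustration of the suspected race, not actual bitswap code.
const pendingWants = new Map() // cid (string) -> Promise<Uint8Array>

function wantBlock (cid, fetchFromNetwork) {
  if (!pendingWants.has(cid)) {
    pendingWants.set(cid, fetchFromNetwork(cid))
  }
  // Every concurrent caller for this CID gets the same promise,
  // and therefore the very same Uint8Array instance.
  return pendingWants.get(cid)
}

async function demo () {
  const fetchFromNetwork = async () => new Uint8Array([1, 2, 3])
  const [a, b] = await Promise.all([
    wantBlock('some-cid', fetchFromNetwork),
    wantBlock('some-cid', fetchFromNetwork)
  ])
  console.log(a === b) // true, so transferring a.buffer also detaches b.buffer
}

demo()
```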

@icidasset
Contributor Author

icidasset commented Jan 21, 2021

I got around the problem by pinning the DAG structures we use. Our library was supposed to do this anyway, but we had a try/catch around ipfs.pin.add (because it gives an error for non-recursive pinning if the CID is already recursively pinned), hiding the fact that this interface wasn't in ipfs-message-port-client. I'll make a PR to add pin.add to the ipfs message-port libraries.

Besides that, I think the above ☝️ is the cause of the problem. Couldn't find anything else that seemed related.
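
For reference, a rough sketch of the pinning workaround described above (the function name and the error-message check are assumptions, not taken from the issue):

```js
// Pin a DAG root non-recursively, ignoring only the "already pinned
// recursively" rejection so that other failures (such as pin.add missing
// from ipfs-message-port-client) are no longer swallowed silently.
async function pinRoot (ipfs, cid) {
  try {
    await ipfs.pin.add(cid, { recursive: false })
  } catch (err) {
    // The exact error message is an assumption; js-ipfs rejects
    // non-recursive pins of CIDs that are already pinned recursively.
    if (!/already pinned recursively/i.test(err.message)) {
      throw err
    }
  }
}
```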

@Gozala Gozala added status/ready Ready to be worked and removed need/triage Needs initial labeling and prioritization labels Jan 21, 2021