
fix: report ipfs.add progress over http #3310

Merged
merged 24 commits into master from progress-events
Nov 16, 2020

Conversation

Gozala
Contributor

@Gozala Gozala commented Oct 1, 2020

This change integrates the changes from ipfs/js-ipfs-utils#60. Once ipfs-utils with those changes is released I'll update the dependencies so they don't point to the branch.

I think the electron-renderer tests fail due to https://github.com/ipfs/go-ipfs-cmds/pull/201/files

@achingbrain this changes the numbers passed to the optional progress handler when calling add or addAll: specifically, with the http client the numbers would reflect the request payload size rather than the size of the file(s). An alternative could be to report a percentage instead. Let me know which you think makes more sense here. For using this in webui, reporting a percentage makes more sense, because otherwise we end up calculating the total size plus estimated metadata and then deriving a percentage from that.

@Gozala
Contributor Author

Gozala commented Oct 5, 2020

I have bumped the go-ipfs dependency to 0.7 to fix the test failures caused by ipfs/go-ipfs-cmds#201, but it appears that 0.7 causes some other tests to fail, so I might have to create a separate PR that updates to 0.7 and fixes all those tests.

@achingbrain
Member

achingbrain commented Oct 6, 2020

Master has go-ipfs@0.7.0 now so if you merge that into this branch you should be able to rule that out as a source of failure.

The core emits progress based on chunks of the file consumed, not including metadata, as we want to avoid the situation where you upload 100b and get notifications up to 110b (or whatever).

From what I understand, using the onProgress event of an XMLHttpRequestUpload would be the same - that is, the payload is the file and the progress reported does not include bytes that are not the file (e.g. headers, etc) - or have I got that wrong?

An alternative could be to report a percentage instead

This is harder, as you'd have to know the size of the thing(s) you are uploading in advance, which you might not, depending on the input. It's better to let the user calculate percentages themselves if that's important to them.
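For example, a caller that already knows the input size (say, a File picked from a file input) can derive the percentage itself. A minimal sketch, where `fileInput` is a hypothetical DOM element and `ipfs` an existing client instance:

```js
// Minimal sketch: derive a percentage on the caller's side when the total size is
// known up front. `fileInput` is a hypothetical <input type="file"> element.
const file = fileInput.files[0]

const { cid } = await ipfs.add(
  { path: file.name, content: file },
  {
    progress: (bytes) => {
      // bytes = file bytes consumed so far; file.size is known to the caller
      const percent = Math.round((bytes / file.size) * 100)
      console.log(`${file.name}: ${percent}%`)
    }
  }
)
```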

@Gozala
Contributor Author

Gozala commented Oct 7, 2020

From what I understand, using the onProgress event of an XMLHttpRequestUpload would be the same - that is, the payload is the file and the progress reported does not include bytes that are not the file (e.g. headers, etc) - or have I got that wrong?

🤔 So onUploadProgress in node is called with the number of bytes emitted from the request body; in the browser it just uses the onProgress event. When I changed the http-client's addAll to use onUploadProgress I saw a test failure because progress exceeded the file size, which makes sense in node because of the added form-data framing.

I am guessing that browsers don't count headers but do count the form-data framing, though I could be wrong. If it turns out they don't count it, I could use onProgress directly in the browser and continue reporting progress in node as we do now. If browsers do count the form-data framing, the only other way I can think of is to emit combinedFilesSize * event.loaded / event.total (unfortunately event.total is not guaranteed to be present).
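A minimal sketch of that interpolation (the names are hypothetical, not the http-client's internals): scale the XHR progress event back onto the combined size of the files so the form-data framing doesn't inflate the reported bytes.

```js
// Hypothetical handler factory illustrating combinedFilesSize * event.loaded / event.total
const makeUploadProgressHandler = (combinedFilesSize, progress) => (event) => {
  // event is a ProgressEvent from the XHR upload; total may be unavailable
  if (!event.lengthComputable || event.total === 0) {
    return
  }

  const estimatedFileBytes = Math.min(
    combinedFilesSize,
    Math.round(combinedFilesSize * (event.loaded / event.total))
  )

  progress(estimatedFileBytes)
}
```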

@Gozala
Contributor Author

Gozala commented Oct 7, 2020

Just verified that XHR onProgress does in fact count the form-data extras. The good news is that it seems to include total as well, so we could interpolate progress in terms of just the files (in browsers only) by calculating the total size as we build up the form-data. Unfortunately that would add extra complexity and further diverge the code paths.

Another alternative could be to just have a separate progress reporting mechanism, e.g. onUploadProgress, but I'm personally not in favor of this because currently:

Passing a progress handler into add / addAll via the http-client (in the browser) causes a long tail of progress events to be queued and then discharged all together once the upload is complete. This makes the progress handler effectively broken and also brings the client to a standstill (see ipfs/ipfs-webui#1655).

So I think this leaves us with two options. @achingbrain, could you please let me know which way you want to go here before I go about making the changes?

@Gozala
Contributor Author

Gozala commented Oct 7, 2020

Correction: I misunderstood what was being passed to the progress handler. I have changed the implementation accordingly. Feel free to disregard the original comment below.


I have decided to implement progress reporting (in browsers) by interpolating it from upload events. In the process I realized that the progress function was passed upload chunk sizes and not a cumulative size as I had assumed. The tests, on the other hand, seemed to assume a cumulative size rather than chunk sizes. @achingbrain, could you please clarify what the expected value passed to progress is: individual chunk sizes or the sum of added chunk sizes?

@Gozala
Contributor Author

Gozala commented Oct 8, 2020

Last remaining failures seem to be caused by the ipfs-http-server doing strict origin checks that ipfs/go-ipfs-cmds#201 loosened up.

@achingbrain
Member

js-ipfs has the same fix as ipfs/go-ipfs-cmds#201 in master now so this should be unblocked. Lots of conflicts though.

@achingbrain
Member

achingbrain commented Oct 12, 2020

@achingbrain, could you please clarify what the expected value passed to progress is: individual chunk sizes or the sum of added chunk sizes?

To maintain compatibility with go-ipfs it's individual chunks of files, not the sum of added chunks (it's the total bytes processed, see next comment).

Also, it doesn't tell you which file the chunk is from, so it's not terribly useful as it stands (it now passes the file path to the handler).

It would be better if there was more metadata returned with the progress event - the path of the file within the DAG being created at a minimum.

You should be able to work it out though: over http all files are added sequentially, so you could keep track of where you are in the file stream and return the path/filename along with the chunk. We could shim this into the http client without having to change go-ipfs if you don't fancy opening a PR there.

To make js core do the same thing we can just update the unixfs importer to pass the file path back with the chunk (this has been done).

@achingbrain
Member

achingbrain commented Nov 11, 2020

A couple of clarifications on the last comment:

  1. The size passed to the progress handler is the accumulated total for a file being imported
  2. The progress handler now (as of ipfs@0.52.0 and ipfs-http-client@48.1.0) receives the supplied path as well as the total bytes imported so far for a given file during import

See the (new) tests: https://github.com/ipfs/js-ipfs/blob/master/packages/interface-ipfs-core/src/add.js#L117-L145
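A minimal usage sketch of the updated callback signature, assuming an existing `ipfs` instance (core or ipfs-http-client):

```js
// The progress handler now receives the bytes imported so far for a file
// plus the path that was supplied for it.
const files = [
  { path: 'dir/a.txt', content: new TextEncoder().encode('aaaa') },
  { path: 'dir/b.txt', content: new TextEncoder().encode('bbbb') }
]

for await (const entry of ipfs.addAll(files, {
  progress: (bytes, path) => {
    console.log(`imported ${bytes} bytes of ${path}`)
  }
})) {
  console.log(`${entry.path}: ${entry.cid}`)
}
```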

@achingbrain achingbrain changed the title feat: integrate htttp-client progress reporting feat: integrate http-client progress reporting Nov 11, 2020
@achingbrain achingbrain changed the title feat: integrate http-client progress reporting fix: integrate http-client progress reporting Nov 11, 2020
@achingbrain
Member

Could you rebase this please?

Also, I'm a little wary here as:

a) It makes the behaviour of the http client diverge from core
b) It doesn't solve the problem as it stands since if you upload two files, the total upload progress will keep ticking up until it's the combined size of both files plus the size of all the multipart headers

@achingbrain
Member

Thinking about this a bit more, I think it's possible to retain compatibility with core, if we accept that the progress will only be mostly accurate.

In the browser we have to consume the input stream for the files in its entirety before starting the request in order to create the FormData object; while doing that we can keep a tally of file names and sizes.

When ipfs/js-ipfs-utils#60 lands, it'll give us the total bytes transferred for the whole request. Because the files are uploaded serially the progress events should be received serially as well. Armed with that knowledge we can translate the total bytes transferred into progress through individual files and pass the same arguments to the progress callback function option as core does.

Of course the total bytes transferred will include multipart headers etc so we either add a little overhead to each file size and account for that in the progress callback arguments, or just let the final progress callback invocation occur a short amount of time before it's actually finished.

That should tide us over until #3371 lands and gives us proper streaming everywhere.
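A minimal sketch of that mapping (names are hypothetical, not the code that shipped): tally `{ path, size }` for each file while building the FormData, then translate the request-level byte count back into the (bytes, path) pairs core would emit.

```js
// `parts` is the tally of { path, size } collected while building the FormData,
// in upload order. Multipart headers are not accounted for, so the result is
// only "mostly accurate", as described above.
function createProgressMapper (parts, progress) {
  return (totalBytesSent) => {
    let offset = 0

    for (const { path, size } of parts) {
      if (totalBytesSent < offset + size) {
        // the upload is currently somewhere inside this file
        progress(totalBytesSent - offset, path)
        return
      }
      offset += size
    }

    // past the last tallied byte: report the final file as complete
    const last = parts[parts.length - 1]
    if (last) {
      progress(last.size, last.path)
    }
  }
}
```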

@Gozala
Contributor Author

Gozala commented Nov 13, 2020

@achingbrain I have updated the pull request to derive progress for individual parts based on the total progress.

@Gozala
Contributor Author

Gozala commented Nov 13, 2020

Turns out there is one more problem: namely, the tests below expect that no progress events are emitted for directories:

```js
const dirs = [
  content('pp.txt'),
  content('holmes.txt'),
  content('jungle.txt'),
  content('alice.txt'),
  emptyDir('empty-folder'),
  content('files/hello.txt'),
  content('files/ipfs.txt'),
  emptyDir('files/empty')
]

const total = dirs.reduce((acc, curr) => {
  if (curr.content) {
    acc[curr.path] = curr.content.length
  }
  return acc
}, {})

// declared here for completeness; elided in the original excerpt
const progressSizes = {}
const handler = (bytes, path) => {
  progressSizes[path] = bytes
}

const root = await last(ipfs.addAll(dirs, { progress: handler }))
expect(progressSizes).to.deep.equal(total)
```

My implementation in this pull request, on the other hand, was ensuring that progress events were fired for dirs. I will change the implementation to omit dirs, but I wanted to call this out because it is yet another WTF in the way we report progress.

@achingbrain achingbrain changed the title fix: integrate http-client progress reporting fix: report ipfs.add progress over http Nov 16, 2020
@achingbrain achingbrain merged commit 39cad4b into master Nov 16, 2020
@achingbrain achingbrain deleted the progress-events branch November 16, 2020 09:54
@achingbrain
Member

Shipped in ipfs-http-client@48.1.1

SgtPooki referenced this pull request in ipfs/js-kubo-rpc-client Aug 18, 2022
The browser fetch api doesn't allow reading of any data until the whole request has been sent, which means progress events only fire after the upload is complete, which rather defeats the purpose of reporting upload progress.

Here we switch to XHR for uploads with progress, which does allow reading response data before the request is complete.

Co-authored-by: achingbrain <alex@achingbrain.net>
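A minimal sketch of the difference (a hypothetical helper, not the client's actual implementation): fetch exposes no upload progress, whereas XMLHttpRequest reports it via xhr.upload.onprogress while the body is being sent.

```js
function uploadWithProgress (url, body, onUploadProgress) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest()
    xhr.open('POST', url)
    // fires repeatedly while the request body is being uploaded
    xhr.upload.onprogress = (event) => onUploadProgress(event.loaded, event.total)
    xhr.onload = () => resolve(xhr.responseText)
    xhr.onerror = () => reject(new Error(`upload failed: ${xhr.status}`))
    xhr.send(body)
  })
}
```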