This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

[WIP] feat: support chunked add requests #1540

Closed · wants to merge 10 commits

Conversation

@hugomrdias (Member) commented Sep 3, 2018

This PR supports ipfs-inactive/js-ipfs-http-client#851 on the ipfs-api side.

todo:

  • use one file per chunk; resume should be simpler with this
  • try not using files and add directly
  • maxBytes value ???

next (in a new PR):

  • resumable add
  • concurrent chunk requests
  • delete zombie tmp files

These need more work so we don't add corrupted data.

  • check for duplicates
  • check for missing chunks
  • check that the chunk size matches the Content-Range
  • in the last request, right before adding, check that the full size matches the number of chunks in the folder

@lidel (Member) left a comment

See my thoughts at ipfs-inactive/js-ipfs-http-client#851 (review) first, then the inline comment below.


const streams = []
const filesDir = tempy.directory()
@lidel (Member) commented Sep 3, 2018:

Just so that we don't forget: we should remove temporary files after a successful upload, but also have a plan for what happens with temp files produced by failed/aborted uploads. I don't think we can assume the temp dir will be cleared by the OS.
Leaving them around forever would be a self-administered DoS attack (eventually running out of disk space).

@hugomrdias (Member, Author) replied:

Yes, it should definitely go like this:

  • successful add? delete the file
  • errors during the session? define a way to remove only the last chunk from the file (or have one file per chunk); this would allow for resumable adds
  • on new session start, asynchronously check the temp folder and delete all files older than 1 hour (see the cleanup sketch after this list)
  • we also need to handle open streams; too many can become a problem
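
A minimal sketch of that periodic cleanup, assuming the chunk files live in their own temp directory (the function name and threshold here are illustrative, not code from this PR):

    const fs = require('fs')
    const path = require('path')

    const ONE_HOUR = 60 * 60 * 1000

    // Delete leftover chunk files older than one hour. Meant to run
    // asynchronously when a new session starts, without blocking the add.
    function cleanStaleChunks (tmpDir, cb) {
      fs.readdir(tmpDir, (err, names) => {
        if (err) return cb(err)
        let pending = names.length
        if (pending === 0) return cb(null)
        names.forEach((name) => {
          const filePath = path.join(tmpDir, name)
          fs.stat(filePath, (err, stats) => {
            if (!err && Date.now() - stats.mtimeMs > ONE_HOUR) {
              return fs.unlink(filePath, () => { if (--pending === 0) cb(null) })
            }
            if (--pending === 0) cb(null)
          })
        })
      })
    }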

@alanshaw alanshaw changed the title WIP feat: support chunked add requests [WIP] feat: support chunked add requests Sep 4, 2018
if (err) {
  pushStream.push({
    Message: err.toString(),
    Code: 0,
@lidel (Member) commented Sep 9, 2018:

FYI I've tracked down the source of the values used in the Code field in the Go implementation (at go-ipfs-cmdkit/error.go#L9-L25).

Translating from iota notation, it is an enum:

	ErrNormal         //  ErrorType = 0
	ErrClient         //  ErrorType = 1
	ErrImplementation //  ErrorType = 2
	ErrNotFound       //  ErrorType = 3
	ErrFatal          //  ErrorType = 4

Not sure if that is a problem; hardcoding 0 here is probably fine.
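
For readability on the JS side these could be mirrored as named constants instead of a bare 0 (a hypothetical helper, not something the PR currently has):

    // Error codes mirrored from go-ipfs-cmdkit's ErrorType enum
    const ErrorCode = {
      ErrNormal: 0,
      ErrClient: 1,
      ErrImplementation: 2,
      ErrNotFound: 3,
      ErrFatal: 4
    }

    // pushStream.push({ Message: err.toString(), Code: ErrorCode.ErrNormal })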

payload: {
  parse: false,
  output: 'stream',
  maxBytes: 1000 * 1024 * 1024
Member commented:

  • How do we pick the default max chunk size? E.g. why 1 GB and not 256 MB?
  • Is this going to be a hardcoded limit, or a configuration option that can be passed to the js-ipfs constructor?

@hugomrdias (Member, Author) replied:

How do we pick the default max chunk size? E.g. why 1 GB and not 256 MB?

We need this value to be high for the non-chunked path.

Is this going to be a hardcoded limit, or a configuration option that can be passed to the js-ipfs constructor?

Don't know, what do you think?

According to the hapi documentation, when output=stream maxBytes doesn't matter, but that doesn't seem to be the case. Needs further investigation; maybe it's just a matter of upgrading hapi, dunno.

Member replied:

I can't think of anything right now.
Unless you see a good use case for customizing it, let's go with a hardcoded value for now.
Exposing it as a config property can be contributed later in a separate PR.
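
If it does get exposed later, the wiring could look roughly like this (a sketch only; the option name maxChunkedRequestBytes is made up here):

    // Build the hapi payload config from a constructor option, falling back to
    // the value that is currently hardcoded in the route.
    const DEFAULT_MAX_BYTES = 1000 * 1024 * 1024

    function addPayloadConfig (apiOptions) {
      return {
        parse: false,
        output: 'stream',
        maxBytes: (apiOptions && apiOptions.maxChunkedRequestBytes) || DEFAULT_MAX_BYTES
      }
    }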

@alanshaw (Member) commented:

@hugomrdias is this ready for review, or does this need feedback for you to proceed?

If yes, then please assign me as a reviewer in future and remove the "[WIP]" label from the title when it's ready, so I don't skim down the PR list and miss it! Thank you ❤️

@hugomrdias (Member, Author) replied:

Yes, I'm gonna do that; just getting this ready for you and others. I have a couple more commits to push with tests and a big fix for a bug I found today.

@hugomrdias hugomrdias changed the title [WIP] feat: support chunked add requests feat: support chunked add requests Sep 17, 2018
@alanshaw (Member) left a comment

This is a great start 😄 🚀 thanks for working on this - I'm excited to be able to upload big data to IPFS from the browser!

Just a note while I'm thinking about it: the API will need documentation in the README for this, and for bonus points an example with a big file!?

There are some review comments for specific tests, but this PR needs a suite of tests to ensure the chunking works correctly. Note it doesn't have to include super big files!

Some test cases I'm interested in (a rough sketch of one follows the list):

  • Do the temp files get cleaned up on success, error, or an incomplete request (final chunk not sent)?
  • Do the chunking headers get validated properly?
  • Can I send a single chunk upload?
  • If I make the chunk size really small, does it still work?
  • Does progress reporting still work?
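
A rough shape for the small-chunk case, only exercising the client-side splitting logic (mocha/chai assumed; wiring it against the HTTP endpoint is left out because that surface is still in flux):

    const { expect } = require('chai')

    // Split a buffer into pieces of at most chunkSize bytes, mirroring what a
    // chunked uploader would describe in its Content-Range headers.
    function splitIntoChunks (buf, chunkSize) {
      const chunks = []
      for (let start = 0; start < buf.length; start += chunkSize) {
        const end = Math.min(start + chunkSize, buf.length)
        chunks.push({ start, end, total: buf.length, data: buf.slice(start, end) })
      }
      return chunks
    }

    describe('chunked add splitting', () => {
      it('round-trips with a really small chunk size', () => {
        const payload = Buffer.from('hello chunked add requests')
        const chunks = splitIntoChunks(payload, 3)

        // every byte is covered exactly once, in order
        expect(Buffer.concat(chunks.map((c) => c.data)).equals(payload)).to.equal(true)
        expect(chunks[chunks.length - 1].end).to.equal(payload.length)
      })
    })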

const fs = require('fs')

fs.open(file, 'r', (err, fd) => {
  if (err) {
    cb(err)
Member commented:

Missing return


fs.fstat(fd, (err, stats) => {
  if (err) {
    cb(err)
Member commented:

Missing return

}

fs.read(fd, buffer, 0, buffer.length, stats.size - 58, function (e, l, b) {
  cb(null, b.toString().includes(boundary))
Member commented:

Please handle the error and use more descriptive variable names!
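
Something along these lines (a sketch; the variable names are just suggestions):

    fs.read(fd, buffer, 0, buffer.length, stats.size - 58, (err, bytesRead, readBuffer) => {
      if (err) {
        return cb(err)
      }
      cb(null, readBuffer.toString().includes(boundary))
    })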

fs.read(fd, buffer, 0, buffer.length, stats.size - 58, function (e, l, b) {
  cb(null, b.toString().includes(boundary))
})
fs.close(fd)
Member commented:

Move close to read callback?
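
Combined with the error handling above, that would look something like this (sketch):

    fs.read(fd, buffer, 0, buffer.length, stats.size - 58, (err, bytesRead, readBuffer) => {
      // close the descriptor regardless of the read result, then report
      fs.close(fd, () => {
        if (err) return cb(err)
        cb(null, readBuffer.toString().includes(boundary))
      })
    })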

}

const matchMultipartEnd = (file, boundary, cb) => {
  const buffer = Buffer.alloc(56)
Member commented:

What are the magic numbers here, 56 & 58? Would you mind adding comments to explain, or refactoring to make them more obvious?
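
One option is to derive the sizes from the closing boundary marker instead of hardcoding them, assuming that is what 56/58 encode (a sketch, not the PR's code):

    const fs = require('fs')

    const matchMultipartEnd = (file, boundary, cb) => {
      // A multipart body terminates with "\r\n--<boundary>--\r\n"; read a
      // window of that size (plus a little slack) from the end of the file.
      const endMarker = `\r\n--${boundary}--\r\n`
      const buffer = Buffer.alloc(endMarker.length + 4)

      fs.open(file, 'r', (err, fd) => {
        if (err) return cb(err)
        fs.fstat(fd, (err, stats) => {
          if (err) return fs.close(fd, () => cb(err))
          const position = Math.max(stats.size - buffer.length, 0)
          fs.read(fd, buffer, 0, buffer.length, position, (err, bytesRead, readBuffer) => {
            fs.close(fd, () => {
              if (err) return cb(err)
              cb(null, readBuffer.toString().includes(boundary))
            })
          })
        })
      })
    }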

reply,
(err) => {
if (err) {
return reply(err)
Member commented:

reply will be called twice if there's an error mid-stream. Does Hapi somehow deal with this? In createMultipartReply there are error listeners that call the callback, but they don't do anything to end the replyStream. Does this leave the connection open?

I think we need a test to ensure the correct thing is being done.
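
One defensive option, regardless of what Hapi does internally, is to make the final callback single-shot so a mid-stream error can't trigger a second reply (sketch):

    // Wrap a callback so only its first invocation has any effect.
    const once = (fn) => {
      let called = false
      return (...args) => {
        if (called) return
        called = true
        fn(...args)
      }
    }

    // e.g.
    // createMultipartReply(readStream, request, reply, once((err) => {
    //   if (err) return reply(err)
    // }))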

parser.on('end', () => {
  addStream.end()
})
parser.on('error', cb)
Member commented:

There should be an abstraction around this that allows us to just pipe it to our add stream.
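
If the parser were wrapped as (or replaced by) a proper stream, pump could take over the end/error bookkeeping; a sketch, where parserStream and addStream stand in for the existing streams:

    const pump = require('pump')

    // pump propagates errors and end-of-stream between the two streams and
    // calls its callback exactly once, replacing the manual 'end'/'error' wiring.
    pump(parserStream, addStream, (err) => {
      if (err) return cb(err)
    })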

Name: file.path,
Hash: file.hash,
Size: file.size
}))
Member commented:

This should just be a transform; we should be able to pipe readStream -> parser -> addStream -> replyStream.
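
A sketch of that mapping as an object-mode Transform (field names taken from the snippet above; everything else is illustrative):

    const { Transform } = require('stream')

    // Maps each { path, hash, size } result coming out of the add stream into
    // the { Name, Hash, Size } shape the HTTP API replies with.
    const toReplyFormat = () => new Transform({
      objectMode: true,
      transform (file, enc, callback) {
        callback(null, {
          Name: file.path,
          Hash: file.hash,
          Size: file.size
        })
      }
    })

    // Intended wiring (names as in the review comment):
    // readStream.pipe(parser).pipe(addStream).pipe(toReplyFormat()).pipe(replyStream)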

wrapWithDirectory: query['wrap-with-directory'],
pin: query.pin,
chunker: query.chunker
}
Member commented:

We should extract this logic into a function, e.g. queryToAddOptions in http/api/resources/files.js, use it in add and addExperimental, and just pass the result of calling it to this function. It'll save us having to keep track of another place where these options are constructed.
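
A sketch of such a helper, built only from the option names visible in this diff (the rest would be copied over from the existing handler):

    // Collect the add options derived from the request query in one place so
    // `add` and `addExperimental` can share it.
    const queryToAddOptions = (query) => ({
      wrapWithDirectory: query['wrap-with-directory'],
      pin: query.pin,
      chunker: query.chunker
    })

    // usage in a handler:
    // const options = queryToAddOptions(request.query)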

return [match[1], Number(match[2])]
}

const createMultipartReply = (readStream, request, reply, cb) => {
Member commented:

IMHO this should just construct and return a stream that can be passed to reply in the handler.
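
In other words, something with roughly this shape, where the handler just passes the result to reply (internals elided, PassThrough used for illustration):

    const { PassThrough } = require('stream')

    // Construct and return the reply stream instead of taking `reply` and a
    // callback; the handler decides what to do with it.
    const createMultipartReply = (readStream, request) => {
      const replyStream = new PassThrough({ objectMode: true })
      // ...the existing parse/add logic would write its results into
      //    replyStream and destroy it on error...
      return replyStream
    }

    // handler:
    // reply(createMultipartReply(request.payload, request))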

@alanshaw alanshaw changed the title feat: support chunked add requests [WIP] feat: support chunked add requests Nov 12, 2018
@alanshaw (Member) commented:

@hugomrdias any plans to continue this PR or shall we close?

@hugomrdias (Member, Author) replied:

Let it be open, I would like to come back to this.

@jacobheun (Contributor) commented:

Closing as this is quite stale.

@jacobheun jacobheun closed this Feb 26, 2021