Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

overreading #5

Open
kumavis opened this issue Nov 16, 2016 · 9 comments
Open

overreading #5

kumavis opened this issue Nov 16, 2016 · 9 comments

Comments

@kumavis
Copy link

kumavis commented Nov 16, 2016

what is the expected behavior where reader.read(size, cb) is called where size > the remaining readable data?

@kumavis
Copy link
Author

kumavis commented Nov 16, 2016

I would expect that it would give me the remainder and then drain/abort

@kumavis
Copy link
Author

kumavis commented Nov 16, 2016

current behavior:

pull = require('pull-stream')
Reader = require('pull-reader')

pull(
  pull.values([new Buffer('hi')]),
  reader = Reader()
)

reader.read(4, function(err, result){
  console.log(err, result) // true, undefined
})

@dominictarr
Copy link
Owner

looking at the code what I think it will do is just fall through to ending.
I think that might be the right thing, because if you asked for length it's because you needed that much right? having less than that is a protocol error. If you can take any length, use reader.read(null, cb)

@kumavis
Copy link
Author

kumavis commented Nov 16, 2016

having less than that is a protocol error.

Yes the main case I was that I wanted to see that there was not as much data as expected, and report an error upstream with a message representing that. What currently happens is the stream terminates without any message.

So I think there are two options:

  • read should end with an error
  • read should return the remainder

I think the latter is more generally useful, and can allow the consumer to check if the remaining value was incomplete. This allows someone to use reader.read to "read up to size" and not lose information if there was a partial remainder.

@daviddias
Copy link

The current behaviour seems about right, I tried it by adding another read call:

pull = require('pull-stream')
Reader = require('pull-reader')

pull(
  pull.values([new Buffer('hi')]),
  reader = Reader()
)

reader.read(4, function(err, result){
  console.log(err, result) // true, undefined
})

reader.read(2, function(err, result){
  console.log(err, result) // null, <Buffer 68 69>
})

and it is very similar to standard Node.js streams, where the design is, if you want to read X bytes, it will not return you any byte until it has X bytes buffered. It returned the bytes that it had (<X) then you would have to create a layer of caching yourself, because the stream can't emit the same data twice and doing this manually would cumbersome for wire protocols.

Would knowing the buffer size help solve your case?

@kumavis
Copy link
Author

kumavis commented Nov 16, 2016

@diasdavid
I'm not really sure how the two reads affects things (do they both read from the start? does the first consume the data before the 2nd?). Anyways not sure how a second reader adds to the situation, seems to just complicate the example.

It returned the bytes that it had (<X) then you would have to create a layer of caching yourself

My assumption is that pull-reader does currently cache until it has the size. Maybe I'm misunderstanding this step. The issue is that when the source sends an abort signal, pull-reader does not:

  • flush its cache
  • or signal that there was a cache that did not meet the size requirements.

because the stream can't emit the same data twice

im not sure how this is relevant?

Would knowing the buffer size help solve your case?

my main goal here is better handling of edge cases where things did not behave as expected

@daviddias
Copy link

daviddias commented Nov 17, 2016

My assumption is that pull-reader does currently cache until it has the size

That is correct. That is also why you can't get a 'remainder' from a read operation as it is favourable that data only flows when the next action in the pipeline can do some action upon it.

The issue is that when the source sends an abort signal, pull-reader does not:

If the pull.values were a connection, the error that the connection was dropped should be propagated through the pull pipeline, you are not seeing this happening?

my main goal here is better handling of edge cases where things did not behave as expected

We might have a similar need, handling 'canceled operations' in JS is really hard because of the lack of context (like we have in Go), we've opened this issue https://github.com/ipfs/interface-ipfs-core/issues/58 and are currently exploring what is the best route.

@kumavis
Copy link
Author

kumavis commented Nov 17, 2016

If the pull.values were a connection, the error that the connection was dropped should be propagated through the pull pipeline, you are not seeing this happening?

I see what you are saying about not being able to push data twice - it cant return the remainder and return an error (until the next read is requested).

But imagine the case where there is no error, just the flow ends.

pull(
  pullFile('./data'),
  reader
)

function readNextBytes(cb){
  reader.read(1024, cb)
}

Here we attempt to read chunks of 1kb of data at a time, but we lose the remainder when the file ends. Here are some ideas:

  1. non-standard abort + remainder
// when `end` is `true`, any remaining data is passed on in `data`
// if reading from a 1025 byte file, reader would return:
// (null, 1024 byte)
// (true, 1 byte)
reader.read(1024, function(end, data){
  if (end instanceof Error) return handleError(end)
  handleData(data)
  if (end) allDone()
}
  1. cache abort (if not an error)
// when abort signal is received, next read returns remaining data, following read returns abort
// error still takes priority over remainder
// if reading from a 1025 byte file, reader would return:
// (null, 1024 byte)
// (null, 1 byte)
// (true, null)
reader.read(1024, function(end, data){
  if (end instanceof Error) return handleError(end)
  if (end) allDone()
  if (data.length < 1024) return handleError( new Error('App Specific Error - Corrupted save fail') )
  handleData(data)
}

@dominictarr
Copy link
Owner

@kumavis I think you are correct. read(len, cb) should callback an error as the first arg, not true.
That error can have a message about how many bytes where read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants