Conversation
mikeal commented Sep 10, 2011
- Make stream.Stream a read/write stream with proxy methods by default.
- Make readable streams in lib use stream.ReadStream.
- Add stream.createFilter.
this isn't ready to merge yet. too many tests in master are broken right now for me to be confident that this change doesn't also break something. once there are fewer failures in master we can look at merging this. for now, we should have a discussion about the API. it also needs a good test for the createFilter API.
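Since the createFilter signature never actually appears in this thread, here is a purely hypothetical sketch of the 90%-case usage being described, assuming a per-chunk transform callback:

```js
// hypothetical usage of the proposed stream.createFilter; the per-chunk
// transform-callback signature is an assumption, not shown in this thread
var stream = require('stream');

var upper = stream.createFilter(function (chunk) {
  return chunk.toString().toUpperCase();
});

process.stdin.pipe(upper).pipe(process.stdout);
```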
now that i think about it, the read streams in lib are actually bindings to a lower-level interface that writes to them, so they should probably still inherit from stream.Stream so that we can support buffering.
Is this in good shape for review now?
i don't want to attempt to merge it until the tests are passing in master, but we should start picking apart the code that is already in there now. i also want to remove the multi-source support so that we can add error propagation.
FWIW, my stream-stack module implements this same kind of "proxy by default" behavior (so I'm glad to see that it was a good idea after all :p). The point of that module was to write Stream "middle-men" that make up your protocol stack; so this new … In any case, I'm just interested to see where this goes. I'll have some technical comments shortly.
"always emit 'end', I'm on the fence about proxy the buffering feature. what about putting that in a separate stream, so you pipe through the buffer, and only using it when you want that feature?
I think that this: https://github.com/dominictarr/event-stream/blob/master/index.js#L116-177
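For illustration, here is a minimal hand-rolled sketch of that "pipe through the buffer" idea against the old-streams API; it is not the linked event-stream code, and all names are made up:

```js
var Stream = require('stream').Stream;

// pass-through stream that queues 'data' events while paused
function createBufferStream() {
  var s = new Stream();
  var queue = [];
  var paused = false;
  s.readable = s.writable = true;
  s.write = function (chunk) {
    if (paused) { queue.push(chunk); return false; }
    s.emit('data', chunk);
    return true;
  };
  s.pause = function () { paused = true; };
  s.resume = function () {
    paused = false;
    while (queue.length && !paused) s.emit('data', queue.shift());
    if (!paused) s.emit('drain');
  };
  s.end = function (chunk) {
    if (chunk) s.write(chunk);
    s.emit('end');
  };
  return s;
}

// usage: source.pipe(createBufferStream()).pipe(dest)
```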
I've been discussing this stuff a bunch with @mikeal and @ry. What we need is a spec that is complete enough to cover all of node-core's use cases (particularly: tcp, tls/ssl, http, zlib/crypto filters, file streams), and simple enough to be understandable and extendable by userland programs. IMO, after windows support, this is the most important bit of low-hanging fruit available to us. One of the biggest problems with this api is that we have far too many ways to say "done", and there's very little consistency.
One general comment before i reply to people individually. We need to separate the concerns between Streams in core and Streams that will be created by userland programs. It's unlikely that a userland program will write a new stream around a file handle; it's an edge case at best. I believe that we can simplify the API greatly for node developers if we remove from the public spec the finer points that only really relate to streams we're implementing.
I implemented a BufferedStream object; it's available here. The problem with the BufferedStream approach is this:

```js
function (req, resp) {
  var b = new BufferedStream()
  req.pipe(b)
  talkToRedis(function (e, result) {
    b.pipe(request(result.url + req.url)).pipe(resp)
  })
}
```

Looks fine, but it's not. The BufferedStream breaks capability detection: all of a sudden request is unaware that its input is an http request. The only thing you could do would be to use a Proxy or lots of getters to proxy the attribute access, which would be really painful, fairly slow, and confuse the shit out of everyone.

You have a good point about createFilter but it's not intended to handle every use case. It's still going to be faster to write a new stream object rather than using createFilter. The idea is to handle 90% of the use cases with a very simple API and defer other use cases to creating a new full Stream object. An alternative API that relies on return values or continuation passing is easily buildable in userland but not as simple as createFilter.

@mjijackson I had the same conversation with @felixge about using pause() for buffering. Here's the thing. We need to separate the notion of internal data buffers and messaging the client to stop sending data (which incurs a roundtrip to get it to start sending data again). Currently, pause() means "tell the client to pause". I think we should leave it that way. We need another way to say "hold on to data events and don't tell the client to stop sending yet". If we say that pause() calls buffer() then we're in another bad place where there are two methods that do something similar but not the same (see end(), close(), destroy(), destroySoon()); the confusion caused by this ensures that people will mostly call the wrong thing.

Also, it's better to not buffer data events when write() returns false (the remote end cannot accept data). The pause() call is going to take RTT to get the client to stop sending data. If we keep pushing that data through the pipe() chain we'll have it in the most efficient place possible, right next to the file descriptor that asked us to stop sending data. In the future we can optimize this to push that data before we emit("drain") and cut out unnecessary roundtrips to the client when working with very slow writers.

destroySoon might be a good idea for a few core streams but not for any userland streams, so I don't want it in the spec. What we decide to do for files and sockets should be separated from what we want userland streams to look like.

Also, the "callback on write" is a total lie. We don't actually know when data is fsync'd to disk. I find it a better policy to not make API available where it is actually an outright lie :)

end(chunk) should NOT emit a "data" event. it should this.write(chunk) before emitting end :)
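To make "capability detection" concrete, here is a small sketch of the failure mode being described; the sniffed property names are just illustrative of http.IncomingMessage:

```js
// downstream code sniffing its source's capabilities
function describeSource(src) {
  if (src.httpVersion && src.headers) {
    // looks like an http request object
    return 'http request: ' + src.method + ' ' + src.url;
  }
  return 'opaque stream';
}

// describeSource(req)      -> 'http request: GET /foo'
// but after req.pipe(new BufferedStream()), the BufferedStream carries
// none of those properties, so detection silently fails:
// describeSource(buffered) -> 'opaque stream'
```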
I wholeheartedly agree with the small core, big userland principle. the one thing that I really need my userland streams to do is be able to handle errors if necessary. I am using BufferedStream! In many of the things that I want to make userland streams for, bleeding-edge performance is not quite as important as the fact that the stream API is scalable: if you are using the same API for jobs as you do for IO, breaking out part of your application into another process, or another server, becomes pretty trivial. I'm confident that you guys are doing a great job getting core IO streams great, but that is what I am interested in: userland.
While I agree with you in principle here, and we definitely need to get to a minimal contract that is easy to understand, I absolutely disagree that these "finer points" only apply to streams implemented in core. Anything that goes through any kind of binding layer will have a lot of the same concerns, and will almost certainly have to handle cleanup. So there's your destroy/close/whatever use case. In fact, file streams are one of the simpler types of streams to clean up, since they're strictly uni-directional and have a very clear way to shut them down by calling close().

The difference between a duplex stream and a filter stream is only a gentleman's agreement that ending the input will end the output in some reasonable amount of time. We absolutely need some way to express "I'm ending the input, and forcibly destroying anything and everything about this stream, right now." If you have cleanup, and you have forcible shutdown, then you have everything that's needed to support node-core streams, and it turns out those things are also necessary in many userland cases as well.
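The pause-versus-forcible-shutdown distinction is easy to see on a file read stream, using methods that did exist on node 0.x streams:

```js
var fs = require('fs');
var rs = fs.createReadStream('/var/log/big.log');

rs.pause();   // stop emitting 'data'; the underlying fd stays open
rs.destroy(); // forcible shutdown: close the fd, the stream is done
```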
Here's a spec that I think can work for node-core and is clear enough to be used for userland programs: https://gist.github.com/1241393

I don't think that … What do you all think?
I approve, except for maybe … It needs to be possible to handle errors for Filter and Duplex streams.
@isaacs i think we can add buffer as we clean up the current functionality, it's already written :) i agree it's a good idea in general though to clean up what we have and stay away from too many additional extensions. I just have this rather large issue right now and have already gone through the trouble of fixing it :)

@dominictarr i'm mulling over your error comment.... it's a very new API but it is definitely more consistent with how we currently deal with reverse event propagation.

@isaacs the reason I made this a branch and a pull request is to keep the conversation about something a little more concrete. we've endlessly discussed the stream spec for a year and have made relatively little progress. I'm going to merge in your new Stream spec and replace the Stream spec in the docs in my branch so that it can become part of the conversation, and i'll probably start implementing part of it as well.

@dominictarr i owe you, and everyone really, a blog post about capability detection in stream pipes so that you all can understand what the fuck i'm talking about.
what do you think about removing the distinction in the spec between a readable and a writable stream? it's really confusing for people implementing readable/writable streams. we have these direct method-to-event correlations that we can't actually outline in the spec because we've separated the two objects.
what do you think about removing all the callback arguments, for now? do we have a really strong userland case for them at the moment? it's a lot of extra code, and a fairly large new feature. if the primary goal is to clean up what is there, I think we should tackle this in a future release.
That's because we were discussing it, and not writing anything down, and then writing a bunch of code based on our meatware records of the discussion, and then dealing with bugs in said code, and then repeating. There are enough people now with enough experience dealing with these things that we should be able to understand the issues. The best way to build an interoperable thing is for people with that understanding to sit down and write out a spec in english, and then adjust it when the code doesn't fit it. There are too many edge cases in real-life streams to do it any other way.
Well, it's pretty significant. If a stream is readable, then data comes out of it, and if it's writable, then data goes into it. Knowing which set of capabilities it has is important and helpful.
A … Same for the callbacks to … and …

If you can construct the …
You need to stop focusing on filters exclusively. Remember, for a duplex stream, there is not necessarily any event that corresponds to a write() call. In fact, even for some filters, this is the case. A deflate filter might get several megabytes written to it, in tiny chunks, and then emit a single data event. A sha filter will take an unlimited number of write() calls and … The write cb simply means, "Whatever it is that …"
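As a sketch of that sha case under the old-streams API (hand-rolled for illustration, not code from this branch): many write() calls go in, and exactly one data event comes out at the end.

```js
var crypto = require('crypto');
var Stream = require('stream').Stream;

function createShaFilter() {
  var s = new Stream();
  var hash = crypto.createHash('sha1');
  s.writable = true;
  s.readable = true;
  s.write = function (chunk) {
    hash.update(chunk); // accumulate; no per-write data event
    return true;
  };
  s.end = function (chunk) {
    if (chunk) hash.update(chunk);
    s.emit('data', hash.digest('hex')); // the one and only data event
    s.emit('end');
  };
  return s;
}

// usage: fs.createReadStream(file).pipe(createShaFilter()).pipe(process.stdout)
```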
It's relevant for Socket.io's userland streams, or anything that dnode talks over. I don't think this will be a success unless it can support filters, one-way streams, and duplexes alike. We need to stop asking whether something is "only" needed for net and tls, and instead ask whether something is needed for duplexes, filters, or one-way streams. If the answer is "yes", then it's needed for Streams, period.

But, "allowHalfDuplex" is a wonky networking term that people might not get. Maybe …

Basically, the semantic here is: "For this two-way stream, should it be forcibly destroyed once the last bit of data is finished being written (which requires a write() cb, btw), or should it stay open to continue getting data events?"
Right. We just make it so that the pipe itself doesn't call end() on the destination.

Here's an example that should probably be a section in the "Using streams" part:

```js
// frame the file in a header and footer
header.pipe(output, { end: false })
header.on("end", function () {
  body.pipe(output, { end: false })
})
body.on("end", function () {
  footer.pipe(output)
})
```

You could also build something like this, where a tcp socket connection gets multiple files piped to it as the user requests them:

```js
net.createServer(function (sock) {
var buf = ""
var files = []
var sending = false
sock.on("data", function (c) {
buf += c
buf = buf.replace(/\n+/, "\n")
if (buf.indexOf("\n") != -1) sendFiles()
})
function sendFiles () {
var f = buf.split("\n")
buf = f.pop()
files.push.apply(files, f)
sendFile()
}
function sendFile () {
if (sending) return
var file = files.shift()
if (!file) {
sending = false
return
}
var s = fs.createReadStream(file)
s.pipe(sock, {end: false})
s.on("end", sendFile)
}
})
```

If the read streams didn't emit "end", then that sort of thing wouldn't be possible.
Then that breaks the guarantee that read streams will eventually emit "end".
How? It's a synchronous call in …
It seems to me that this is actually something that would have to be handled by the EventEmitter class, or at least the Stream base class. We could also maybe get around it by using @dominictarr's suggestion of replacing the …

Here's a non-exhaustive list of constraints that I've got so far on this API: …
So, the question is, what is the minimum change to the API that gets us to those constraints?
Or rather the interface is simply too complex and no one feels good about setting it in stone.
I agree with @ry that the interface is very complex. The approach that @isaacs is taking is a sound one. We'll need to run the spec by a few different use cases to make sure it is complete, though. I've done quite a bit of work on Strata's BufferedStream implementation this weekend and today. For the purposes of Strata (a web framework) it works very well. I have already implemented subclasses for both gzip and jsonp encoding which are very small and which help prove the usefulness of the approach I'm taking.
Ok, this is where I think we're at with the encoding issues. write(chunk, encoding) can actually break utf8. treating setEncoding as a mutation on streams that just calls buffer.toString() also breaks utf8 in some cases. it seems like the best thing to do is remove encoding as a param to write, stick it on readable streams, and propagate the call through the pipe chain with a "setEncoding" event in filter streams.
streams will mutate data, that is expected. requiring the mutation to only encode into buffers and to always hold the right number of bytes to decode properly isn't acceptable for authors of those streams.
yeah, I'm with mikeal on this one. a stream needs to know what the upstream thought the encoding is, but it is possible that a filter stream may change that, so if there is encoding metadata... ohh, now I've done it. what about stream metadata? http headers also seem like stream metadata, or am I crazy?
Encoding needs to be handled on a case-by-case basis. We should expose a filter interface that can safely decode buffers to strings according to an encoding. Encoding needs to be kept apart from the Stream interface.
Perhaps we should start working on some concrete test cases. "buffer.toString() breaks utf-8". Seems pretty simple to mock up. "streams need to know about upstream encoding". Probably. What does that look like? I think you guys are on the right track with an open spec + a developing implementation. Running code that exercises some of these points of discussion would complete the trifecta, no? I'll try to help out soon.
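That first case is indeed simple to mock up: a multibyte character split across two chunks decodes to replacement characters unless decoder state is kept between chunks. (Buffer.from is the modern spelling, used for brevity; node's StringDecoder existed at the time.)

```js
var whole = Buffer.from('€', 'utf8'); // 3 bytes: e2 82 ac
var a = whole.slice(0, 1);
var b = whole.slice(1);

// naive per-chunk decoding mangles the split character:
console.log(a.toString('utf8') + b.toString('utf8')); // '\ufffd\ufffd\ufffd'

// StringDecoder keeps partial-character state across chunks:
var StringDecoder = require('string_decoder').StringDecoder;
var d = new StringDecoder('utf8');
console.log(d.write(a) + d.write(b)); // '€'
```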
How about a string encoder/decoder as a filter? https://gist.github.com/1295138

example:

```js
fs.createReadStream('input.txt.gz')
  .pipe(zlib.createGunzip())
  .pipe(utf8Decoder())   // buffer to string
  .pipe(new RegExpFilter(/node.js/i, 'node'))
  .pipe(utf8Encoder())   // string to buffer
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream('output.txt.gz'));
```
+1
+1 A filter stream seems very sensible here.
I think we already agreed that setEncoding needs to stay in. In fact, i think we have agreement that setEncoding stays in and the encoding argument to write goes away. Doing everything with filters is problematic the same way buffering with filters is problematic. It's too common of a case and is likely to be needed as a feature on all core streams, so it'll break capability detection.
Ok, time to trim this down to something we might merge before 0.6. @isaacs what are your thoughts? I would love to remove close() before this release but I doubt that we have time to fix all the core streams. I would like the new passthrough Stream, createFilter and .buffer() to go in and they are implemented, although we do need tests. Also, do we have time to ensure that write() and end() throw on error in time? I'd really like to get that in.
Fine with me, but I'd still love StreamDecoder to become public and act as a stream filter; it would be a nice API.
As cute as it would be to use filters to decode/encode buffers to strings (and it wouldn't be hard to do using …) …

Close needs to remain in, at least for TCP streams; otherwise there's no way to say "end my side, but as soon as that's done, I'm going to destroy it, so don't bother calling the shutdown syscall".

Also, we need to work out the issue with cyclical streams in order for error propagation to not be troublesome.

At the very least, even if we make no changes to any of the core streams apis, we need to fix it so that http/https streams can never ever throw on write() or end(), because that noise is just plain annoying.
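For reference, the three shutdown flavors being weighed here, as I understand them on a node 0.x TCP socket:

```js
var net = require('net');
var sock = net.createConnection(8000);

sock.end();         // send FIN (the shutdown syscall); remote may keep sending
sock.destroy();     // tear the socket down immediately, no FIN handshake
sock.destroySoon(); // flush pending writes, then destroy; no waiting
                    // on the remote side at all
```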
On Oct 19, 2011, at 8:26 PM, "Isaac Z. Schlueter" reply@reply.github.com wrote:

never throw? I thought we wanted the opposite: to throw when the stream was already ended?
@mikeal oh, sorry, yes, it should throw if you .write() after you .end(), but this thing where it just throws randomly if the underlying socket dies, that sucks a lot.
@isaacs if the socket is closed because it's dead, don't we want it to throw if you attempt to write to it?
If the Stream throws on … The case of a … Just my $.02. But with inflation these days...
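Sketching the behavior this exchange seems to converge on (assumed semantics, not a formal guarantee of the time): write-after-end is a programmer error and throws, while a connection dying underneath you surfaces as an 'error' event rather than a throw.

```js
var net = require('net');

var sock = net.createConnection(8000);
sock.on('error', function (err) {
  // lost connections, resets, etc. arrive here, not as throws from write()
  console.error('socket failed:', err.message);
});

sock.end();
try {
  sock.write('too late'); // writing after end(): programmer error, throws
} catch (err) {
  console.error('write after end:', err.message);
}
```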
Netty's approach to encoding and decoding is possibly worth looking at. For example, transparent encoding of input to strings merely involves adding a handler to the pipeline:

```java
pipeline.addLast("stringEncoder", new StringEncoder(CharsetUtil.UTF_8));
```

Similarly, decoding a stream uses a handler:

```java
pipeline.addLast("stringDecoder", new StringDecoder(CharsetUtil.UTF_8));
```

Clarifying and enhancing the stream spec would also make it easier to build protocol encoders and decoders while shielding app code from their complexity.
Added a comment to the gist design doc:
There are types of streams, primarily communications like sockets or serial ports, where an error condition such as a lost connection or a reset remote device is not detected until the write is attempted. In this case, imho, write should throw and then be followed by an end.
In reading through the streams2 spec, I don't see any guidance on how a stream's ctor or open event should work. Some stream types, such as node-serialport and socket connections, have an async open and shouldn't be written or read until the open has completed. How do you feel this should fit into the streams2 spec? See issue #60 in node-serialport for related discussion.
you can pretty much write to every writable stream as soon as it's created.
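That implies a stream with an async open should queue writes internally until it's ready. A sketch of the pattern; openDevice and writeToDevice are hypothetical stand-ins for an async device API like node-serialport's:

```js
var Stream = require('stream').Stream;

function createPort(path) {
  var s = new Stream();
  var queue = [];
  var fd = null;
  s.writable = true;
  openDevice(path, function (err, handle) { // hypothetical async open
    if (err) return s.emit('error', err);
    fd = handle;
    // flush writes that arrived before the device was open
    queue.forEach(function (chunk) { writeToDevice(fd, chunk); });
    queue = [];
    s.emit('open');
  });
  s.write = function (chunk) {
    if (fd === null) {
      queue.push(chunk); // buffer until open completes
      return false;      // signal backpressure while buffering
    }
    writeToDevice(fd, chunk); // hypothetical device write
    return true;
  };
  return s;
}
```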
we're going down a very different road for 0.8 streams. closing this out.
Where's the discussion on that? |