Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TextEncoderStream and TextDecoderStream #282

Closed
3 of 5 tasks
ricea opened this issue May 22, 2018 · 16 comments
Closed
3 of 5 tasks

TextEncoderStream and TextDecoderStream #282

ricea opened this issue May 22, 2018 · 16 comments
Assignees
Labels
Progress: review complete Resolution: satisfied The TAG is satisfied with this design Review type: later review Topic: Streams Any kind of stream-like feature Topic: universal JavaScript Features that work on the web and non-web (e.g. node.js) Venue: WHATWG

Comments

@ricea
Copy link

ricea commented May 22, 2018

Hello TAG!

I'm requesting a TAG review of:

Further details (optional):

You should also know that...

This is an extension to the existing Encoding Standard to integrate with the Streams Standard. There has been extensive discussion at whatwg/encoding#72.

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our Github repo for each point of feedback
  • open a single issue in our Github repo for the entire review
  • leave review feedback as a comment in this issue and @-notify [github usernames]
@slightlyoff
Copy link
Member

Hey Adam,

Thanks for filing for a review, and apologies for the delay in getting back to you.

Some questions regarding the merged PR:

Regards

@slightlyoff slightlyoff added Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review and removed extra time labels Oct 30, 2018
@annevk
Copy link
Member

annevk commented Oct 30, 2018

To answer the first question, other encodings don't have a BOM. If you encoded U+FEFF with, e.g., gb18030 it should never be removed.

@slightlyoff
Copy link
Member

Thanks, @annevk! Is there an equivalent mechanism to handle potentially special characters in other encodings that could be pluggable?

@annevk
Copy link
Member

annevk commented Oct 30, 2018

No such characters exist to my knowledge and we intentionally don't want text encodings to be generic, we want everyone to migrate to UTF-8.

@ricea
Copy link
Author

ricea commented Oct 30, 2018

The GenericTransformStream mixin creates readable and writeable properties, but it doesn't appears as though internal aspects of these default streams are configurable. That is, the various flags one can pass to Stream constructors aren't available for configuration when instantiating one of these stream types. This makes certain built-in stream types "magically" faster and perhaps more flexible than userland stream types, violating layering expectations. Perhaps there's a way to configure them that I've just missed?

I'm not sure I understand the question correctly. They're not magically faster. Implementations can take advantage of the fact that the internals are not visible to user JS to optimise these operations. But they wouldn't be faster because of features unavailable to JS; they'd be faster because of the absence of the dynamic behaviour provided by JS.

They're also not more flexible. A transform stream is just a readable and a writable glued together, and that is what GenericTransformStream is trying to express. That glue can be written in JS or anything else.

The polyfill at https://github.com/GoogleChromeLabs/text-encode-transform-polyfill has full feature parity with Chrome's implementation, which I think demonstrates that layering has not been broken.

The queuing strategies with which the streams are constructed are intentionally not configurable. To avoid adding buffer bloat to pipes, I made the queues as small as possible. My philosophy is that queuing strategies should be designed together with the transformer, as mismatches can have surprising side-effects.

I'm not able to locate much in the way of example code outside of the encoding/streams/ WPT tests. There isn't a ton in the Explainer either. It'd be helpful to see examples that show problems being solved rather than just "how to use it here".

The first three demos at https://streams.spec.whatwg.org/demos/ use a polyfill of this functionality. Unfortunately it is out-of-date and doesn't match the final API. I am planning to update the demos to use the new polyfill, after which I can link them from the explainer.

It would good to have a longer example in the standard as well, maybe with the other long examples at the start of section 7. @annevk, what do you think?

The Explainer states that it's a non-goal to provide encodings other than UTF-8. Is it possible to create your own subtype to handle other encodings? If not, why not?

You can create a TransformStream for any encoding you want, but you have to provide your own implementation. Encoding to encodings other than UTF-8 is intentionally not exposed to the web platform. It wouldn't be a subtype of TextEncoderStream, but I don't know why you'd want to do that anyway.

@annevk
Copy link
Member

annevk commented Oct 30, 2018

(I'm always supportive of more examples being added.)

@slightlyoff
Copy link
Member

Thanks all, and sorry for the slow replies.

@ricea:

So pulling on theGenericTransformStream thread and configurability, there's a lot of stuff that these designs inadvertantly expose:

  • the association with the implicit transformer means that something that's participating in this hierarchy has to somehow implement the magical isTransformStream. This type can't be meaninfully subclassed in script. Why is this hierarchy welded shut? What are the extension points to make a BYO transform stream type?
  • the choices you're making about buffer sizes might be right, but sophisticated users might take different views. Where are the extension points for those?
  • Not having configurable queueing strategies might, again, be the right default choice. But always? In all cases? What about for other types?
  • Not having up-to-date example code is a pretty bad smell. Can you ping this issue again when things are updated, preferably in the Explainer?

Regards

@domenic
Copy link
Member

domenic commented Nov 28, 2018

the association with the implicit transformer means that something that's participating in this hierarchy has to somehow implement the magical isTransformStream. This type can't be meaninfully subclassed in script. Why is this hierarchy welded shut? What are the extension points to make a BYO transform stream type?

This isn't correct. You can easily subclass TransformStream; try it in your browser to see! The hierarchy is open to anyone to extend.

There's nothing magical about IsTransformStream. It tests whether you are a TransformStream or subclass of TransformStream. Furthermore, it is only used within the getters of TransformStream itself, to ensure you can't do something like Object.getOwnPropertyDescriptor(TransformStream, "readable").get.call(someRandomObject) and have that poke at the internal memory addresses a browser uses to store the [[readable]] internal slot of TransformStream objects, offset from an arbitrary object address. That would indeed be bad to allow.

@ricea
Copy link
Author

ricea commented Nov 29, 2018

This type can't be meaninfully subclassed in script.

By "this type", do you mean TransformStreamDefaultController? It can't be meaningfully subclassed, because it is only useful when it is associated with a TransformStream object.

What are the extension points to make a BYO transform stream type?

A transform stream is simply an object that has a readable property that is a ReadableStream and a writable property that is a WritableStream. You can create one however you like. The TransformStream constructor provides one convenient method to do so.

the choices you're making about buffer sizes might be right, but sophisticated users might take different views. Where are the extension points for those?

You can always add buffering to a pipe using identity transforms. For example, suppose you want a megabyte of buffering in front of a TextDecoderStream:

const bufferingIdentityTransform =
    new TransformStream({}, new ByteLengthQueuingStrategy({highWaterMark: 1024 * 1024}));

response.body
  .pipeThrough(bufferingIdentityTransform)
  .pipeThrough(new TextDecoderStream())
  .pipeTo(mySink);

Not having configurable queueing strategies might, again, be the right default choice. But always? In all cases? What about for other types?

There might be other platform transforms for which a configurable queuing strategy would make sense. There would be a price to pay in that the user-supplied size() function would be called inside the implementation of the transform, precluding some optimisations and introducing risks of re-entrancy. For some transforms that trade-off might be worthwhile.

Not having up-to-date example code is a pretty bad smell.

Indeed. I will let you know when they are updated.

@slightlyoff
Copy link
Member

Thanks for the note on subclassing TransformStream, @domenic! Glad to see it works; I was more concerned with TransformStreamDefaultController, tho. Do you think the arguments here around not enabling subclassing of it hold water?

@ricea: yes, I was surprised to see:

class TransformStreamDefaultController {
  constructor() // always throws
  ...

Given what you've shown above re: identity streams, what's the argument for preventing subclassing?

On updated, examples, thanks. Looking forward to it.

@ricea
Copy link
Author

ricea commented Dec 5, 2018

Given what you've shown above re: identity streams, what's the argument for preventing subclassing?

A TransformStreamDefaultController and a TransformStream are mutually associated for the lifetime of the TransformStream. TransformStreamDefaultController objects are only created during construction of TransformStream objects. A TransformStreamDefaultController that was not associated with a TransformStream would be useless. @domenic wrote more about the principles of the design in The Revealing Constructor Pattern.

Suppose hypothetically that we permitted constructing a "detached" TransformStreamDefaultController. We could add a way to pass it into the TransformStream constructor to have it "attached" to the newly-constructed TransformStream. What extra capabilities would this afford the web developer? They could subclass TransformStreamDefaultController and override enqueue(), error() and terminate() methods. However, they are the only person who will be calling enqueue(), error() and terminate(). They control all the call sites, so changing the call sites would be a much simpler way to accomplish the same thing. If they didn't want to change the call sites, they could wrap the controller object passed to their start() method, or modify the controller object directly.

So preventing subclassing reduces complexity with no loss of generality.

@kenchris
Copy link

@domenic if you are supposed to subclass TransformStream shouldn't the examples in the stream spec do that? eg. https://streams.spec.whatwg.org/#example-ts-lipfuzz

@domenic
Copy link
Member

domenic commented Dec 11, 2018

You're not supposed to; you can. Sometimes it's useful, like when you want to add extra properties and methods! In those examples it's not, because we just want to transform some data.

@slightlyoff
Copy link
Member

The argument that "we can't figure out why a web developer would ever want something" (a something that works against our design principles and general design goals) is not particularly compelling. The point of extensibility is to enable developers to do the set of things we haven't yet thought of, rather than to presume that we're omniscient.

The argument from consistency, at least, trumps one-off restrictions and design oddities. Let's make this subclassable.

@domenic
Copy link
Member

domenic commented Jan 8, 2019

I think you are over-applying a general principle, which doesn't make sense in the case of the revealing constructor pattern. Those revealed capabilities aren't really a proper "class"; they're a holder for some capabilities of another class (in this case TransformStream), which itself is subclassable and extensible in the desired way.

This ties in to how you have misquoted @ricea's argument. It isn't "we can't figure out why a web developer would want to do something". It's "preventing subclassing reduces complexity with no loss of generality": i.e., anything a web developer might want to do, they can do in a much simpler way with other techniques.

@travisleithead travisleithead removed their assignment Feb 5, 2019
@hober hober removed the extra time label May 21, 2019
@kenchris kenchris added Progress: in progress Review type: later review Venue: WHATWG and removed Progress: pending external feedback The TAG is waiting on response to comments/questions asked by the TAG during the review labels May 21, 2019
@cynthia cynthia added Review type: CG early review An early review of general direction from a Community Group Topic: universal JavaScript Features that work on the web and non-web (e.g. node.js) Topic: Streams Any kind of stream-like feature and removed Review type: CG early review An early review of general direction from a Community Group labels May 21, 2019
@cynthia cynthia removed this from the 2019-02-05-f2f milestone May 21, 2019
@cynthia
Copy link
Member

cynthia commented May 21, 2019

@kenchris and I discussed during the Iceland F2F.

As we see it, the fact on whether or not we force inheritance in this particular case is not a significant blocker and we'd like to see this feature move forward. Thanks for bringing this to our attention and bearing with us for the extremely long discussion.

@cynthia cynthia closed this as completed May 21, 2019
@cynthia cynthia added Progress: review complete Resolution: satisfied The TAG is satisfied with this design and removed Progress: in progress labels May 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Progress: review complete Resolution: satisfied The TAG is satisfied with this design Review type: later review Topic: Streams Any kind of stream-like feature Topic: universal JavaScript Features that work on the web and non-web (e.g. node.js) Venue: WHATWG
Projects
None yet
Development

No branches or pull requests

10 participants