Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extensions need to define how their parameters convert to/from CBOR #626

Closed
jcjones opened this issue Oct 11, 2017 · 27 comments · Fixed by #765
Closed

Extensions need to define how their parameters convert to/from CBOR #626

jcjones opened this issue Oct 11, 2017 · 27 comments · Fixed by #765

Comments

@jcjones
Copy link
Contributor

jcjones commented Oct 11, 2017

Extension definitions need to be more thorough, as @bzbarsky points out in the discussion for issue #624. In particular, for each extension -- including the example -- we need two sections:

  1. How to encode the AuthenticationExtensions data structure into CBOR for the wire protocol
  2. How to decode the resulting extension CBOR into the AuthenticationExtensions data structure

Without these sections, it's under-specified how to actually implement these extensions.

@jyasskin
Copy link
Member

Most of the extensions specify the AuthenticationExtensions -> CBOR conversion in "Authenticator extension input" and the CBOR->AuthenticationExtensions conversion in "Client extension output". The specifications aren't perfect, but they're present in most or all cases.

@bzbarsky
Copy link

Those specifications are pretty useless, sorry. The first example I came to: https://w3c.github.io/webauthn/#sctn-simple-txauth-extension says:

The client extension input encoded as a CBOR text string (major type 3).

What does that mean? The input is an ES Value, right? Is https://tc39.github.io/ecma262/#sec-tostring used? Something else? If it's ToString, that can have side-effects, so the exact timing of the operation needs to be defined.

Then once you've got an ES String, you have a sequence of 16-bit integers. "major type 3" in RFC 7049 is a string of "Unicode characters encoded as UTF-8". Is the sequence of 16-bit integers meant to be treated as UTF-16? If so, how are invalid codepoints handled? All of this needs to be specified and it's not.

And this is the simplest case. Cases like https://w3c.github.io/webauthn/#sctn-generic-txauth-extension are completely unspecified in terms of how one goes from an ES Value to something that can be encoded in CBOR. And this is all observable, because there are side-effects everywhere, because ES.

@selfissued
Copy link
Contributor

It would be more useful to have people file individual issues against specific language in specific extensions - preferably with proposed replacement text. Then we can quickly address the particular wording problems identified.

@bzbarsky
Copy link

It's not a wording problem; it's a "the behavior is not defined" problem. I, personally, can't propose replacement text, because I can't tell what behavior is intended.

Please don't make this sound like a minor editorial problem; it's not.

@jyasskin
Copy link
Member

There is a systematic problem here, that our types and conversions aren't precisely specified.

I disagree that this makes the specifications "useless". Certainly a hostile implementer could do the wrong thing, and we should fix that, but a benign implementer is going to reach for "utf-8 encode" when converting "A single JSON string prompt." (which should probably be a DOMString) to "a CBOR text string" (i.e. UTF-8).

We're likely to end up with some ordering inconsistencies in the more complex conversions, but HTML4 wasn't "useless" despite having lots of that sort of problem.

@bzbarsky
Copy link

bzbarsky commented Oct 17, 2017

but a benign implementer is going to reach for "utf-8 encode"

"utf-8 encode" takes a code point stream as input, but neither an ES String nor a DOMString represents a "code point stream", though it could be forced to do so by making some assumptions about what the numbers in it mean... [Edit: I had something here about non-UTF-8 bits, which is obviously wrong now that I've paged back in all the context again.]

If the intent is that https://heycam.github.io/webidl/#es-to-USVString happen here, followed by "utf-8 encode", great, please spec that. If the intent is something else, spec that. But right now it's really not clear what to do if the input contains an integer corresponding to an unpaired surrogate if it's intepreted as UTF-16. And, again, it's not clear whether input like { toString: function() { return "hey" } } is supposed to encode "hey" or not.

We're likely to end up with some ordering inconsistencies in the more complex conversions

I don't think it's just ordering inconsistencies... For the second thing I linked to above, for example, ES primitive values may or may not be valid input.

but HTML4 wasn't "useless"

It really was, for achieving interoperability.... What interoperability emerged was the result of heroic reverse-engineering efforts. I'm really hoping we have actually learned something about not writing specifications like that, since then.

Anyway, we're arguing in circles, and I'm not going to spend any more time arguing this point; it's not my job, and I've already spent far too much of my personal time on this issue. Either the spec will get fixed, or it won't and then it won't be possible to write tests that would demonstrate interop. I obviously reserve the right to raise formal objections if the spec tries to go to CR or later with these sorts of issues, and without a test suite that would catch them.

@selfissued
Copy link
Contributor

@bzbarsky I wasn't attempting to minimize the issue. What I was primarily asking was for individual issues to be filed against the specific text locations that you find to be ambiguous so that we can work down the list and fix each of them. We may want to assign different people who are experts in the topics of the different extensions to fix them, which isn't possible with the current blanket issue referring to all the extensions.

Yes, if you know what it should say in a particular case, please include that in the issue, but my main point was to individually flag each of the places where there is ambiguity.

@bzbarsky
Copy link

OK. If I end up with more time to work on this, and no one else has filed those issues, I may try to do that. But note that you may want to end up defining some centralized processing concepts that all extensions share.

@nadalin
Copy link
Contributor

nadalin commented Nov 9, 2017

@gmandyam Change to use buffer source and use CDDL and remove client data examples

@jcjones
Copy link
Contributor Author

jcjones commented Nov 9, 2017

This one is just the BufferSource change, and this needs to happen before CR. I am filing an issue for adding CDDL to each extensions.

@jcjones
Copy link
Contributor Author

jcjones commented Nov 9, 2017

Opened #679

@nadalin
Copy link
Contributor

nadalin commented Nov 29, 2017

@jcjones will create PR and @selfissued will review

@jcjones
Copy link
Contributor Author

jcjones commented Jan 3, 2018

@selfissued
Copy link
Contributor

PR #737 fixes issue #725

@selfissued
Copy link
Contributor

Having just skimmed through all the defined extensions, I believe that all the extensions now do define how their parameters convert to/from CBOR (once #737 is merged, that is).

@bzbarsky
Copy link

bzbarsky commented Jan 8, 2018

@selfissued Assuming the relevant bits except #737 are now merged into https://w3c.github.io/webauthn/, I am still not clear on how this works.

Let's take a simple example just to make sure I undestand. For https://w3c.github.io/webauthn/#sctn-simple-txauth-extension, which values, if any, come from JS via an AuthenticationExtensions record?

@selfissued
Copy link
Contributor

The client extension input value, which for this extension, is "A single JSON string prompt", is the value for the "txAuthSimple" member of the AuthenticationExtensions record. The authenticator extension input value is the CBOR encoding of the client extension input value - which is the prompt string.

Upon return, the authenticator extension output value is transformed into the client extension output value.

This is the usual pattern.

@bzbarsky
Copy link

bzbarsky commented Jan 9, 2018

OK, so... The thing in the AuthenicationExtensions is an any. That means that it can be any JS value.

  1. What defines how arbitrary JS values are transformed into "A single JSON string prompt"? If I pass in { toString: function() { return 'x'; }, valueOf: function() { return 'y'; } } as the value in the AuthenticationExtensions, then what will the resulting "JSON string prompt" be and why?

  2. The concept of "JSON string" is not linked here. Does it mean https://tools.ietf.org/html/rfc8259#section-7 or something else? Note that encoding of a given Unicode string into the format described here is not unique. Also, does the "JSON string" form include the opening and closing " in this case?

  3. What happens to unpaired surrogates? They're not "Unicode characters" per the Unicode definitions, so it's not clear to me that their representation in a "JSON string" as defined in https://tools.ietf.org/html/rfc8259#section-7 is actually defined.

Basically, none of my questions from #626 (comment) seem to be answered in even this simplest case, unless I'm really missing something.

@emlun
Copy link
Member

emlun commented Jan 9, 2018

As I read the spec, if you were to pass extensions: { "txAuthSimple": foo } where foo is anything else than a JS string value, the client (if it supports extensions) would throw an error.

@bzbarsky
Copy link

bzbarsky commented Jan 9, 2018

Where does the spec say that, exactly? It needs to be very clearly spelled out, especially because that's not normal behavior for web APIs...

And even if this part were explicitly specified, that answers only one of the three questions I mention in #626 (comment)

@emlun
Copy link
Member

emlun commented Jan 9, 2018

I confess it doesn't, as far as I can tell. I agree that should be clearly spelled out.

@jyasskin
Copy link
Member

jyasskin commented Jan 9, 2018

Using the partial dictionary option suggested in #346 instead of a record<DOMString, any> would help with the conversions here. That wouldn't get the behavior @emlun suggested of throwing on non-string inputs, but it would be more consistent with the rest of the web.

@selfissued
Copy link
Contributor

I think I'm fine with the partial dictionary approach for client extension input and output values as described in #346 since the ttypedef record<DOMString, any>AuthenticationExtensions; approach seems to be syntactically problematic. That's far preferable to forcing everything to CBOR when it's a Web interface. What do others think of this approach?

@selfissued
Copy link
Contributor

Answering @bzbarsky 's questions:

  1. Arbitrary JS values are not transformed into strings. You're expected to pass a string value for this extension. If you don't, it's an error.
  2. I think that JSON string should be DOMstring
  3. Unpaired surrogate processing would be the same as for an other Web interface using strings. There's nothing about this use case that requires custom Unicode processing.
  4. Where were are the conversions from client extension inputs to authenticator extension inputs specified and where are the reverse conversions from authenticator extension outputs to client extension outputs defined? In the last paragraph of https://w3c.github.io/webauthn/#sctn-client-extension-processing .

@bzbarsky
Copy link

bzbarsky commented Jan 9, 2018

Arbitrary JS values are not transformed into strings

OK. That includes String objects?

And again, this needs to be clearly defined somewhere.

I think that JSON string should be DOMstring

Fine, so array of 16-bit integers....

Unpaired surrogate processing would be the same as for an other Web interface using strings.

Mostly they pass them through unchanged, when using DOMString (which isn't a string of Unicode characters, name notwithstanding; it's an array of 16-bit integers). But CBOR major type 3 requires input be a Unicode string afaict, so you need to produce a Unicode string somewhere here. Again, a DOMString is not a Unicode string.

You could do a USVString instead of DOMString. That would at least give you a Unicode string and clearly define how you produce it from whatever you're handed. Unpaired surrogates would be replaced with U+FFFD per https://heycam.github.io/webidl/#dfn-obtain-unicode. Again, I'm just repeating things I already said in #626 (comment)

@selfissued
Copy link
Contributor

I'm fine using USVstring, as @bzbarsky suggests.

@selfissued
Copy link
Contributor

Mike will create a PR using partial dictionaries as suggested in #346 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment