From b4a1154894975c0d50ec34b43b26f2bab8eb30c9 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Fri, 6 Oct 2017 15:09:16 +0200 Subject: [PATCH 01/18] Revamp MIME type section --- mimesniff.bs | 363 ++++++++------------------------------------------- 1 file changed, 54 insertions(+), 309 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index 8dce0a1..f34e444 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -201,349 +201,94 @@ Indent: 1

Parsing a MIME type

-

- To parse a MIME type, the user agent must execute the following - steps: - -

    -
  1. - Let sequence be the byte sequence of the MIME - type, where sequence[s] is byte - s in sequence and sequence[0] is the first - byte in sequence. +

    To parse a MIME type, given a string input, run these steps: -

  2. - If the number of bytes in sequence is - less than 1, return undefined. - -
  3. - Initialize s to 0. - -
  4. - Initialize type and subtype to the empty string - (""). +
      +
    1. Remove any leading and trailing ASCII whitespace from input. -

    2. - Initialize parameters to the empty dictionary ({}). +
    3. If input is the empty string, then return missing. -

    4. - While sequence[s] is ASCII whitespace, - continuously execute the following steps: - -
        -
      1. - Increment s by 1. +
      2. Let position be a position variable for input, initially pointing at + the start of input. -

      3. - If sequence[s] is undefined, return undefined. -
      +
    5. Let type be the result of collecting a sequence of code points that are + not U+002F (/) from input, given position. -

    6. - Initialize t to 0. +
    7. If position is past the end of input, then return failure. -

    8. - While sequence[s] is not equal to the U+002F SOLIDUS - character ("/"), continuously execute the - following steps: +
    9. Advance position to the next code point in input. (This skips + past U+002F (/).) -

        -
      1. - If t is greater than 127, return undefined. +
      2. Let subtype be the result of collecting a sequence of code points that are + not U+003B (;) from input, given position. -

      3. - If sequence[s] is undefined, return undefined. +
      4. Remove any trailing ASCII whitespace from subtype. -

      5. - Append sequence[s], ASCII lowercased, to type. +
      6. If subtype is the empty string, then return failure. -

      7. - Increment s and t by 1. -
      +
    10. Let mimeType be a new MIME type record whose type + is type, in ASCII lowercase, and subtype is + subtype, in ASCII lowercase. -

    11. - Increment s by 1. +
    12. +

      While position is not past the end of input: -

    13. - Initialize u to 0. +
        +
      1. Advance position to the next code point in input. (This skips + past U+003B (;).) -

      2. - While sequence[s] is not ASCII whitespace - and is not equal to the U+003B SEMICOLON character - (";"), continuously execute the following steps: +
      3. Skip ASCII whitespace within input given position. -

          -
        1. - If u is greater than 127, return undefined. +
        2. Let parameterName be the result of collecting a sequence of code points + that are not U+003D (=) from input, given position. -

        3. - If sequence[s] is undefined, return - type, subtype, and parameters. +
        4. If position is not past the end of input, then advance + position to the next code point in input. (This skips past + U+003D (=).) -

        5. - Append sequence[s], ASCII lowercased, to subtype. +
        6. Skip ASCII whitespace within input given position. + -

        7. - Increment s and u by 1. -
        +
      4. Let parameterValue be null.

      5. - Enter loop L: +

        If position is not past the end of input, then:

        1. - Enter loop M: +

          If the current code point in input is U+0022 ("), then advance + position to the next code point in input and:

            -
          1. - If sequence[s] is undefined or is equal to the - U+003B SEMICOLON character (";"), exit loop - M. - -
          2. - While sequence[s] is ASCII whitespace, continuously execute the - following steps: - -
              -
            1. - Increment s by 1. -
            - -
          3. - If sequence[s] is equal to the U+0022 QUOTATION - MARK character ("""), execute the following - steps: - -
              -
            1. - Increment s by 1. - -
            2. - Enter loop N: - -
                -
              1. - If sequence[s] is undefined or is equal to - the U+0022 QUOTATION MARK character - ("""), execute the following steps: - -
                  -
                1. - If sequence[s] is equal to the U+0022 - QUOTATION MARK character ("""), - increment s by 1. - -
                2. - Exit loop N. -
                - -
              2. - If sequence[s] is equal to the U+005C - REVERSE SOLIDUS character ("\") and - sequence[s + 1] is not undefined, increment - s by 1. - -
              3. - Increment s by 1. -
              -
            - - Otherwise, enter loop N: - -
              -
            1. - If sequence[s] is undefined or is - ASCII whitespace or is equal to the U+003B - SEMICOLON character (";"), exit loop - N. - -
            2. - Increment s by 1. -
            -
          +
        2. Set parameterValue to the result of collecting a sequence of code + points that are not U+0022 (") from input, given position. -

        3. - If sequence[s] is undefined, return - type, subtype, and parameters. - -
        4. - Increment s by 1. - -
        5. - While sequence[s] is ASCII whitespace, continuously execute the - following steps: - -
            -
          1. - Increment s by 1. +
          2. If position is not past the end of input, then collect a + sequence of code points that are not U+003B (;) from input, given + position. +

        6. - Initialize name and extra to the empty string - (""). - -
        7. - Initialize p to 0. - -
        8. - Enter loop M: +

          Otherwise:

            -
          1. - Append extra to name. - -
          2. - While sequence[s] is not ASCII whitespace and is not equal to - the U+003D EQUALS SIGN character ("="), continuously execute the following - steps: - -
              -
            1. - If p is greater than 127, return undefined. - -
            2. - If sequence[s] is undefined, execute the - following steps: - -
                -
              1. - If name is not equal to the empty string - ("") and parameters[name] - is undefined, set parameters[name] to null. - -
              2. - Return type, subtype, and - parameters. -
              - -
            3. - Append sequence[s], ASCII lowercased, to name. - -
            4. - Increment s and p by 1. -
            - -
          3. - While sequence[s] is ASCII whitespace, continuously execute the - following steps: - -
              -
            1. - Append sequence[s] to extra. - -
            2. - Increment s and p by 1. -
            - -
          4. - If sequence[s] is equal to the U+003D EQUALS - SIGN character ("="), exit loop - M. -
          +
        9. Set parameterValue to the result of collecting a sequence of code + points that are not U+003B (;) from input, given position. -

        10. - Increment s by 1. - -
        11. - Initialize parameters[name] to null. - -
        12. - While sequence[s] is ASCII whitespace, continuously execute the - following steps: - -
            -
          1. - Increment s by 1. -
          - -
        13. - Initialize value to the empty string (""). - -
        14. - If sequence[s] is undefined, execute the following - steps: - -
            -
          1. - Set parameters[name] to value. - -
          2. - Return type, subtype, and parameters. -
          - -
        15. - If sequence[s] is equal to the U+0022 QUOTATION - MARK character ("""), execute the following - steps: - -
            -
          1. - Increment s by 1. - -
          2. - Enter loop M: - -
              -
            1. - If sequence[s] is undefined or is equal to the - U+0022 QUOTATION MARK character ("""), - execute the following steps: - -
                -
              1. - Set parameters[name] to value. - -
              2. - If sequence[s] is equal to the U+0022 - QUOTATION MARK character ("""), - increment s by 1. - -
              3. - Exit loop M. -
              - -
            2. - If sequence[s] is equal to the U+005C REVERSE - SOLIDUS character ("\") and - sequence[s + 1] is not undefined, increment - s by 1. - -
            3. - Append sequence[s] to value. - -
            4. - Increment s by 1. -
            -
          - - Otherwise, enter loop M: - -
            -
          1. - If sequence[s] is undefined or is - ASCII whitespace or is equal to the U+003B SEMICOLON - character (";"), execute the following steps: - -
              -
            1. - Set parameters[name] to value. - -
            2. - Exit loop M. -
            - -
          2. - Append sequence[s] to value. - -
          3. - Increment s by 1. +
          4. Remove any trailing ASCII whitespace from parameterValue.

        -
      -

      - The parse a MIME type algorithm is intended to be executed after - any protocol-specific syntax within the MIME type has been - handled. +

    14. If parameterName and parameterValue are not the empty string, and + mimeType's parameters[parameterName] + does not exist, then set mimeType's + parameters[parameterName] to parameterValue. + +

    + +
  5. Return mimeType +
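Here is a minimal Python sketch of the parse algorithm as it stands in this first patch, offered as an illustration rather than the spec's reference implementation. `parse_mime_type`, `collect`, `skip_whitespace`, and the `MimeType` dataclass are hypothetical names. Both "missing" and "failure" are modeled as `None`. parameterValue starts as the empty string instead of null, which is the change the later "step 7 can be skipped due to EOF" patch makes.

```python
import string
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Infra "ASCII whitespace": TAB, LF, FF, CR, SPACE.
ASCII_WHITESPACE = "\t\n\x0c\r "
# Lowercase ASCII letters only (str.lower() would also touch non-ASCII).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


@dataclass
class MimeType:
    type: str
    subtype: str
    parameters: dict = field(default_factory=dict)  # ordered map


def collect(inp: str, pos: int, stop: str) -> Tuple[str, int]:
    """Collect code points not in `stop`; return (result, new position)."""
    start = pos
    while pos < len(inp) and inp[pos] not in stop:
        pos += 1
    return inp[start:pos], pos


def skip_whitespace(inp: str, pos: int) -> int:
    while pos < len(inp) and inp[pos] in ASCII_WHITESPACE:
        pos += 1
    return pos


def parse_mime_type(inp: str) -> Optional[MimeType]:
    inp = inp.strip(ASCII_WHITESPACE)
    if inp == "":
        return None                          # "missing"
    type_, pos = collect(inp, 0, "/")
    if pos >= len(inp):
        return None                          # failure: no U+002F (/)
    pos += 1                                 # skip past U+002F (/)
    subtype, pos = collect(inp, pos, ";")
    subtype = subtype.rstrip(ASCII_WHITESPACE)
    if subtype == "":
        return None                          # failure
    mime = MimeType(type_.translate(_ASCII_LOWER),
                    subtype.translate(_ASCII_LOWER))
    while pos < len(inp):
        pos += 1                             # skip past U+003B (;)
        pos = skip_whitespace(inp, pos)
        name, pos = collect(inp, pos, "=")   # name is not lowercased here
        if pos < len(inp):
            pos += 1                         # skip past U+003D (=)
        pos = skip_whitespace(inp, pos)
        value = ""                           # this patch says null
        if pos < len(inp):
            if inp[pos] == '"':
                pos += 1
                value, pos = collect(inp, pos, '"')
                if pos < len(inp):
                    _, pos = collect(inp, pos, ";")  # drop the rest up to ";"
            else:
                value, pos = collect(inp, pos, ";")
                value = value.rstrip(ASCII_WHITESPACE)
        if name and value and name not in mime.parameters:
            mime.parameters[name] = value
    return mime
```

In this revision the parameter name is collected up to U+003D (=) only. As a result, `text/html;foo;charset=utf-8` yields a single parameter named `foo;charset`. The follow-up patch stops the name at U+003B (;) as well.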

From d58f25cdf900b98cce26007388afa588a7e533aa Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Fri, 6 Oct 2017 15:36:29 +0200 Subject: [PATCH 02/18] minor fixes --- mimesniff.bs | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index f34e444..df545c4 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -240,11 +240,20 @@ Indent: 1
  • Skip ASCII whitespace within input given position.

  • Let parameterName be the result of collecting a sequence of code points - that are not U+003D (=) from input, given position. + that are not U+003B (;) or U+003D (=) from input, given position. -

  • If position is not past the end of input, then advance - position to the next code point in input. (This skips past - U+003D (=).) +

  • Set parameterName to parameterName, in ASCII lowercase. + +

  • +

    If position is not past the end of input, then: + +

      +
    1. If the current code point in input is U+003B (;), then + continue. + +

    2. Advance position to the next code point in input. (This + skips past U+003D (=).) +

  • Skip ASCII whitespace within input given position. From dad01646b832e316cf7de791f985a1aaf9da2e8f Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Tue, 10 Oct 2017 11:10:56 +0200 Subject: [PATCH 03/18] Define MIME type records and redo serialization --- mimesniff.bs | 206 +++++++++++++++------------------------------------ 1 file changed, 60 insertions(+), 146 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index df545c4..699241e 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -139,63 +139,58 @@ Indent: 1 represented by ~. -

    Understanding MIME types

    +

    MIME types

    -

    - The MIME type of a resource is a technical hint about the use and format - of that resource. [[!MIMETYPE]] +

    MIME type representation

    -

    - A MIME type is sometimes called an Internet media type in protocol literature, but - consistently using the term MIME type avoids confusion with the use of "media type" as - described in the Media Queries CSS specification. [[MEDIAQUERIES-4]] +

    A MIME type represents an +internet media type as defined by +Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. It can also be +referred to as a MIME type record. [[!MIMETYPE]] -

    - A parsable MIME type is a MIME type for which the - parse a MIME type algorithm does not return undefined. +

    Standards are encouraged to consistently use the term MIME type to avoid +confusion with the use of media type as described in Media Queries. +[[MEDIAQUERIES]] - Every parsable MIME type has a corresponding parsed MIME - type, which is the result of - parsing the parsable MIME - type. +

    A MIME type's type is a non-empty ASCII string. - A parsed MIME type is made up of a type, a - subtype, and a dictionary of parameters. +

    A MIME type's subtype is a non-empty ASCII string. -

    - A valid MIME type is a string that matches the - media-type rule defined in - section 3.1.1.1 "Media Type" of RFC - 7231. In particular, a valid MIME type may include parameters. [[!RFC7231]] +

    A MIME type's parameters is an ordered map. It is +initially empty. -

    - TODO: give an example of a string that is a parsable MIME type but not a - valid MIME type. +

    MIME type miscellaneous

    -

    - A valid MIME type with no parameters is a MIME type that does not contain - any U+003B SEMICOLON (;) characters. In other words, it consists only of a type and - subtype, with no parameters. +

    The essence of a MIME type mimeType is the +result of serializing mimeType with +exclude parameters set to true. -

    - A serialized MIME type is the result of - serializing a parsed MIME - type. +

    A MIME type is supported by the user agent if the user agent has the +capability to interpret a resource of that MIME type and present it to the user. -

    - The MIME type portion of a parsable MIME type is the - result of serializing the - type and subtype of its parsed MIME - type with null parameters. +

    This needs more work. See +w3c/preload #113. -

    - The MIME type portion of a parsable MIME type - excludes any and all parameters. -

    - A parsable MIME type is supported by the user agent - if the user agent has the capability to interpret a resource of - that MIME type and present it to the user. +

    MIME type writing

    + +

    A valid MIME type string is a string that matches the media-type +production defined in +section 3.1.1.1 "Media Type" of RFC 7231. +In particular, a valid MIME type may include parameters. [[!RFC7231]] + +

    A valid MIME type string is supposed to be used for conformance checkers only. + +

    +

    "text/html" is a valid MIME type string. + +

    "text/html;" is not a valid MIME type string, though + parse a MIME type returns a MIME type record for it identical to the one it returns for + "text/html". -

    + +

    A valid MIME type string with no parameters is a valid MIME type string +that does not contain U+003B (;). @@ -300,118 +295,37 @@ Indent: 1 -

    Serializing a MIME type

    -

    - To serialize a MIME type, given a type, a - subtype, and a dictionary of parameters, execute the - following steps: - -

      -
    1. - If type is undefined, is null, is equal to the empty string - (""), or has a length greater than - 127, return undefined. - -
    2. - If subtype is undefined, is null, or has a - length greater than 127, return undefined. - -
    3. - Let serialization be the concatenation of type, the - U+002F SOLIDUS character ("/"), and - subtype. - -
    4. - If parameters is undefined or is null, return - serialization. - -
    5. - Let names be a list of the keys in parameters, - sorted ASCII - case-insensitively in ascending alphabetical order. - -
    6. - Should this special-case the "charset" or - "codecs" parameters first? +

      To serialize a MIME type, given a MIME type mimeType and an +optional boolean exclude parameters, run these steps: -

    7. - For each item name in names, execute the following - steps: - -
        -
      1. - If name has a length greater than 127, return undefined. - -
      2. - If parameters[name] is not null, execute the - following steps: - -
          -
        1. - Append the U+003B SEMICOLON character - (";") to serialization. - -
        2. - Append name, ASCII lowercased, to - serialization. - -
        3. - Append the U+003D EQUALS SIGN character - ("=") to serialization. - -
        4. - Append the U+0022 QUOTATION MARK character - (""") to serialization. - -
        5. - For each character char in - parameters[name], execute the following steps: - -
            -
          1. - If char is equal to the U+0022 QUOTATION MARK - character (""") or to the - U+005C REVERSE SOLIDUS character - ("\"), append the U+005C REVERSE SOLIDUS - character ("\") to - serialization. +
              +
            1. If exclude parameters is not given, then set it to false. -

            2. - Append char to serialization. -
            +
          2. Let serialization be the concatenation of mimeType's + type, U+002F ("/"), and mimeType's + subtype. -

          3. - Append the U+0022 QUOTATION MARK character - (""") to serialization. +
          4. If exclude parameters is true or mimeType's + parameters is empty, then return serialization. -

          5. - Remove name from names. -
          -
        - -
      3. - For each item name in names, execute the following - steps: +
      4. +

        For each namevalue of mimeType's + parameters: -

          -
        1. - Append the U+003B SEMICOLON character - (";") to serialization. +
            +
          1. Append U+003B (;) to serialization. -

          2. - Append name, ASCII lowercased, to - serialization. -
          +
        2. Append name to serialization. -

        3. - Should this special-case the "base64" boolean parameter last? +
        4. Append U+003D (=) to serialization. -

        5. - Return serialization. -
        +
      5. Append value to serialization. +

      +
    8. Return serialization. +
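Here is a minimal Python sketch of the record and the serializer as this patch defines them. It assumes a dataclass with a plain `dict` for the ordered map, since Python dicts preserve insertion order. `MimeType`, `serialize_mime_type`, and `essence` are illustrative names, not spec-defined APIs.

```python
from dataclasses import dataclass, field


@dataclass
class MimeType:
    type: str                                        # non-empty ASCII string
    subtype: str                                     # non-empty ASCII string
    parameters: dict = field(default_factory=dict)   # ordered map, initially empty


def serialize_mime_type(mime: MimeType, exclude_parameters: bool = False) -> str:
    # "exclude parameters" defaults to false when not given.
    serialization = mime.type + "/" + mime.subtype
    if exclude_parameters or not mime.parameters:
        return serialization
    for name, value in mime.parameters.items():
        serialization += ";" + name + "=" + value   # value appended verbatim
    return serialization


def essence(mime: MimeType) -> str:
    # The essence is the serialization with exclude parameters set to true.
    return serialize_mime_type(mime, exclude_parameters=True)
```

In this revision, values are appended verbatim. A value containing U+003B (;) or U+0022 (") therefore does not survive a round trip through parse a MIME type.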

    MIME type groups

    From 46fb1a66c50f31699e14b1d23183d11beb8bc8b8 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Tue, 10 Oct 2017 12:08:51 +0200 Subject: [PATCH 04/18] Sigh, handling backslashes is no fun (no wonder it only works in Firefox) --- mimesniff.bs | 69 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 62 insertions(+), 7 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index 699241e..748d6ef 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -159,6 +159,25 @@ confusion with the use of media type as described in Media Queries<

    A MIME type's parameters is an ordered map. It is initially empty. +

    To compute a normalized MIME type parameter value, given a string +value, run these steps: + +

      +
    1. If value does not start with U+0022 ("), then return value. + +

    2. Remove the leading and trailing U+0022 (") from value. + +

    3. Replace any "\\" sequence with U+005C (\) within value. + +

    4. Replace any "\"" sequence with U+0022 (") within value. + +

    5. Return value. +

    + +

    If a MIME type parameter value is significant, then the +normalized MIME type parameter value algorithm has to be used. + +
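The normalization steps translate almost line for line into Python. This is a minimal sketch, and `normalize_parameter_value` is a hypothetical name. It assumes that a value starting with U+0022 (") also ends with one. That holds for values produced by the parser in this patch, because the parser always appends a closing U+0022.

```python
def normalize_parameter_value(value: str) -> str:
    if not value.startswith('"'):
        return value
    # Remove the leading and trailing U+0022 ("); assumes both are present.
    value = value[1:-1]
    # Each two-backslash sequence becomes a single U+005C (\).
    value = value.replace("\\\\", "\\")
    # Each backslash-quote sequence becomes a single U+0022 (").
    value = value.replace('\\"', '"')
    return value
```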

    MIME type miscellaneous

    The essence of a MIME type mimeType is the @@ -261,15 +280,51 @@ that does not contain U+003B (;).

    1. If the current code point in input is U+0022 ("), then advance - position to the next code point in input and: + position to the next code point in input, set + parameterValue to U+0022 ("), and:

        -
      1. Set parameterValue to the result of collecting a sequence of code - points that are not U+0022 (") from input, given position. +

      2. +

        While true: + +

          +
        1. Append the result of collecting a sequence of code points that are not + U+0022 (") or U+005C (\) from input, given position, to + parameterValue. + +

        2. +

          If position is not past the end of input and the current + code point in input is U+005C (\), then advance position to + the next code point in input and: + +

            +
          1. +

            If position is not past the end of input, then: + +

              +
            1. If the current code point in input is U+0022 (") or + U+005C (\), then append U+005C (\) followed by the current code point in + input, to parameterValue. + +

            2. Otherwise, append the current code point in input to + parameterValue. + +

            3. Advance position to the next code point in input. + +

            4. Continue. +

            + +
          2. Otherwise, append U+005C (\) to parameterValue and + break. +

          + +
        3. Otherwise, break. +

        + +
      3. Append U+0022 (") to parameterValue. -

      4. If position is not past the end of input, then collect a - sequence of code points that are not U+003B (;) from input, given - position. +

      5. Collect a sequence of code points that are not U+003B (;) from input, + given position.
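The backslash handling added to the quoted branch of the parser can be sketched as a standalone helper. This minimal Python sketch assumes `inp[pos]` is the opening U+0022 ("). `collect_quoted_value` is a hypothetical name; it returns the collected parameterValue together with the new position. In this revision, the value keeps its surrounding quotes and the escapes before U+0022 and U+005C. A backslash before any other code point is dropped.

```python
from typing import Tuple


def collect_quoted_value(inp: str, pos: int) -> Tuple[str, int]:
    n = len(inp)
    pos += 1                       # skip the opening U+0022 (")
    value = '"'                    # parameterValue starts as U+0022 (")
    while True:
        start = pos
        while pos < n and inp[pos] not in '"\\':
            pos += 1
        value += inp[start:pos]    # run of code points that are not " or \
        if pos < n and inp[pos] == "\\":
            pos += 1               # skip the U+005C (\)
            if pos < n:
                c = inp[pos]
                # Keep the escape for " and \; drop it for any other code point.
                value += "\\" + c if c in '"\\' else c
                pos += 1
                continue
            value += "\\"          # a backslash at end of input is kept
            break
        break                      # reached the closing " or end of input
    value += '"'
    while pos < n and inp[pos] != ";":
        pos += 1                   # discard everything up to the next ";"
    return value, pos
```

For the input `"a\"b";x`, the helper returns the value `"a\"b"` and leaves the position at the `;`.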
      @@ -291,7 +346,7 @@ that does not contain U+003B (;).
    -
  • Return mimeType +

  • Return mimeType. From 131819fb9b3c93f99ce9af50be6dbb577a3c6338 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Fri, 24 Nov 2017 14:52:20 +0100 Subject: [PATCH 05/18] fix bs --- mimesniff.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mimesniff.bs b/mimesniff.bs index 748d6ef..b13cfd2 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -362,7 +362,7 @@ optional boolean exclude parameters, run these steps: type, U+002F ("/"), and mimeType's subtype. -

  • If exclude parameters is true or mimeType's +

  • If exclude parameters is true or mimeType's parameters is empty, then return serialization.

  • From 780c59766c39e090d30cb7ff9291616470ff7765 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Fri, 24 Nov 2017 15:09:03 +0100 Subject: [PATCH 06/18] don't strip before parameterValue (happens for charset anyway due to Encoding) --- mimesniff.bs | 3 --- 1 file changed, 3 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index b13cfd2..80519e0 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -269,9 +269,6 @@ that does not contain U+003B (;). skips past U+003D (=).) -
  • Skip ASCII whitespace within input given position. - -

  • Let parameterValue be null.

  • From 6c449ee443d79dc107380c3a27940d6a2fe6f754 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Fri, 24 Nov 2017 16:18:20 +0100 Subject: [PATCH 07/18] step 7 can be skipped due to EOF --- mimesniff.bs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index 80519e0..465638c 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -269,7 +269,7 @@ that does not contain U+003B (;). skips past U+003D (=).) -
  • Let parameterValue be null. +

  • Let parameterValue be the empty string.

  • If position is not past the end of input, then: @@ -336,8 +336,8 @@ that does not contain U+003B (;). -

  • If parameterName and parameterValue are not the empty string, and - mimeType's parameters[parameterName] +

  • If neither parameterName nor parameterValue are the empty string, + and mimeType's parameters[parameterName] does not exist, then set mimeType's parameters[parameterName] to parameterValue. From 217896d10b29457a3e57675ebd13821eecf16355 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Mon, 27 Nov 2017 14:42:46 +0100 Subject: [PATCH 08/18] make less MIME types parse successfully and drop more parameters --- mimesniff.bs | 38 ++++++++++++++++++++++++++++++++------ 1 file changed, 32 insertions(+), 6 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index 465638c..f25ad3b 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -40,6 +40,11 @@ Indent: 1 } +

    +url:https://tools.ietf.org/html/rfc7230#section-3.2.6;text:token;type:dfn;spec:http
    +url:https://tools.ietf.org/html/rfc7230#section-3.2.6;text:quoted-string;type:dfn;spec:http
    +
    +

    Introduction

    @@ -116,6 +121,12 @@ Indent: 1

    This specification depends on the Infra Standard. [[!INFRA]] +

    An HTTP token is a string that matches the token token +production. + +

    An HTTP quoted-string is a string that matches the +quoted-string token production. +

    A binary data byte is a byte in the range 0x00 to 0x08 (NUL to BS), the byte 0x0B (VT), a byte in the @@ -228,6 +239,8 @@ that does not contain U+003B (;).

  • Let type be the result of collecting a sequence of code points that are not U+002F (/) from input, given position. +

  • If type is the empty string or is not an HTTP token, then return failure. +

  • If position is past the end of input, then return failure.

  • Advance position to the next code point in input. (This skips @@ -238,7 +251,8 @@ that does not contain U+003B (;).

  • Remove any trailing ASCII whitespace from subtype. -

  • If subtype is the empty string, then return failure. +

  • If subtype is the empty string or is not an HTTP token, then return + failure.

  • Let mimeType be a new MIME type record whose type is type, in ASCII lowercase, and subtype is @@ -336,11 +350,23 @@ that does not contain U+003B (;). -

  • If neither parameterName nor parameterValue are the empty string, - and mimeType's parameters[parameterName] - does not exist, then set mimeType's - parameters[parameterName] to parameterValue. - +

  • +

    If all of the following are true + +

    + +

    then set mimeType's + parameters[parameterName] to parameterValue. +

  • Return mimeType. From a2bb1751bdabc35edecfb9411b51fcc3cdb3ff07 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Tue, 28 Nov 2017 14:48:34 +0100 Subject: [PATCH 09/18] address #46 --- mimesniff.bs | 85 +++++++++++++++++++++++++--------------------------- 1 file changed, 40 insertions(+), 45 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index f25ad3b..9413715 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -32,6 +32,9 @@ Indent: 1 "MIMETYPE": { "aliasOf": "RFC2046" }, + "MEDIAQUERIES": { + "aliasOf": "mediaqueries-4" +}, "SECCONTSNIFF": { "authors": ["Adam Barth", "Juan Caballero", "Dawn Song"], "href": "https://www.adambarth.com/papers/2009/barth-caballero-song.pdf", @@ -43,6 +46,7 @@ Indent: 1

     url:https://tools.ietf.org/html/rfc7230#section-3.2.6;text:token;type:dfn;spec:http
     url:https://tools.ietf.org/html/rfc7230#section-3.2.6;text:quoted-string;type:dfn;spec:http
    +url:https://tools.ietf.org/html/rfc7231#section-3.1.1.1;text:media-type;type:dfn;spec:http
     
    @@ -121,11 +125,18 @@ url:https://tools.ietf.org/html/rfc7230#section-3.2.6;text:quoted-string;type:df

    This specification depends on the Infra Standard. [[!INFRA]] -

    An HTTP token is a string that matches the token token -production. +

    An HTTP token is U+0021 (!), U+0023 (#), U+0024 ($), U+0025 (%), U+0026 (&), +U+0027 ('), U+002A (*), U+002B (+), U+002D (-), U+002E (.), U+005E (^), U+005F (_), U+0060 (`), +U+007C (|), U+007E (~), or an ASCII alphanumeric.

    -

    An HTTP quoted-string is a string that matches the -quoted-string token production. +

    This matches the token token production. + +

    An HTTP quoted-string token is U+0009 TAB, a code point in the range +U+0020 SPACE to U+007E (~), inclusive, or a code point in the range U+0080 through +U+00FF (ÿ), inclusive. + +

    This matches the effective value space of the quoted-string token +production. By definition it is a superset of an HTTP token.
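Both definitions are simple code point predicates. This is a minimal Python sketch with hypothetical function names. `solely_contains` mirrors the "solely contains" wording in the parser steps of this patch. It is vacuously true for the empty string, which is why those steps check for emptiness separately.

```python
# The non-alphanumeric code points listed in the HTTP token definition.
HTTP_TOKEN_SYMBOLS = set("!#$%&'*+-.^_`|~")


def is_http_token_code_point(c: str) -> bool:
    # One of the listed symbols, or an ASCII alphanumeric.
    return c in HTTP_TOKEN_SYMBOLS or (c.isascii() and c.isalnum())


def is_http_quoted_string_token_code_point(c: str) -> bool:
    # U+0009 TAB, U+0020 to U+007E inclusive, or U+0080 to U+00FF inclusive.
    cp = ord(c)
    return cp == 0x09 or 0x20 <= cp <= 0x7E or 0x80 <= cp <= 0xFF


def solely_contains(s: str, predicate) -> bool:
    # Vacuously true for the empty string.
    return all(predicate(c) for c in s)
```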

    A binary data byte is a byte in the range 0x00 to @@ -167,26 +178,8 @@ confusion with the use of media type as described in Media Queries<

    A MIME type's subtype is a non-empty ASCII string. -

    A MIME type's parameters is an ordered map. It is -initially empty. - -

    To compute a normalized MIME type parameter value, given a string -value, run these steps: - -

      -
    1. If value does not start with U+0022 ("), then return value. - -

    2. Remove the leading and trailing U+0022 (") from value. - -

    3. Replace any "\\" sequence with U+005C (\) within value. - -

    4. Replace any "\"" sequence with U+0022 (") within value. - -

    5. Return value. -

    - -

    If a MIME type parameter value is significant, then the -normalized MIME type parameter value algorithm has to be used. +

    A MIME type's parameters is an ordered map whose +keys and values are ASCII strings. It is initially empty.

    MIME type miscellaneous

    @@ -204,10 +197,9 @@ capability to interpret a resource of that MIME type and present i

    MIME type writing

    -

    A valid MIME type string is a string that matches the media-type -production defined in -section 3.1.1.1 "Media Type" of RFC 7231. -In particular, a valid MIME type may include parameters. [[!RFC7231]] +

    A valid MIME type string is a string that matches the +media-type token production. In particular, a valid MIME type may include +parameters. [[!RFC7231]]

    A valid MIME type string is supposed to be used for conformance checkers only. @@ -223,7 +215,6 @@ In particular, a valid MIME type may include parameters. [[!RFC723 that does not contain U+003B (;). -

    Parsing a MIME type

    To parse a MIME type, given a string input, run these steps: @@ -239,7 +230,8 @@ that does not contain U+003B (;).

  • Let type be the result of collecting a sequence of code points that are not U+002F (/) from input, given position. -

  • If type is the empty string or is not an HTTP token, then return failure. +

  • If type is the empty string or does not solely contain HTTP tokens, then + return failure.

  • If position is past the end of input, then return failure. @@ -251,8 +243,8 @@ that does not contain U+003B (;).

  • Remove any trailing ASCII whitespace from subtype. -

  • If subtype is the empty string or is not an HTTP token, then return - failure. +

  • If subtype is the empty string or does not solely contain HTTP tokens, + then return failure.

  • Let mimeType be a new MIME type record whose type is type, in ASCII lowercase, and subtype is @@ -291,8 +283,7 @@ that does not contain U+003B (;).

    1. If the current code point in input is U+0022 ("), then advance - position to the next code point in input, set - parameterValue to U+0022 ("), and: + position to the next code point in input and:

      1. @@ -313,11 +304,7 @@ that does not contain U+003B (;).

        If position is not past the end of input, then:

          -
        1. If the current code point in input is U+0022 (") or - U+005C (\), then append U+005C (\) followed by the current code point in - input, to parameterValue. - -

        2. Otherwise, append the current code point in input to +

        3. Append the current code point in input to parameterValue.

        4. Advance position to the next code point in input. @@ -332,8 +319,6 @@ that does not contain U+003B (;).

        5. Otherwise, break.

        -
      2. Append U+0022 (") to parameterValue. -

      3. Collect a sequence of code points that are not U+003B (;) from input, given position. @@ -356,9 +341,9 @@ that does not contain U+003B (;).
        • neither parameterName nor parameterValue are the empty string -
        • parameterName is an HTTP token +
        • parameterName solely contains HTTP tokens -
        • parameterValue is an HTTP token or HTTP quoted-string +
        • parameterValue solely contains HTTP quoted-string tokens
        • mimeType's parameters[parameterName] does not exist @@ -382,8 +367,7 @@ optional boolean exclude parameters, run these steps:
        • If exclude parameters is not given, then set it to false.

        • Let serialization be the concatenation of mimeType's - type, U+002F ("/"), and mimeType's - subtype. + type, U+002F (/), and mimeType's subtype.

        • If exclude parameters is true or mimeType's parameters is empty, then return serialization. @@ -399,6 +383,17 @@ optional boolean exclude parameters, run these steps:

        • Append U+003D (=) to serialization. +

        • +

          If value does not solely contain HTTP tokens: + +

            +
          1. Precede each occurrence of U+0022 (") or U+005C (\) in value with U+005C (\). +

          2. Prepend U+0022 (") to value. + +

          3. Append U+0022 (") to value. +

          +
        • Append value to serialization.
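With the quoting step added, serialization can be sketched in Python as follows. The sketch assumes a `MimeType` dataclass with a plain `dict` as the ordered map, and all helper names are hypothetical. The escape character inserted before each U+0022 (") or U+005C (\) is U+005C REVERSE SOLIDUS. Backslashes are escaped before quotes so that the newly inserted backslashes are not escaped a second time.

```python
from dataclasses import dataclass, field

HTTP_TOKEN_SYMBOLS = set("!#$%&'*+-.^_`|~")


def is_http_token_code_point(c: str) -> bool:
    return c in HTTP_TOKEN_SYMBOLS or (c.isascii() and c.isalnum())


@dataclass
class MimeType:
    type: str
    subtype: str
    parameters: dict = field(default_factory=dict)


def serialize_mime_type(mime: MimeType, exclude_parameters: bool = False) -> str:
    serialization = mime.type + "/" + mime.subtype
    if exclude_parameters or not mime.parameters:
        return serialization
    for name, value in mime.parameters.items():
        serialization += ";" + name + "="
        if not all(is_http_token_code_point(c) for c in value):
            # Escape U+005C (\) first, then U+0022 ("), then wrap in quotes.
            value = value.replace("\\", "\\\\").replace('"', '\\"')
            value = '"' + value + '"'
        serialization += value
    return serialization
```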

      From 941c8a2813352df20b3a8ebc8f9bb07575e2bfe1 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Tue, 28 Nov 2017 16:07:36 +0100 Subject: [PATCH 10/18] Make remainder of the text use MIME type / essence --- mimesniff.bs | 75 ++++++++++++++++++---------------------------------- 1 file changed, 25 insertions(+), 50 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index 9413715..bbeafa2 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -403,30 +403,16 @@ optional boolean exclude parameters, run these steps:

      MIME type groups

      -

      - An image type is any parsable MIME type where - type is equal to "image". +

      An audio or video type is any MIME type whose type is +"audio" or "video", or whose essence is +"application/ogg". -

      - An audio or video type is any parsable MIME type - where type is equal to "audio" or - "video" or where the MIME type portion is - equal to one of the following: - -

        -
      • - application/ogg -
      - -

      - A font type is any parsable MIME type where - the - MIME type portion is equal to one of the following: +

      A font type is any MIME type whose + +essence is one of the following:

      • @@ -451,20 +437,17 @@ optional boolean exclude parameters, run these steps: application/vnd.ms-fontobject
      -

      - A ZIP-based type is any parsable MIME type where the - subtype ends in "+zip" or the MIME type - portion is equal to one of the following: +

      A ZIP-based type is any MIME type whose subtype ends in +"+zip" or whose essence is one of the following:

      • application/zip
      -

      - An archive type is any parsable MIME type where - the - MIME type portion is equal to one of the following: +

      An archive type is any MIME type whose + +essence is one of the following:

      • @@ -477,24 +460,17 @@ optional boolean exclude parameters, run these steps: application/x-gzip
      -

      - An XML MIME type is any parsable MIME type where either the subtype - ends in "+xml", or the MIME type portion is equal to "text/xml" or - "application/xml". [[!RFC7303]] +

      An XML MIME type is any MIME type whose subtype +ends in "+xml" or whose essence is "text/xml" or +"application/xml". [[!RFC7303]] -

      - An HTML MIME type is any parsable MIME type where the - MIME type portion is equal to "text/html". +

      An HTML MIME type is any MIME type whose essence is +"text/html". -
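Each group above is defined in terms of type, subtype or essence, so each reduces to a one-line check. The Python sketch below assumes the record came out of the parser, which means type and subtype are already ASCII lowercase. The class and function names are hypothetical. Only the groups whose complete definitions appear in this excerpt are shown.

```python
from dataclasses import dataclass


@dataclass
class MimeType:
    type: str     # assumed already ASCII-lowercased by the parser
    subtype: str  # likewise

    @property
    def essence(self) -> str:
        # type, U+002F (/), subtype; parameters never take part.
        return self.type + "/" + self.subtype


def is_audio_or_video_type(m: MimeType) -> bool:
    return m.type in ("audio", "video") or m.essence == "application/ogg"


def is_zip_based_type(m: MimeType) -> bool:
    return m.subtype.endswith("+zip") or m.essence == "application/zip"


def is_xml_mime_type(m: MimeType) -> bool:
    return m.subtype.endswith("+xml") or m.essence in ("text/xml", "application/xml")


def is_html_mime_type(m: MimeType) -> bool:
    return m.essence == "text/html"
```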

      - A scriptable MIME type is an XML MIME type or any - parsable MIME type where the MIME type portion is - equal to one of the following: +

      A scriptable MIME type is an XML MIME type, an HTML MIME type, or any +MIME type whose essence is one of the following:

        -
      • - text/html -
      • application/pdf
      @@ -532,7 +508,7 @@ optional boolean exclude parameters, run these steps: MIME type sniffing algorithm.
    2. - A computed MIME type, the parsable MIME type + A computed MIME type, the MIME type determined by the MIME type sniffing algorithm. @@ -656,7 +632,7 @@ optional boolean exclude parameters, run these steps: [[!FTP]]
    3. - If supplied-type is not a parsable MIME type, the + If supplied-type is not a MIME type, the supplied MIME type is undefined. Abort these steps. @@ -1825,9 +1801,8 @@ algorithm: Abort these steps.
    4. - If the MIME type portion of the supplied MIME - type is equal to "text/html", execute the - rules for distinguishing if a resource is a feed or HTML and + If the supplied MIME type's essence is "text/html", + execute the rules for distinguishing if a resource is a feed or HTML and abort these steps.
    5. @@ -2365,7 +2340,7 @@ type: -

      User agents may implicitly extend this table to support additional parsable MIME types. +

      User agents may implicitly extend this table to support additional MIME types.

      However, user agents should not implicitly extend this table to include additional byte patterns for any computed MIME type already present in this table, as doing so From 52ef89096800cbb4d27029f046915255af30b957 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Tue, 28 Nov 2017 16:24:41 +0100 Subject: [PATCH 11/18] add byte variants --- mimesniff.bs | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/mimesniff.bs b/mimesniff.bs index bbeafa2..a3b7635 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -357,6 +357,17 @@ that does not contain U+003B (;).

    6. Return mimeType.

    +
    + +

    To parse a MIME type from bytes, given a byte sequence input, +run these steps: + +

      +
    1. Let string be input, isomorphic decoded. + +

    2. Return the result of parse a MIME type with string. +

    +

    Serializing a MIME type

    @@ -400,6 +411,20 @@ optional boolean exclude parameters, run these steps:
  • Return serialization. +


    + +

    To serialize a MIME type to bytes, given a MIME type mimeType +and an optional boolean exclude parameters, run these steps: + +

      +
    1. If exclude parameters is not given, then set it to false. + +

    2. Let stringSerialization be the result of serialize a MIME type given + mimeType and exclude parameters. + +

    3. Return stringSerialization, isomorphic encoded. +

    +
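The byte variants only wrap the string algorithms in an isomorphic decode or encode step. In Python these two operations match the `latin-1` codec, which maps bytes 0x00 to 0xFF one-to-one onto code points U+0000 to U+00FF. A minimal sketch, with hypothetical helper names:

```python
def isomorphic_decode(data: bytes) -> str:
    # Each byte becomes the code point with the same value.
    return data.decode("latin-1")


def isomorphic_encode(text: str) -> bytes:
    # Inverse mapping. This raises UnicodeEncodeError for a code point above
    # U+00FF; the parser never accepts such code points in a MIME type.
    return text.encode("latin-1")


header = b"text/html;charset=\xe9"
decoded = isomorphic_decode(header)
assert decoded == "text/html;charset=\u00e9"
assert isomorphic_encode(decoded) == header
```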

    MIME type groups

    From 2e4bfca8c7cf3b255950406ac38d82b4e62e7d39 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Tue, 28 Nov 2017 16:29:19 +0100 Subject: [PATCH 12/18] let's not introduce a new magic value --- mimesniff.bs | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mimesniff.bs b/mimesniff.bs index a3b7635..d97a99d 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -222,7 +222,7 @@ that does not contain U+003B (;).
    1. Remove any leading and trailing ASCII whitespace from input. -

    2. If input is the empty string, then return missing. +

    3. If input is the empty string, then null.

    4. Let position be a position variable for input, initially pointing at the start of input. From da8f0004fd890ed0eda85ff19ee00cbcc32b8faf Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Wed, 29 Nov 2017 09:59:58 +0100 Subject: [PATCH 13/18] address nits --- mimesniff.bs | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index d97a99d..2f34b2d 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -125,18 +125,18 @@ url:https://tools.ietf.org/html/rfc7231#section-3.1.1.1;text:media-type;type:dfn

      This specification depends on the Infra Standard. [[!INFRA]] -

      An HTTP token is U+0021 (!), U+0023 (#), U+0024 ($), U+0025 (%), U+0026 (&), -U+0027 ('), U+002A (*), U+002B (+), U+002D (-), U+002E (.), U+005E (^), U+005F (_), U+0060 (`), -U+007C (|), U+007E (~), or an ASCII alphanumeric.

      +

      An HTTP token code point is U+0021 (!), U+0023 (#), U+0024 ($), U+0025 (%), +U+0026 (&), U+0027 ('), U+002A (*), U+002B (+), U+002D (-), U+002E (.), U+005E (^), U+005F (_), +U+0060 (`), U+007C (|), U+007E (~), or an ASCII alphanumeric.

      -

      This matches the token token production. +

      This matches the value space of the token token production. [[HTTP]] -

      An HTTP quoted-string token is U+0009 TAB, a code point in the range +

      An HTTP quoted-string token code point is U+0009 TAB, a code point in the range U+0020 SPACE to U+007E (~), inclusive, or a code point in the range U+0080 through U+00FF (ÿ), inclusive.

      This matches the effective value space of the quoted-string token -production. By definition it is a superset of an HTTP token. +production. By definition it is a superset of the HTTP token code points. [[HTTP]]

      A binary data byte is a byte in the range 0x00 to @@ -222,16 +222,16 @@ that does not contain U+003B (;).

      1. Remove any leading and trailing ASCII whitespace from input. -

      2. If input is the empty string, then null. +

      3. If input is the empty string, then return null. -

      4. Let position be a position variable for input, initially pointing at - the start of input. +

      5. Let position be a position variable for input, + initially pointing at the start of input.

      6. Let type be the result of collecting a sequence of code points that are not U+002F (/) from input, given position. -

      7. If type is the empty string or does not solely contain HTTP tokens, then - return failure. +

      8. If type is the empty string or does not solely contain + HTTP token code points, then return failure.

      9. If position is past the end of input, then return failure. @@ -243,8 +243,8 @@ that does not contain U+003B (;).

      10. Remove any trailing ASCII whitespace from subtype. -

      11. If subtype is the empty string or does not solely contain HTTP tokens, - then return failure. +

      12. If subtype is the empty string or does not solely contain + HTTP token code points, then return failure.

      13. Let mimeType be a new MIME type record whose type is type, in ASCII lowercase, and subtype is @@ -341,9 +341,9 @@ that does not contain U+003B (;).

        • neither parameterName nor parameterValue are the empty string -
        • parameterName solely contains HTTP tokens +
        • parameterName solely contains HTTP token code points -
        • parameterValue solely contains HTTP quoted-string tokens +
        • parameterValue solely contains HTTP quoted-string token code points
        • mimeType's parameters[parameterName] does not exist @@ -395,7 +395,7 @@ optional boolean exclude parameters, run these steps:
        • Append U+003D (=) to serialization.

        • -

          If value does not solely contain HTTP tokens: +

          If value does not solely contain HTTP token code points:

          1. Precede each occurrence of U+0022 (") or U+005C (\) in value with U+005C (\). From 115bba5b677c0122732db9e4e8952afb71fb8a60 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Wed, 29 Nov 2017 10:37:19 +0100 Subject: [PATCH 14/18] empty string -> failure is fine after all I thought we needed to distinguish for overrideMimeType but not so --- mimesniff.bs | 2 -- 1 file changed, 2 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index 2f34b2d..c46c4e8 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -222,8 +222,6 @@ that does not contain U+003B (;).

            1. Remove any leading and trailing ASCII whitespace from input. -

            2. If input is the empty string, then return null. -

            3. Let position be a position variable for input, initially pointing at the start of input. From 6c429445c22f6c1cf352f89e6d20e686b603a9c1 Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Wed, 29 Nov 2017 11:07:28 +0100 Subject: [PATCH 15/18] one quarter MIME type portion --- mimesniff.bs | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index c46c4e8..b27e9fd 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -1798,8 +1798,8 @@ algorithm:

              1. - If the supplied MIME type is undefined or if the MIME - type portion of the supplied MIME type is equal to + If the supplied MIME type is undefined or if the + supplied MIME type's essence is "unknown/unknown", "application/unknown", or "*/*", execute the rules for identifying an unknown MIME type with From 33eedbc619d453bc08efdf6b2a4cb7a6e39df8ae Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Mon, 4 Dec 2017 10:05:17 +0100 Subject: [PATCH 16/18] Address review feedback --- mimesniff.bs | 59 ++++++++++++++++++++++++++-------------------------- 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index b27e9fd..0619817 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -184,9 +184,9 @@ confusion with the use of media type as described in Media Queries<

                MIME type miscellaneous

                -

                The essence of a MIME type mimeType is the -result of serializing mimeType with -exclude parameters set to true. +

                The essence of a MIME type mimeType is +mimeType's type, followed by U+002F (/), followed by +mimeType's subtype.

                A MIME type is supported by the user agent if the user agent has the capability to interpret a resource of that MIME type and present it to the user. @@ -266,8 +266,8 @@ that does not contain U+003B (;).

                If position is not past the end of input, then:

                  -
                1. If the current code point in input is U+003B (;), then - continue. +

                2. If the code point at position within input is U+003B (;), + then continue.

                3. Advance position to the next code point in input. (This skips past U+003D (=).) @@ -280,10 +280,12 @@ that does not contain U+003B (;).

                  1. -

                    If the current code point in input is U+0022 ("), then advance - position to the next code point in input and: +

                    If the code point at position within input is U+0022 ("), + then:

                      +
                    1. Advance position to the next code point in input. +

                    2. While true: @@ -293,16 +295,17 @@ that does not contain U+003B (;). parameterValue.

                    3. -

                      If position is not past the end of input and the current - code point in input is U+005C (\), then advance position to - the next code point in input and: +

                      If position is not past the end of input and the + code point at position within input is U+005C (\), then:

                        +
                      1. Advance position to the next code point in input. +

                      2. If position is not past the end of input, then:

                          -
                        1. Append the current code point in input to +

                        2. Append the code point at position within input to parameterValue.

                        3. Advance position to the next code point in input. @@ -317,9 +320,13 @@ that does not contain U+003B (;).

                        4. Otherwise, break.

                        -
                      3. Collect a sequence of code points that are not U+003B (;) from input, - given position. - +
                      4. +

                        Collect a sequence of code points that are not U+003B (;) from input, + given position. + +

                        Given + text/html;charset="shift_jis"iso-2022-jp you end up with + text/html;charset=shift_jis.
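The Python sketch below implements the whole parser and reproduces the example above. It is non-normative. Some steps are not visible in this excerpt, so the sketch makes three assumptions, each marked in a comment:

- ASCII whitespace after U+003B (;) is skipped.
- Trailing ASCII whitespace is removed from unquoted values.
- A backslash at the very end of a quoted value is kept.

`None` stands in for failure, and all names are hypothetical. The parameter checks follow the list in this series, including the requirement that parameterValue is not the empty string.

```python
from dataclasses import dataclass, field

ASCII_WHITESPACE = "\t\n\x0c\r "
TOKEN_PUNCTUATION = frozenset("!#$%&'*+-.^_`|~")


@dataclass
class MimeType:
    type: str
    subtype: str
    parameters: dict = field(default_factory=dict)  # insertion-ordered map


def ascii_lowercase(s: str) -> str:
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def is_token(s: str) -> bool:
    return all(c in TOKEN_PUNCTUATION or "0" <= c <= "9" or "A" <= c <= "Z"
               or "a" <= c <= "z" for c in s)


def is_quoted_string_tokens(s: str) -> bool:
    return all(c == "\t" or " " <= c <= "~" or "\x80" <= c <= "\xff" for c in s)


def collect(s: str, pos: int, predicate):
    # "Collect a sequence of code points": returns the run and the new position.
    start = pos
    while pos < len(s) and predicate(s[pos]):
        pos += 1
    return s[start:pos], pos


def collect_quoted_value(s: str, pos: int):
    pos += 1  # skip the opening U+0022 (")
    value = ""
    while True:
        chunk, pos = collect(s, pos, lambda c: c not in '"\\')
        value += chunk
        if pos < len(s) and s[pos] == "\\":
            pos += 1
            if pos < len(s):
                value += s[pos]  # the escaped code point is taken literally
                pos += 1
                continue
            value += "\\"  # assumption: a trailing backslash is kept as-is
        break
    # Drop the closing quote and anything else before the next U+003B (;).
    _, pos = collect(s, pos, lambda c: c != ";")
    return value, pos


def parse_mime_type(text: str):
    s = text.strip(ASCII_WHITESPACE)
    type_, pos = collect(s, 0, lambda c: c != "/")
    if not type_ or not is_token(type_) or pos >= len(s):
        return None  # failure
    subtype, pos = collect(s, pos + 1, lambda c: c != ";")  # pos + 1 skips "/"
    subtype = subtype.rstrip(ASCII_WHITESPACE)
    if not subtype or not is_token(subtype):
        return None
    mime = MimeType(ascii_lowercase(type_), ascii_lowercase(subtype))
    while pos < len(s):
        pos += 1  # skip U+003B (;)
        _, pos = collect(s, pos, lambda c: c in ASCII_WHITESPACE)  # assumption
        name, pos = collect(s, pos, lambda c: c not in ";=")
        name = ascii_lowercase(name)
        if pos < len(s):
            if s[pos] == ";":
                continue
            pos += 1  # skip U+003D (=)
        if pos >= len(s):
            break
        if s[pos] == '"':
            value, pos = collect_quoted_value(s, pos)
        else:
            value, pos = collect(s, pos, lambda c: c != ";")
            value = value.rstrip(ASCII_WHITESPACE)  # assumption
        if (name and value and is_token(name) and is_quoted_string_tokens(value)
                and name not in mime.parameters):
            mime.parameters[name] = value
    return mime


print(parse_mime_type('text/html;charset="shift_jis"iso-2022-jp'))
# MimeType(type='text', subtype='html', parameters={'charset': 'shift_jis'})
```

The first occurrence of a parameter name wins. Parameters that fail any check are skipped silently instead of failing the whole parse.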

                    4. @@ -337,7 +344,9 @@ that does not contain U+003B (;).

                      If all of the following are true

                        -
                      • neither parameterName nor parameterValue are the empty string +
                      • parameterName is not the empty string + +
                      • parameterValue is not the empty string
                      • parameterName solely contains HTTP token code points @@ -349,7 +358,6 @@ that does not contain U+003B (;).

                        then set mimeType's parameters[parameterName] to parameterValue. -

                  2. Return mimeType. @@ -369,18 +377,13 @@ run these steps:

                    Serializing a MIME type

                    -

                    To serialize a MIME type, given a MIME type mimeType and an -optional boolean exclude parameters, run these steps: +

                    To serialize a MIME type, given a MIME type mimeType, run +these steps:

                      -
                    1. If exclude parameters is not given, then set it to false. -

                    2. Let serialization be the concatenation of mimeType's type, U+002F (/), and mimeType's subtype. -

                    3. If exclude parameters is true or mimeType's - parameters is empty, then return serialization. -

                    4. For each namevalue of mimeType's parameters: @@ -411,14 +414,12 @@ optional boolean exclude parameters, run these steps:


                      -

                      To serialize a MIME type to bytes, given a MIME type mimeType -and an optional boolean exclude parameters, run these steps: +

                      To serialize a MIME type to bytes, given a MIME type +mimeType, run these steps:

                        -
                      1. If exclude parameters is not given, then set it to false. - -

                      2. Let stringSerialization be the result of serialize a MIME type given - mimeType and exclude parameters. +

                      3. Let stringSerialization be the result of serialize a MIME type with + mimeType.

                      4. Return stringSerialization, isomorphic encoded.

                      From 534a51d6fb2676ff6629a420d0b8a6a7e6fc0aad Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Mon, 4 Dec 2017 10:50:48 +0100 Subject: [PATCH 17/18] preserve IDs HTML relies on and some that make sense not to break --- mimesniff.bs | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index 0619817..dfb69ab 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -165,7 +165,7 @@ production. By definition it is a superset of the HTTP token code points.

                      MIME type representation

                      -

                      A MIME type represents an +

                      A MIME type represents an internet media type as defined by Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. It can also be referred to as a MIME type record. [[!MIMETYPE]] @@ -174,12 +174,13 @@ referred to as a MIME type record. [[!MIMETYPE]] confusion with the use of media type as described in Media Queries. [[MEDIAQUERIES]] -

                      A MIME type's type is a non-empty ASCII string. +

                      A MIME type's type is a non-empty ASCII string. -

                      A MIME type's subtype is a non-empty ASCII string. +

                      A MIME type's subtype is a non-empty +ASCII string. -

                      A MIME type's parameters is an ordered map whose -keys and values are ASCII strings. It is initially empty. +

                      A MIME type's parameters is an ordered map +whose keys and values are ASCII strings. It is initially empty.

                      MIME type miscellaneous

                      @@ -197,7 +198,7 @@ capability to interpret a resource of that MIME type and present i

                      MIME type writing

                      -

                      A valid MIME type string is a string that matches the +

                      A valid MIME type string is a string that matches the media-type token production. In particular, a valid MIME type may include parameters. [[!RFC7231]] @@ -211,8 +212,9 @@ capability to interpret a resource of that MIME type and present i been "text/html". -

                      A valid MIME type string with no parameters is a valid MIME type string -that does not contain U+003B (;). +

                      A +valid MIME type string with no parameters is +a valid MIME type string that does not contain U+003B (;).

                      Parsing a MIME type

                      From 3f7058010cbc67a005d293a70b6f95054674d53e Mon Sep 17 00:00:00 2001 From: Anne van Kesteren Date: Mon, 4 Dec 2017 13:07:52 +0100 Subject: [PATCH 18/18] remove upstreamed ref --- mimesniff.bs | 3 --- 1 file changed, 3 deletions(-) diff --git a/mimesniff.bs b/mimesniff.bs index dfb69ab..3227a94 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -32,9 +32,6 @@ Indent: 1 "MIMETYPE": { "aliasOf": "RFC2046" }, - "MEDIAQUERIES": { - "aliasOf": "mediaqueries-4" -}, "SECCONTSNIFF": { "authors": ["Adam Barth", "Juan Caballero", "Dawn Song"], "href": "https://www.adambarth.com/papers/2009/barth-caballero-song.pdf",