diff --git a/mimesniff.bs b/mimesniff.bs index 8dce0a1..3227a94 100644 --- a/mimesniff.bs +++ b/mimesniff.bs @@ -40,6 +40,12 @@ Indent: 1 } +
+url:https://tools.ietf.org/html/rfc7230#section-3.2.6;text:token;type:dfn;spec:http
+url:https://tools.ietf.org/html/rfc7230#section-3.2.6;text:quoted-string;type:dfn;spec:http
+url:https://tools.ietf.org/html/rfc7231#section-3.1.1.1;text:media-type;type:dfn;spec:http
+
+

Introduction

@@ -116,6 +122,19 @@ Indent: 1

This specification depends on the Infra Standard. [[!INFRA]] +

An HTTP token code point is U+0021 (!), U+0023 (#), U+0024 ($), U+0025 (%), +U+0026 (&), U+0027 ('), U+002A (*), U+002B (+), U+002D (-), U+002E (.), U+005E (^), U+005F (_), +U+0060 (`), U+007C (|), U+007E (~), or an ASCII alphanumeric.

+ +

This matches the value space of the token token production. [[HTTP]] + +

An HTTP quoted-string token code point is U+0009 TAB, a code point in the range +U+0020 SPACE to U+007E (~), inclusive, or a code point in the range U+0080 through +U+00FF (ÿ), inclusive. + +

This matches the effective value space of the quoted-string token +production. By definition it is a superset of the HTTP token code points. [[HTTP]] +
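The two code point sets above can be expressed as predicates. The following is a minimal Python sketch. The helper names (`is_http_token_code_point`, `is_http_quoted_string_token_code_point`, `solely_contains`) are hypothetical and not defined by this specification. The final assertion checks the stated superset relationship over U+0000 to U+00FF, which is enough because every HTTP token code point is ASCII.

```python
# Hypothetical helper names; the code point sets follow the definitions above.
HTTP_TOKEN_PUNCTUATION = frozenset("!#$%&'*+-.^_`|~")


def is_http_token_code_point(c: str) -> bool:
    # One of the listed punctuation code points, or an ASCII alphanumeric.
    return c in HTTP_TOKEN_PUNCTUATION or (c.isascii() and c.isalnum())


def is_http_quoted_string_token_code_point(c: str) -> bool:
    # U+0009 TAB, U+0020 SPACE to U+007E (~), or U+0080 to U+00FF.
    cp = ord(c)
    return cp == 0x09 or 0x20 <= cp <= 0x7E or 0x80 <= cp <= 0xFF


def solely_contains(s: str, predicate) -> bool:
    # True if every code point of s satisfies predicate (vacuously true for "").
    return all(predicate(c) for c in s)


# Every HTTP token code point is also an HTTP quoted-string token code point.
assert all(
    is_http_quoted_string_token_code_point(chr(cp))
    for cp in range(0x100)
    if is_http_token_code_point(chr(cp))
)
```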

A binary data byte is a byte in the range 0x00 to 0x08 (NUL to BS), the byte 0x0B (VT), a byte in the @@ -139,553 +158,284 @@ Indent: 1 represented by ~. -

Understanding MIME types

+

MIME types

-

- The MIME type of a resource is a technical hint about the use and format - of that resource. [[!MIMETYPE]] +

MIME type representation

-

- A MIME type is sometimes called an Internet media type in protocol literature, but - consistently using the term MIME type avoids confusion with the use of "media type" as - described in the Media Queries CSS specification. [[MEDIAQUERIES-4]] +

A MIME type represents an +internet media type as defined by +Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. It can also be +referred to as a MIME type record. [[!MIMETYPE]] -

- A parsable MIME type is a MIME type for which the - parse a MIME type algorithm does not return undefined. +

Standards are encouraged to consistently use the term MIME type to avoid +confusion with the use of media type as described in Media Queries. +[[MEDIAQUERIES]] - Every parsable MIME type has a corresponding parsed MIME - type, which is the result of - parsing the parsable MIME - type. +

A MIME type's type is a non-empty ASCII string. - A parsed MIME type is made up of a type, a - subtype, and a dictionary of parameters. +

A MIME type's subtype is a non-empty +ASCII string. -

- A valid MIME type is a string that matches the - media-type rule defined in - section 3.1.1.1 "Media Type" of RFC - 7231. In particular, a valid MIME type may include parameters. [[!RFC7231]] +

A MIME type's parameters is an ordered map +whose keys and values are ASCII strings. It is initially empty. -

- TODO: give an example of a string that is a parsable MIME type but not a - valid MIME type. -

- A valid MIME type with no parameters is a MIME type that does not contain - any U+003B SEMICOLON (;) characters. In other words, it consists only of a type and - subtype, with no parameters. +

MIME type miscellaneous

-

- A serialized MIME type is the result of - serializing a parsed MIME - type. - -

- The MIME type portion of a parsable MIME type is the - result of serializing the - type and subtype of its parsed MIME - type with null parameters. +

The essence of a MIME type mimeType is +mimeType's type, followed by U+002F (/), followed by +mimeType's subtype. -
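A MIME type record and its essence might be modelled like this. It is a sketch, not a normative data structure. The class name `MimeType` is an assumption, and it relies on Python dicts preserving insertion order (guaranteed since Python 3.7) to stand in for the ordered map of parameters.

```python
from dataclasses import dataclass, field


@dataclass
class MimeType:
    """Hypothetical model of a MIME type record."""

    type: str                      # non-empty ASCII string
    subtype: str                   # non-empty ASCII string
    parameters: dict = field(default_factory=dict)  # ordered map, initially empty

    @property
    def essence(self) -> str:
        # type, followed by U+002F (/), followed by subtype; parameters are ignored.
        return f"{self.type}/{self.subtype}"
```

Because the essence ignores parameters, `text/html` with and without a `charset` parameter have the same essence.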

- The MIME type portion of a parsable MIME type - excludes any and all parameters. +

A MIME type is supported by the user agent if the user agent has the +capability to interpret a resource of that MIME type and present it to the user. -

- A parsable MIME type is supported by the user agent - if the user agent has the capability to interpret a resource of - that MIME type and present it to the user. +

This needs more work. See +w3c/preload #113. +

MIME type writing

-

Parsing a MIME type

+

A valid MIME type string is a string that matches the +media-type token production. In particular, a valid MIME type may include +parameters. [[!RFC7231]] -

- To parse a MIME type, the user agent must execute the following - steps: +

A valid MIME type string is intended to be used by conformance checkers only. -

    -
  1. - Let sequence be the byte sequence of the MIME - type, where sequence[s] is byte - s in sequence and sequence[0] is the first - byte in sequence. +
    +

    "text/html" is a valid MIME type string. -

  2. - If the number of bytes in sequence is - less than 1, return undefined. +

    "text/html;" is not a valid MIME type string, though + parse a MIME type returns a MIME type record for it that is identical to the one it returns for + "text/html". + -

  3. - Initialize s to 0. +

    A +valid MIME type string with no parameters is +a valid MIME type string that does not contain U+003B (;). -
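A conformance checker could test the media-type production with a regular expression. The pattern below is a hand transcription of the RFC 7230/7231 ABNF (`token`, `quoted-string`, `OWS`, `parameter`) and is an assumption of this sketch; the function names are hypothetical. Unlike parse a MIME type, it rejects the trailing `;` in `"text/html;"`.

```python
import re

# Hand-transcribed from the RFC 7230/7231 ABNF (an assumption of this sketch).
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"
TOKEN = TCHAR + "+"
OWS = r"[ \t]*"
QDTEXT = r"[\t !#-\[\]-~\x80-\xff]"   # any qdtext, i.e. excluding " and \
QUOTED_PAIR = r"\\[\t !-~\x80-\xff]"   # backslash + HTAB / SP / VCHAR / obs-text
QUOTED_STRING = rf'"(?:{QDTEXT}|{QUOTED_PAIR})*"'
PARAMETER = rf"{TOKEN}=(?:{TOKEN}|{QUOTED_STRING})"
MEDIA_TYPE = re.compile(rf"{TOKEN}/{TOKEN}(?:{OWS};{OWS}{PARAMETER})*")


def is_valid_mime_type_string(s: str) -> bool:
    return MEDIA_TYPE.fullmatch(s) is not None


def is_valid_mime_type_string_with_no_parameters(s: str) -> bool:
    # A valid MIME type string that does not contain U+003B (;).
    return is_valid_mime_type_string(s) and ";" not in s
```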

  4. - Initialize type and subtype to the empty string - (""). -
  5. - Initialize parameters to the empty dictionary ({}). +

    Parsing a MIME type

    -
  6. - While sequence[s] is ASCII whitespace, - continuously execute the following steps: +

    To parse a MIME type, given a string input, run these steps: -

      -
    1. - Increment s by 1. +
        +
      1. Remove any leading and trailing ASCII whitespace from input. -

      2. - If sequence[s] is undefined, return undefined. -
      +
    2. Let position be a position variable for input, + initially pointing at the start of input. -

    3. - Initialize t to 0. +
    4. Let type be the result of collecting a sequence of code points that are + not U+002F (/) from input, given position. -

    5. - While sequence[s] is not equal to the U+002F SOLIDUS - character ("/"), continuously execute the - following steps: +
    6. If type is the empty string or does not solely contain + HTTP token code points, then return failure. -

        -
      1. - If t is greater than 127, return undefined. +
      2. If position is past the end of input, then return failure. -

      3. - If sequence[s] is undefined, return undefined. +
      4. Advance position to the next code point in input. (This skips + past U+002F (/).) -

      5. - Append sequence[s], ASCII lowercased, to type. +
      6. Let subtype be the result of collecting a sequence of code points that are + not U+003B (;) from input, given position. -

      7. - Increment s and t by 1. -
      +
    7. Remove any trailing ASCII whitespace from subtype. -

    8. - Increment s by 1. +
    9. If subtype is the empty string or does not solely contain + HTTP token code points, then return failure. -

    10. - Initialize u to 0. +
    11. Let mimeType be a new MIME type record whose type + is type, in ASCII lowercase, and subtype is + subtype, in ASCII lowercase. -

    12. - While sequence[s] is not ASCII whitespace - and is not equal to the U+003B SEMICOLON character - (";"), continuously execute the following steps: +
    13. +

      While position is not past the end of input: -

        -
      1. - If u is greater than 127, return undefined. +
          +
        1. Advance position to the next code point in input. (This skips + past U+003B (;).) -

        2. - If sequence[s] is undefined, return - type, subtype, and parameters. +
        3. Skip ASCII whitespace within input given position. -

        4. - Append sequence[s], ASCII lowercased, to subtype. +
        5. Let parameterName be the result of collecting a sequence of code points + that are not U+003B (;) or U+003D (=) from input, given position. -

        6. - Increment s and u by 1. -
        +
      2. Set parameterName to parameterName, in ASCII lowercase.

      3. - Enter loop L: +

        If position is not past the end of input, then:

          -
        1. - Enter loop M: +
        2. If the code point at position within input is U+003B (;), + then continue. -

            -
          1. - If sequence[s] is undefined or is equal to the - U+003B SEMICOLON character (";"), exit loop - M. - -
          2. - While sequence[s] is ASCII whitespace, continuously execute the - following steps: - -
              -
            1. - Increment s by 1. -
            - -
          3. - If sequence[s] is equal to the U+0022 QUOTATION - MARK character ("""), execute the following - steps: - -
              -
            1. - Increment s by 1. - -
            2. - Enter loop N: - -
                -
              1. - If sequence[s] is undefined or is equal to - the U+0022 QUOTATION MARK character - ("""), execute the following steps: - -
                  -
                1. - If sequence[s] is equal to the U+0022 - QUOTATION MARK character ("""), - increment s by 1. - -
                2. - Exit loop N. -
                - -
              2. - If sequence[s] is equal to the U+005C - REVERSE SOLIDUS character ("\") and - sequence[s + 1] is not undefined, increment - s by 1. - -
              3. - Increment s by 1. -
              -
            - - Otherwise, enter loop N: - -
              -
            1. - If sequence[s] is undefined or is - ASCII whitespace or is equal to the U+003B - SEMICOLON character (";"), exit loop - N. - -
            2. - Increment s by 1. -
            -
          - -
        3. - If sequence[s] is undefined, return - type, subtype, and parameters. - -
        4. - Increment s by 1. - -
        5. - While sequence[s] is ASCII whitespace, continuously execute the - following steps: - -
            -
          1. - Increment s by 1. -
          +
        6. Advance position to the next code point in input. (This + skips past U+003D (=).) +

        -
      4. - Initialize name and extra to the empty string - (""). +
      5. Let parameterValue be the empty string. -

      6. - Initialize p to 0. +
      7. +

        If position is not past the end of input, then: +

        1. - Enter loop M: +

          If the code point at position within input is U+0022 ("), + then:

            -
          1. - Append extra to name. +
          2. Advance position to the next code point in input.

          3. - While sequence[s] is not ASCII whitespace and is not equal to - the U+003D EQUALS SIGN character ("="), continuously execute the following - steps: +

            While true:

              -
            1. - If p is greater than 127, return undefined. +
            2. Append the result of collecting a sequence of code points that are not + U+0022 (") or U+005C (\) from input, given position, to + parameterValue.

            3. - If sequence[s] is undefined, execute the - following steps: +

              If position is not past the end of input and the + code point at position within input is U+005C (\), then:

                -
              1. - If name is not equal to the empty string - ("") and parameters[name] - is undefined, set parameters[name] to null. +
              2. Advance position to the next code point in input.

              3. - Return type, subtype, and - parameters. -
              +

              If position is not past the end of input, then: -

            4. - Append sequence[s], ASCII lowercased, to name. +
                +
              1. Append the code point at position within input to + parameterValue. -

              2. - Increment s and p by 1. -
              +
            5. Advance position to the next code point in input. -

            6. - While sequence[s] is ASCII whitespace, continuously execute the - following steps: +
            7. Continue. +

            -
              -
            1. - Append sequence[s] to extra. +
            2. Otherwise, append U+005C (\) to parameterValue and + break. +

            -
          4. - Increment s and p by 1. +
          5. Otherwise, break.

        2. - If sequence[s] is equal to the U+003D EQUALS - SIGN character ("="), exit loop - M. -
        - -
      8. - Increment s by 1. +

        Collect a sequence of code points that are not U+003B (;) from input, + given position. -

      9. - Initialize parameters[name] to null. - -
      10. - While sequence[s] is ASCII whitespace, continuously execute the - following steps: - -
          -
        1. - Increment s by 1. +

          Given + text/html;charset="shift_jis"iso-2022-jp, you end up with + text/html;charset=shift_jis.

      11. - Initialize value to the empty string (""). - -
      12. - If sequence[s] is undefined, execute the following - steps: +

        Otherwise:

          -
        1. - Set parameters[name] to value. +
        2. Set parameterValue to the result of collecting a sequence of code + points that are not U+003B (;) from input, given position. -

        3. - Return type, subtype, and parameters. +
        4. Remove any trailing ASCII whitespace from parameterValue.

        +
      -
    14. - If sequence[s] is equal to the U+0022 QUOTATION - MARK character ("""), execute the following - steps: - -
        -
      1. - Increment s by 1. - -
      2. - Enter loop M: - -
          -
        1. - If sequence[s] is undefined or is equal to the - U+0022 QUOTATION MARK character ("""), - execute the following steps: - -
            -
          1. - Set parameters[name] to value. +
          2. +

            If all of the following are true -

          3. - If sequence[s] is equal to the U+0022 - QUOTATION MARK character ("""), - increment s by 1. +
              +
            • parameterName is not the empty string -
            • - Exit loop M. -
          - -
        2. - If sequence[s] is equal to the U+005C REVERSE - SOLIDUS character ("\") and - sequence[s + 1] is not undefined, increment - s by 1. +
        3. parameterValue is not the empty string -
        4. - Append sequence[s] to value. +
        5. parameterName solely contains HTTP token code points -
        6. - Increment s by 1. -
        -
      +
    15. parameterValue solely contains HTTP quoted-string token code points - Otherwise, enter loop M: +
    16. mimeType's parameters[parameterName] + does not exist + -
        -
      1. - If sequence[s] is undefined or is - ASCII whitespace or is equal to the U+003B SEMICOLON - character (";"), execute the following steps: - -
          -
        1. - Set parameters[name] to value. +

          then set mimeType's + parameters[parameterName] to parameterValue. +

        -
      2. - Exit loop M. -
      +
    17. Return mimeType. +
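The parse a MIME type steps above can be sketched in Python as follows. This is an illustrative transcription, not a reference implementation. `parse_mime_type`, `MimeType` and the helpers are hypothetical names, `None` stands in for failure, and a translation table that maps only A to Z is used for ASCII lowercase, because Python's `str.lower()` would also fold non-ASCII code points such as U+212A KELVIN SIGN to `k`.

```python
from dataclasses import dataclass, field
from typing import Optional

ASCII_WHITESPACE = "\t\n\f\r "
_TOKEN_PUNCTUATION = frozenset("!#$%&'*+-.^_`|~")
# ASCII lowercase: maps only A-Z, leaving every other code point untouched.
_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz")


def _is_token(s: str) -> bool:
    return all(c in _TOKEN_PUNCTUATION or (c.isascii() and c.isalnum()) for c in s)


def _is_quoted_string_token(s: str) -> bool:
    return all(ord(c) == 0x09 or 0x20 <= ord(c) <= 0x7E or 0x80 <= ord(c) <= 0xFF for c in s)


@dataclass
class MimeType:
    type: str
    subtype: str
    parameters: dict = field(default_factory=dict)  # insertion-ordered


def parse_mime_type(input_string: str) -> Optional[MimeType]:
    """Returns a MimeType, or None where the algorithm returns failure."""
    s = input_string.strip(ASCII_WHITESPACE)
    pos = 0

    def collect(stops: str) -> str:
        # Collect a sequence of code points not in stops, advancing pos.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in stops:
            pos += 1
        return s[start:pos]

    type_ = collect("/")
    if not type_ or not _is_token(type_) or pos >= len(s):
        return None
    pos += 1  # skip U+002F (/)
    subtype = collect(";").rstrip(ASCII_WHITESPACE)
    if not subtype or not _is_token(subtype):
        return None
    mime = MimeType(type_.translate(_ASCII_LOWER), subtype.translate(_ASCII_LOWER))

    while pos < len(s):
        pos += 1  # skip U+003B (;)
        while pos < len(s) and s[pos] in ASCII_WHITESPACE:
            pos += 1
        name = collect(";=").translate(_ASCII_LOWER)
        if pos < len(s):
            if s[pos] == ";":
                continue
            pos += 1  # skip U+003D (=)
        value = ""
        if pos < len(s):
            if s[pos] == '"':
                pos += 1
                while True:
                    value += collect('"\\')
                    if pos < len(s) and s[pos] == "\\":
                        pos += 1
                        if pos < len(s):
                            value += s[pos]  # keep the escaped code point
                            pos += 1
                            continue
                        value += "\\"  # a trailing backslash is kept literally
                    break
                collect(";")  # discard anything after the closing quote
            else:
                value = collect(";").rstrip(ASCII_WHITESPACE)
        if (name and value and _is_token(name) and _is_quoted_string_token(value)
                and name not in mime.parameters):
            mime.parameters[name] = value
    return mime
```

As in the note above, `text/html;charset="shift_jis"iso-2022-jp` yields a `charset` parameter of `shift_jis`, and when a parameter name repeats, the first occurrence wins.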

    -
  7. - Append sequence[s] to value. +
    -
  8. - Increment s by 1. -
- - +

To parse a MIME type from bytes, given a byte sequence input, +run these steps: -

- The parse a MIME type algorithm is intended to be executed after - any protocol-specific syntax within the MIME type has been - handled. +

    +
  1. Let string be input, isomorphic decoded. +

  2. Return the result of parse a MIME type with string. +
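Isomorphic decoding maps each byte 0xNN to the code point U+00NN, which Python's `latin-1` codec does for all 256 byte values. In this sketch the string parser is passed in as a parameter (a hypothetical `parse` argument) only so that the block stays self-contained.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def isomorphic_decode(data: bytes) -> str:
    # Each byte 0xNN becomes code point U+00NN; the "latin-1" codec is exactly
    # this one-to-one mapping.
    return data.decode("latin-1")


def parse_mime_type_from_bytes(data: bytes, parse: Callable[[str], T]) -> T:
    # `parse` stands in for an implementation of "parse a MIME type".
    return parse(isomorphic_decode(data))
```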

Serializing a MIME type

-

- To serialize a MIME type, given a type, a - subtype, and a dictionary of parameters, execute the - following steps: - -

    -
  1. - If type is undefined, is null, is equal to the empty string - (""), or has a length greater than - 127, return undefined. - -
  2. - If subtype is undefined, is null, or has a - length greater than 127, return undefined. - -
  3. - Let serialization be the concatenation of type, the - U+002F SOLIDUS character ("/"), and - subtype. - -
  4. - If parameters is undefined or is null, return - serialization. - -
  5. - Let names be a list of the keys in parameters, - sorted ASCII - case-insensitively in ascending alphabetical order. - -
  6. - Should this special-case the "charset" or - "codecs" parameters first? - -
  7. - For each item name in names, execute the following - steps: - -
      -
    1. - If name has a length greater than 127, return undefined. +

      To serialize a MIME type, given a MIME type mimeType, run +these steps: -

    2. - If parameters[name] is not null, execute the - following steps: - -
        -
      1. - Append the U+003B SEMICOLON character - (";") to serialization. +
          +
        1. Let serialization be the concatenation of mimeType's + type, U+002F (/), and mimeType's subtype. -

        2. - Append name, ASCII lowercased, to - serialization. +
        3. +

          For each name → value of mimeType's + parameters: -

        4. - Append the U+003D EQUALS SIGN character - ("=") to serialization. +
            +
          1. Append U+003B (;) to serialization. -

          2. - Append the U+0022 QUOTATION MARK character - (""") to serialization. +
          3. Append name to serialization. -

          4. - For each character char in - parameters[name], execute the following steps: +
          5. Append U+003D (=) to serialization. -

              -
            1. - If char is equal to the U+0022 QUOTATION MARK - character (""") or to the - U+005C REVERSE SOLIDUS character - ("\"), append the U+005C REVERSE SOLIDUS - character ("\") to - serialization. +
            2. +

              If value does not solely contain HTTP token code points: -

            3. - Append char to serialization. -
            +
              +
            1. Precede each occurrence of U+0022 (") or U+005C (\) in value with U+005C (\). -

            2. - Append the U+0022 QUOTATION MARK character - (""") to serialization. +
            3. Prepend U+0022 (") to value. -

            4. - Remove name from names. -
            -
          +
        5. Append U+0022 (") to value. +

        -
      2. - For each item name in names, execute the following - steps: +
      3. Append value to serialization. +

      -
        -
      1. - Append the U+003B SEMICOLON character - (";") to serialization. +
      2. Return serialization. +
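A Python sketch of serialize a MIME type follows. `serialize_mime_type` and `MimeType` are hypothetical names. Backslashes are escaped before quotation marks so that the backslashes inserted in front of `"` are not themselves doubled.

```python
from dataclasses import dataclass, field

_TOKEN_PUNCTUATION = frozenset("!#$%&'*+-.^_`|~")


def _is_token(s: str) -> bool:
    return all(c in _TOKEN_PUNCTUATION or (c.isascii() and c.isalnum()) for c in s)


@dataclass
class MimeType:
    type: str
    subtype: str
    parameters: dict = field(default_factory=dict)  # insertion-ordered


def serialize_mime_type(mime: MimeType) -> str:
    serialization = f"{mime.type}/{mime.subtype}"
    for name, value in mime.parameters.items():
        serialization += ";" + name + "="
        if not _is_token(value):
            # Precede each " or \ with \, then wrap the value in quotation marks.
            value = value.replace("\\", "\\\\").replace('"', '\\"')
            value = '"' + value + '"'
        serialization += value
    return serialization
```

Token-only values are emitted bare, so the record obtained from `text/html;charset="shift_jis"iso-2022-jp` serializes as `text/html;charset=shift_jis`.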

      -
    3. - Append name, ASCII lowercased, to - serialization. -
    +
    -
  8. - Should this special-case the "base64" boolean parameter last? +

    To serialize a MIME type to bytes, given a MIME type +mimeType, run these steps: -

  9. - Return serialization. -
+
    +
  1. Let stringSerialization be the result of serialize a MIME type with + mimeType. +

  2. Return stringSerialization, isomorphic encoded. +
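Isomorphic encoding is the inverse mapping from code points U+0000 to U+00FF back to bytes. A minimal sketch, again taking the string serializer as a hypothetical `serialize` parameter so that it stays self-contained, follows. Python's `latin-1` codec raises `UnicodeEncodeError` for code points above U+00FF. The output of serialize a MIME type applied to a parsed record never contains such code points, because the parser only accepts HTTP token and HTTP quoted-string token code points.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def isomorphic_encode(s: str) -> bytes:
    # Each code point U+00NN becomes byte 0xNN; "latin-1" raises
    # UnicodeEncodeError for anything above U+00FF.
    return s.encode("latin-1")


def serialize_mime_type_to_bytes(mime: T, serialize: Callable[[T], str]) -> bytes:
    # `serialize` stands in for an implementation of "serialize a MIME type".
    return isomorphic_encode(serialize(mime))
```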

MIME type groups

-

- An image type is any parsable MIME type where - type is equal to "image". +

An audio or video type is any MIME type whose type is +"audio" or "video", or whose essence is +"application/ogg". -

- An audio or video type is any parsable MIME type - where type is equal to "audio" or - "video" or where the MIME type portion is - equal to one of the following: - -

- -

- A font type is any parsable MIME type where - the - MIME type portion is equal to one of the following: +

A font type is any MIME type whose + +essence is one of the following:

-

- A ZIP-based type is any parsable MIME type where the - subtype ends in "+zip" or the MIME type - portion is equal to one of the following: +

A ZIP-based type is any MIME type whose subtype ends in +"+zip" or whose essence is one of the following:

-

- An archive type is any parsable MIME type where - the - MIME type portion is equal to one of the following: +

An archive type is any MIME type whose + +essence is one of the following:

-

- An XML MIME type is any parsable MIME type where either the subtype - ends in "+xml", or the MIME type portion is equal to "text/xml" or - "application/xml". [[!RFC7303]] +

An XML MIME type is any MIME type whose subtype +ends in "+xml" or whose essence is "text/xml" or +"application/xml". [[!RFC7303]] -

- An HTML MIME type is any parsable MIME type where the - MIME type portion is equal to "text/html". +

An HTML MIME type is any MIME type whose essence is +"text/html". -
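Group membership reduces to comparisons on type, subtype and essence. The Python sketch below covers only the groups whose membership is spelled out in full in the text above (audio or video, XML and HTML). The function names are hypothetical, and the sketch assumes the type and subtype are already ASCII-lowercased, as parse a MIME type produces them.

```python
from dataclasses import dataclass, field


@dataclass
class MimeType:
    type: str
    subtype: str
    parameters: dict = field(default_factory=dict)

    @property
    def essence(self) -> str:
        return f"{self.type}/{self.subtype}"


def is_audio_or_video_type(m: MimeType) -> bool:
    return m.type in ("audio", "video") or m.essence == "application/ogg"


def is_xml_mime_type(m: MimeType) -> bool:
    return m.subtype.endswith("+xml") or m.essence in ("text/xml", "application/xml")


def is_html_mime_type(m: MimeType) -> bool:
    # Parameters such as charset do not affect the essence.
    return m.essence == "text/html"
```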

- A scriptable MIME type is an XML MIME type or any - parsable MIME type where the MIME type portion is - equal to one of the following: +

A scriptable MIME type is an XML MIME type, an HTML MIME type, or any +MIME type whose essence is one of the following:

@@ -791,7 +531,7 @@ Indent: 1 MIME type sniffing algorithm.
  • - A computed MIME type, the parsable MIME type + A computed MIME type, the MIME type determined by the MIME type sniffing algorithm. @@ -915,7 +655,7 @@ Indent: 1 [[!FTP]]
  • - If supplied-type is not a parsable MIME type, the + If supplied-type is not a MIME type, the supplied MIME type is undefined. Abort these steps. @@ -2058,8 +1798,8 @@ algorithm:
    1. - If the supplied MIME type is undefined or if the MIME - type portion of the supplied MIME type is equal to + If the supplied MIME type is undefined or if the + supplied MIME type's essence is "unknown/unknown", "application/unknown", or "*/*", execute the rules for identifying an unknown MIME type with @@ -2084,9 +1824,8 @@ algorithm: Abort these steps.
    2. - If the MIME type portion of the supplied MIME - type is equal to "text/html", execute the - rules for distinguishing if a resource is a feed or HTML and + If the supplied MIME type's essence is "text/html", + execute the rules for distinguishing if a resource is a feed or HTML and abort these steps.
    3. @@ -2624,7 +2363,7 @@ type: -

      User agents may implicitly extend this table to support additional parsable MIME types. +

      User agents may implicitly extend this table to support additional MIME types.

      However, user agents should not implicitly extend this table to include additional byte patterns for any computed MIME type already present in this table, as doing so