Review Inferred vs Assumed charsets #5757

gregw · 2020-12-06T10:40:54Z

Jetty version

= 9.4

Description

There is something a bit strange, at least in our language about inferred vs assumed character encodings.

In MimeType we say:

    /**
     * Access a mutable map of mime type to the charset inferred from that content type.
     * An inferred encoding is used by when encoding/decoding a stream and is
     * explicitly set in any metadata (eg Content-Type).
     *
     * @return Map of mime type to charset
     */
    public static Map<String, String> getInferredEncodings()
    {
        return __inferredEncodings;
    }

    /**
     * Access a mutable map of mime type to the charset assumed for that content type.
     * An assumed encoding is used by when encoding/decoding a stream, but is not
     * explicitly set in any metadata (eg Content-Type).
     *
     * @return Map of mime type to charset
     */
    public static Map<String, String> getAssumedEncodings()
    {
        return __assumedEncodings;
    }

Note both of these are different to a charset explicitly set in a content type (eg "text/plain;charset=UTF16").
An assumed charset is for "text/json", ie there is a charset there, but we don't need to explicitly set it in the content type.
An inferred charset are more like defaults for particular mime types. In encoding.properties we have:

# Mapping of mime type to inferred or assumed charset
# inferred charsets are used for encoding/decoding and explicitly set in Content-Type
# assumed charsets are used for encoding/decoding, but are not set in Content-Type
# In this file, assumed charsets are indicated with a leading '-'

text/html=utf-8
text/plain=iso-8859-1
text/xml=utf-8
application/xhtml+xml=utf-8
text/json=-utf-8
application/json=-utf-8
application/vnd.api+json=-utf-8

But in Response we have:

    private enum EncodingFrom
    {
        NOT_SET, INFERRED, SET_LOCALE, SET_CONTENT_TYPE, SET_CHARACTER_ENCODING
    }

Which suggests that INFERRED is different from SET_CONTENT_TYPE. So does a call to setContentType("text/json") result in INFERRED or SET_CONTENT_TYPE ? The charset is assumed in this case, so at the very least I think we have mixed up our language.

When we get the writer we do:

            //first try explicit char encoding
            String encoding = _characterEncoding;

            //try char set from mime type
            if (encoding == null)
            {
                if (_mimeType != null && _mimeType.isCharsetAssumed())
                    encoding = _mimeType.getCharsetString();
            }

            //try char set assumed from content type
            if (encoding == null)
            {
                encoding = MimeTypes.getCharsetAssumedFromContentType(_contentType);
            }
            
            //try char set inferred from content type
            if (encoding == null)
            {
                encoding = MimeTypes.getCharsetInferredFromContentType(_contentType);
                setCharacterEncoding(encoding, EncodingFrom.INFERRED);
            }
            
            //try any default char encoding for the context
            if (encoding == null)
            {
                Context context = _channel.getRequest().getContext();
                if (context != null)
                    encoding = context.getResponseCharacterEncoding();
            }
            
            //fallback to last resort iso-8859-1
            if (encoding == null)
            {
                encoding = StringUtil.__ISO_8859_1;
                setCharacterEncoding(encoding, EncodingFrom.INFERRED);
            }

So I think we are probably doing the right thing.... but perhaps our language is a bit mangled??? So some fresh eyes looking over this would be useful to at least improve the javadoc.

The text was updated successfully, but these errors were encountered:

gregw · 2020-12-06T10:52:18Z

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review Inferred vs Assumed charsets #5757

Review Inferred vs Assumed charsets #5757

gregw commented Dec 6, 2020

gregw commented Dec 6, 2020

Review Inferred vs Assumed charsets #5757

Review Inferred vs Assumed charsets #5757

Comments

gregw commented Dec 6, 2020

gregw commented Dec 6, 2020