Optionally require a valid content type for all rest requests with content #22691

jaymode · 2017-01-19T13:52:59Z

This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting http.content_type.required: true in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously.

The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type.

As part of this, many transport requests and builders were updated to provide methods that
accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated.

In non strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header.

See #19388

This change enforces that all incoming rest requests have a valid and supported content type header before the request is dispatched. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to autodetect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on autodetection have been deprecated. Closes elastic#19388

javanna

thanks a lot for working on this @jaymode . I did a first review round and left a bunch of comments. I will need to do another round another time as my eyes are crossing now :) looks good though. I am super happy that we are fixing this.

javanna · 2017-01-19T16:15:14Z

...c/main/java/org/elasticsearch/action/admin/cluster/storedscripts/PutStoredScriptRequest.java

+        return xContentType;
+    }
+
+    @Deprecated


shall we add javadocs with the motivation for the deprecation and what it is replaced with?
I assume that these methods can be removed in master once this PR is backported to 5.x?

Yes this can go away in master after backporting.

javanna · 2017-01-19T17:11:47Z

...c/main/java/org/elasticsearch/action/admin/cluster/storedscripts/PutStoredScriptRequest.java

    }

    @Override
    public String toString() {
        String sSource = "_na_";
        try {
-            sSource = XContentHelper.convertToJson(script, false);
+            if (xContentType == null) {


can you clarify that this if is only needed for bw comp and should go away once the bw comp layer is removed?

I'm actually going to remove this and require a non-null xContentType.

javanna · 2017-01-19T17:13:41Z

...java/org/elasticsearch/action/admin/cluster/storedscripts/PutStoredScriptRequestBuilder.java

@@ -40,9 +41,14 @@ public PutStoredScriptRequestBuilder setId(String id) {
        return this;
    }

+    @Deprecated


javanna · 2017-01-19T17:31:54Z

core/src/main/java/org/elasticsearch/action/admin/indices/mapping/put/PutMappingRequest.java

    public PutMappingRequest source(String mappingSource) {
-        this.source = mappingSource;
+        return source(mappingSource, XContentFactory.xContentType(mappingSource));


not something that you changed but maybe something to fix as a followup, we have 4 xContentTypes, but given that source is a string, we only support yaml or json here. I wonder if source should rather be a BytesReference? I never thought we'd support yaml here though :) who knows if anybody is using that format. I also wonder if instead of converting here we should keep around the xContentType and carry it around all the way to DocumentMapperParser#parse where we actually parse it. That may be a bigger effort but sounds cleaner to me.

Sorry, the way I suggested to go before is totally different from what I am saying in this second review. That previous direction was probably the most correct but had a high cost (bw comp mainly). It's the only one that doesn't require any conversion, but just parsing in MapperService once we know to content type. We can still go that way if we really want to... but in most of the cases we get json so we may end up complicating things to handle edge cases which is never good.

we should also add tests for all these different content types around mappings, otherwise it's all theories. I can look into it as a followup

javanna · 2017-01-19T17:37:19Z

...c/main/java/org/elasticsearch/action/admin/indices/template/put/PutIndexTemplateRequest.java

+        return this;
+    }
+
+    /**
     * The settings to crete the index template with (either json/yaml/properties format).


can you fix this typo while at it? s/crete/create

javanna · 2017-01-19T19:25:42Z

...t/java/org/elasticsearch/action/admin/cluster/storedscripts/PutStoredScriptRequestTests.java

+        storedScriptRequest.script(new BytesArray("{}"), XContentType.JSON);
+
+        assertEquals(XContentType.JSON, storedScriptRequest.xContentType());
+        BytesStreamOutput output = new BytesStreamOutput();


nit maybe use try finally around stream outputs and stream inputs?

javanna · 2017-01-19T19:27:02Z

...t/java/org/elasticsearch/action/admin/cluster/storedscripts/PutStoredScriptRequestTests.java

+        serialized.readFrom(in);
+        assertEquals(XContentType.JSON, storedScriptRequest.xContentType());
+        assertEquals(storedScriptRequest.scriptLang(), serialized.scriptLang());
+        assertEquals(storedScriptRequest.id(), serialized.id());


thanks for testing it. It is some more effort but what we've been doing to test also that when we receive from an older version properly is to deserialize a base64 encoded request that was serialized using the previous version. That is much better than simulating things.

javanna · 2017-01-19T19:28:52Z

...src/test/java/org/elasticsearch/action/admin/indices/mapping/put/PutMappingRequestTests.java

+        request.writeTo(bytesStreamOutput);
+
+        StreamInput in = StreamInput.wrap(bytesStreamOutput.bytes().toBytesRef().bytes);
+        in.setVersion(Version.V_5_0_0);


in all these serialization tests, shall we randomize the version just to make sure that the conditional target the proper versions? you may bump into the fact that some released versions are still marked unreleased hence cannot be used in VersionUtils.randomVersion.

javanna · 2017-01-19T19:30:04Z

...src/test/java/org/elasticsearch/action/admin/indices/mapping/put/PutMappingRequestTests.java

+        // this is hacky but here goes
+        PutMappingRequest request = new PutMappingRequest("foo");
+        String mapping = YamlXContent.contentBuilder().startObject().field("foo", "bar").endObject().string();
+        request.source(mapping, XContentType.JSON); // THIS IS NOT A BUG! Intentionally specifying the wrong type so we serialize it


what practical usecase does this test? user is providing the mapping in yaml but?

javanna · 2017-01-19T19:32:51Z

modules/transport-netty4/src/main/java/org/elasticsearch/http/netty4/Netty4HttpRequest.java

-    }
+    /**
+     * A wrapper of {@link HttpHeaders} that implements a map to prevent copying unnecessarily. This class does not support modifications
+     * and due to the underlying implementation, it performs case insensitive lookups of key to values.


oh nice some other comment of mine above may be wrong then, seems like you have made headers case-insensitive. So does it mean that they we case-sensitive before?

The headers were case-insensitive before. Both the Netty 3 and 4 implementations implemented this.

clintongormley · 2017-01-20T17:26:59Z

@elastic/es-clients this will require the clients to send a Content-Type header

@jaymode if no Content-Type header is received, do we just assume JSON? Otherwise this will impact command line curl statements

jaymode · 2017-01-20T17:37:20Z

if no Content-Type header is received, do we just assume JSON?

We do not assume JSON. This commit requires the header to be present and a known type (JSON, YAML, SMILE, CBOR, Plain text). Curl sends the content-type as application/x-www-form-urlencoded by default (can be overriden with -H 'Content-Type: application/json').I do not think we should blindly accept this value

karmi · 2017-01-20T17:52:20Z

(...) this will require the clients to send a Content-Type header

I think it's trivial to set a default header, but we'll have to think about the "Cat" APIs -- I've tried to set application/json by default once, and of course broke all the _cat endpoints, which started to return JSON instead of text.

On the other hand, there could be a debate whether the "Cat" APIs shouldn't return parsed JSON when you call them from something like JavaScript, Python, ..., instead of plain text.

We do not assume JSON. (... Curl sends the content-type as application/x-www-form-urlencoded by default (...) I do not think we should blindly accept this value.

I might be missing a larger picture here, but do we have grave enough reason to just break the most simple use case when people do curl localhost:9200 on the command line?

jaymode · 2017-01-20T17:57:31Z

I might be missing a larger picture here, but do we have grave enough reason to just break the most simple use case when people do curl localhost:9200 on the command line?

This is my fault as I did not state this clearly. We only require the Content-Type header for a request made to elasticsearch with a body (eg POST with JSON body). So curl localhost:9200 remains unaffected.

Another aspect is that sometimes we allow a GET with a source parameter and currently that doesn't require a header; I did add a source_type parameter that we may want to make required in the future to avoid autodetection of content.

jasontedor · 2017-01-20T18:03:32Z

I think it's trivial to set a default header, but we'll have to think about the "Cat" APIs -- I've tried to set application/json by default once, and of course broke all the _cat endpoints, which started to return JSON instead of text.

I think this is because we use to mistakenly use the Content-Type field as determining the media type of the response. That is no longer the case, we use the Accept header for that now. So, I think that this should not be a problem anymore.

honzakral · 2017-01-20T22:44:47Z

We only require the Content-Type header for a request made to elasticsearch with a body (eg POST with JSON body). So curl localhost:9200 remains unaffected.

but it would still break curl -X PUT localhost:9200/i/t/42 -d '{"hello": "world"}' and other uses of curl like put settings etc (often executed as one-off from the cmdline). That seems like a huge breaking change and downgrade for usability.

Also what would happen with bulk and m* APIs? Those are technically not valid JSON...

I would be 100% for abolishing the auto-detect in favor of assuming json (not trying to detect it) and requiring content-type if it's anything else. How does that sound?

karmi · 2017-01-21T08:24:17Z

We only require the Content-Type header for a request made to elasticsearch with a body (eg POST with JSON body). So curl localhost:9200 remains unaffected.

but it would still break curl -X PUT localhost:9200/i/t/42 -d '{"hello": "world"}' and other uses of curl like put settings etc (often executed as one-off from the cmdline). That seems like a huge breaking change and downgrade for usability.

Yes, that is what I was pointing out. Again, do we have a good enough reason to break things like this?

I would be 100% for abolishing the auto-detect in favor of assuming json (not trying to detect it) and requiring content-type if it's anything else. How does that sound?

+1 Assume JSON by default, require Accept for any other format.

alexbrasetvik · 2017-01-23T11:47:49Z

How new (official) clients do users need to have for this?

javanna · 2017-01-23T12:14:36Z

I think that assuming json when the header is not available is already a huge improvement over auto-detection. We may want to make the header required at some point whenever a body is provided, but we can always do that later, and potentially add deprecation logging for that first and break with the next major version. After all the vast majority of our users use json so that assumption seems about right to me for now.

jaymode · 2017-01-23T12:26:58Z

I think that assuming json when the header is not available is already a huge improvement over auto-detection. We may want to make the header required at some point whenever a body is provided, but we can always do that later, and potentially add deprecation logging for that first and break with the next major version. After all the vast majority of our users use json so that assumption seems about right to me for now.

Doing this doesn't solve the issue the usability issues with curl since the Content-Type header will be sent but will not be a supported value unless specified by a user.

I think the requirement of this header should be removed from this PR, based on the text from RFC 7231:

A sender that generates a message containing a payload body SHOULD
generate a Content-Type header field in that message unless the
intended media type of the enclosed representation is unknown to the
sender. If a Content-Type header field is not present, the recipient
MAY either assume a media type of "application/octet-stream"
([RFC2046], Section 4.5.1) or examine the data to determine its type.

In practice, resource owners do not always properly configure their
origin server to provide the correct Content-Type for a given
representation, with the result that some clients will examine a
payload's content and override the specified type. Clients that do
so risk drawing incorrect conclusions, which might expose additional
security risks (e.g., "privilege escalation"). Furthermore, it is
impossible to determine the sender's intent by examining the data
format: many data formats match multiple media types that differ only
in processing semantics. Implementers are encouraged to provide a
means of disabling such "content sniffing" when it is used.

Assuming a media type of application/octet-stream does not do us much good and the auto-detection is what we want to remove.

javanna · 2017-01-23T12:48:28Z

Doing this doesn't solve the issue the usability issues with curl since the Content-Type header will be sent but will not be a supported value unless specified by a user.

Right, thanks for clarifying. Seems like we don't want to break curl in any case, hence if we do want to remove auto-detection, we may have to make curl work with some hack. If curl is resolved one way or the other, do we want to assume json or always require the header (besides the curl specific case)?

honzakral · 2017-01-23T13:00:07Z

To not break curl, and still be somewhat consistent, would essentially mean to assume json unless another known content-type header value is present.

...or decide (in a different issue) we don't need another formats other than json and completely side step this issue.

javanna · 2017-01-23T13:20:05Z

to assume json unless another known content-type header value is present

I think this is too lenient. I would rather special case curl in some other way and find out about other cases that may need special treatment (hopefully there aren't).

Mpdreamz · 2017-01-23T14:46:54Z

To not break curl, and still be somewhat consistent, would essentially mean to assume json unless another known content-type header value is present.

++ we can document sending any unknown Content-Type will default to application/json and that this may break at anytime if at some point we will actually support that mime type.

This actually also brings its behavior inline with Accept which will revert to application/json for any unsupported output media type.

…' header In future, Elasticsearch will require specifying the format of data which is being sent (eg. when indexing a document). Current versions of Elasticsearch will start to print a warning to the deprecation log. A default value 'application/json' for the 'Content-Type' header has been added to prevent the deprecation messages and to be prepared for the requirement in next major version of Elasticsearch. See: elastic/elasticsearch#22691 Related: * c22ec89 * logstash-plugins/logstash-input-elasticsearch#55 * logstash-plugins/logstash-input-elasticsearch#56 * elastic/elasticsearch#22691 Closes #400

…ntent (#22691) This change adds a strict mode for xcontent parsing on the rest layer. The strict mode will be off by default for 5.x and in a separate commit will be enabled by default for 6.0. The strict mode, which can be enabled by setting `http.content_type.required: true` in 5.x, will require that all incoming rest requests have a valid and supported content type header before the request is dispatched. In the non-strict mode, the Content-Type header will be inspected and if it is not present or not valid, we will continue with auto detection of content like we have done previously. The content type header is parsed to the matching XContentType value with the only exception being for plain text requests. This value is then passed on with the content bytes so that we can reduce the number of places where we need to auto-detect the content type. As part of this, many transport requests and builders were updated to provide methods that accepted the XContentType along with the bytes and the methods that would rely on auto-detection have been deprecated. In the non-strict mode, deprecation warnings are issued whenever a request with body doesn't provide the Content-Type header. See #19388

jaymode · 2017-02-07T15:56:48Z

5.x commit b9b9400

Resolves #37 Relates to elastic/elasticsearch#22691

This commit adds methods to the BulkProcessor that accept bytes and a XContentType to avoid content type detection. The methods that do not accept XContentType with bytes have been deprecated by this commit. Relates elastic#22691

This commit adds methods to the BulkProcessor that accept bytes and a XContentType to avoid content type detection. The methods that do not accept XContentType with bytes have been deprecated by this commit. Relates #22691

This commit fixes communication with 5.3.0 nodes to send XContentType to these nodes since #22691 was backported to the 5.3 branch.

The backport of elastic#22691 caused plain text bodies with a scroll id to fail with an IllegalStateException as the wrong method was being called. This commit adds tests to ensure plain text bodies work and fixes the search scroll action so that it properly handles a request with a plain text body.

The backport of #22691 caused plain text bodies with a scroll id to fail with an IllegalStateException as the wrong method was being called. This commit adds tests to ensure plain text bodies work and fixes the search scroll action so that it properly handles a request with a plain text body.

paladox · 2017-03-29T21:24:55Z

Hi, after upgrading from 5.2.* to 5.3.0 i found that the config http.content_type.required is on by default in 5.x. Doing http.content_type.required: false in the elastic search config file fixed it for me.

pickypg · 2017-03-29T21:34:14Z

@paladox

FYI, the setting defaults to false if it has not been set until 6.0.

https://github.com/elastic/elasticsearch/blob/v5.3.0/core/src/main/java/org/elasticsearch/http/HttpTransportSettings.java#L72

I also downloaded an instance of 5.3.0 to ensure something didn't sneak in and ran:

$ curl localhost:9200/_search -d '{"query":{"match_all":{}}}'

It worked without the content-type specified and no settings adjusted.

paladox · 2017-03-29T21:38:54Z

Oh.

paladox · 2017-03-29T21:41:43Z

We managed to fix it with https://phabricator.wikimedia.org/P5158#27475

pickypg · 2017-03-29T21:59:10Z

@paladox

Ah, so that's a different issue. Your client was technically sending the wrong content-type for the content. Regardless, I'm glad you fixed it by sending the content-type; that puts you all ahead of the curve!

Starting with Elasticsearch 6.0.0 products are required to set Content-Type to application/json. Elasticsearch 5.x already accepts this[1]. Add acceptance tests when we load test data and ensure content type is set to the correct value. Also add ipdb to requirements.txt as a convenience for troubleshooting tests. [1] elastic/elasticsearch#22691

…' header In future, Elasticsearch will require specifying the format of data which is being sent (eg. when indexing a document). Current versions of Elasticsearch will start to print a warning to the deprecation log. A default value 'application/json' for the 'Content-Type' header has been added to prevent the deprecation messages and to be prepared for the requirement in next major version of Elasticsearch. See: elastic/elasticsearch#22691 Related: * c22ec891b0be0e3e8f3c10fd7844adfbd2d608ed * logstash-plugins/logstash-input-elasticsearch#55 * logstash-plugins/logstash-input-elasticsearch#56 * elastic/elasticsearch#22691 Closes #400

jaymode added :Core/Infra/REST API REST infrastructure and utilities >enhancement review v5.3.0 v6.0.0-alpha1 labels Jan 19, 2017

jaymode requested review from javanna and s1monw January 19, 2017 13:52

javanna requested changes Jan 19, 2017

View reviewed changes

jaymode added 3 commits January 20, 2017 07:49

start addressing review feedback

c53e296

review updates for index templates

9448729

use Objects.requireNonNull for pipeline requests

7a267ff

clintongormley added the >breaking label Jan 20, 2017

clintongormley mentioned this pull request Jan 20, 2017

copy-as-curl should include content-type elastic/docs#143

Closed

jaymode added 2 commits January 20, 2017 15:11

deprecate loadFromSource without xcontenttype

3e78549

more deprecations :)

63ab38a

dakrone added a commit to dakrone/es-mode that referenced this pull request Feb 7, 2017

Send Content-Type=application/json in HTTP headers

cb69465

Resolves #37 Relates to elastic/elasticsearch#22691

jaymode mentioned this pull request Feb 9, 2017

Add BulkProcessor methods with XContentType parameter #23078

Merged

jaymode added a commit that referenced this pull request Feb 13, 2017

Fix communication with 5.3.0 nodes

d8d03f4

This commit fixes communication with 5.3.0 nodes to send XContentType to these nodes since #22691 was backported to the 5.3 branch.

jaymode mentioned this pull request Feb 14, 2017

Enforce Content-Type requirement on the rest layer and remove deprecated methods #23146

Merged

jaymode mentioned this pull request Feb 15, 2017

Fix search scroll request with a plain text body #23183

Merged

acchen97 mentioned this pull request Feb 15, 2017

Documenting content-type header integration compatibility elastic/logstash#6718

Closed

spalger mentioned this pull request Jul 28, 2017

Scroll api 5.0, 5.1, 5.2 throwing error against elastic search 5.3+ elastic/elasticsearch-js#573

Closed

cbuescher mentioned this pull request Jan 5, 2018

Ensure PutMappingRequest.buildFromSimplifiedDef input are pairs #19837

Merged

sfalquier mentioned this pull request Jan 8, 2018

[Doc] Update "Java API Administration >> Indices Administration" page #28131

Closed

jaymode mentioned this pull request Jan 9, 2018

406 response when POSTing to _bulk with a charset specified #28123

Closed

Optionally require a valid content type for all rest requests with content #22691

Optionally require a valid content type for all rest requests with content #22691

Conversation

jaymode commented Jan 19, 2017 • edited by javanna Loading

javanna left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

clintongormley commented Jan 20, 2017 • edited Loading

jaymode commented Jan 20, 2017

karmi commented Jan 20, 2017

jaymode commented Jan 20, 2017

jasontedor commented Jan 20, 2017

honzakral commented Jan 20, 2017

karmi commented Jan 21, 2017

alexbrasetvik commented Jan 23, 2017

javanna commented Jan 23, 2017

jaymode commented Jan 23, 2017

javanna commented Jan 23, 2017 • edited Loading

honzakral commented Jan 23, 2017

javanna commented Jan 23, 2017

Mpdreamz commented Jan 23, 2017

jaymode commented Feb 7, 2017

paladox commented Mar 29, 2017

pickypg commented Mar 29, 2017

paladox commented Mar 29, 2017

paladox commented Mar 29, 2017

pickypg commented Mar 29, 2017

jaymode commented Jan 19, 2017 •

edited by javanna

Loading

clintongormley commented Jan 20, 2017 •

edited

Loading

javanna commented Jan 23, 2017 •

edited

Loading