Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[http] add llhttp parser implementation #15814

Closed
wants to merge 34 commits into from
Closed

Conversation

asraa
Copy link
Contributor

@asraa asraa commented Apr 1, 2021

Signed-off-by: Asra Ali asraa@google.com

Commit Message: Add llhttp parser implementation.
Risk Level: High, but off by default.
Testing: H/1 codec impl and unit tests that use actual codecs have two targets now, a legacy and new version. Integration tests are parametrized by enabling the runtime flag via the config setting and is enabled in compile_time_options. Codec fuzz test enables the new parser.
Release Notes: Added
Runtime guard: envoy.reloadable_features.enable_new_http1_parser. Off by default.
Fixes: #5155
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25777 (see note below)

CHANGES

  • We use llhttp to detect and reject invalid header characters. Thus, http1.invalid_characters response codes appear as http1.codec_error. and the status message reads HPE_INVALID_HEADER_TOKEN.
  • It also appears that llhttp handles invalid methods errors after message begin, whereas http-parser handles it at message begin. I only noticed this when it triggered a codec fuzzer failure when a second H/1 stream started. http-parser catches the invalid method failure and exits early, whereas llhttp hits the latent bug: there were no stream resets when responses were completed before requests are finished. In the normal case, we expect higher level code to reset the stream.

NOTES FOR CLEANUP

  • Our invalid character check in the codec can be removed when the legacy code is removed.
  • The secondary impl layer can be removed when the legacy code is removed. Actually, the parser interface should totally be removed since there's a slight performance dip from using it.

Co-authored-by: Derek Argueta <darguetap@gmail.com>
Signed-off-by: Asra Ali <asraa@google.com>
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Apr 1, 2021
@repokitteh-read-only
Copy link

CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).

🐱

Caused by: #15814 was opened by asraa.

see: more, trace.

Signed-off-by: Asra Ali <asraa@google.com>
@moderation
Copy link
Contributor

/lgtm deps

@repokitteh-read-only repokitteh-read-only bot removed the deps Approval required for changes to Envoy's external dependencies label Apr 2, 2021
Signed-off-by: Asra Ali <asraa@google.com>
@asraa asraa changed the title WIP add llhttp parser implementation [http] add llhttp parser implementation Apr 2, 2021
@asraa
Copy link
Contributor Author

asraa commented Apr 5, 2021

Very confused about the ASan error. It is reproducible via bazel test, but not by gdb or running the binary directly.

When I bazel test, it appears the function doesn't enter llhttp_execute at all but somehow returns from it causing undefined behavior using error after.

error = llhttp_execute(&parser_, slice, len);
}
size_t nread = len;
// Adjust number of bytes read in case of error.
if (error != HPE_OK) {

Signed-off-by: Asra Ali <asraa@google.com>
@repokitteh-read-only repokitteh-read-only bot added the deps Approval required for changes to Envoy's external dependencies label Apr 5, 2021
Copy link
Contributor

@antoniovicente antoniovicente left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is really exciting to see progress on Envoy moving to a http1 parser that is under active development.

source/common/http/http1/parser_impl.cc Outdated Show resolved Hide resolved
test/integration/base_integration_test.cc Show resolved Hide resolved
@@ -766,6 +771,7 @@ StatusOr<ParserStatus> ConnectionImpl::onHeadersComplete() {
return codecProtocolError("http/1.1 protocol error: unsupported transfer encoding");
}
}
parser_->hasContentLength(request_or_response_headers.ContentLength() != nullptr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a request has both transfer-encoding: chunked and content-length?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/nodejs/llhttp/blob/aa07189d39b8d0da9d5a887d93761014a45804bf/test/request/content-length.md#error-on-simultaneous-content-length-and-transfer-encoding-identity

It's handled with an error. Actually this is just because llhttp doesn't have an initial value for unset content length. http_parser uses all 1 bits set to indicate there is no content length. Added comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have tests in envoy to cover these cases. This is related to my question regarding use of lenient parsing further down in this file. It seems that those lenient options were added for the benefit of Envoy's H1 implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TEST_F(Http1ServerConnectionImplTest, TestSmugglingDisallowChunkedContentLength0) {

There are tests for both content length and chunked.

source/common/http/http1/parser_impl.cc Outdated Show resolved Hide resolved
source/common/http/http1/parser_impl.cc Outdated Show resolved Hide resolved
void resume() { llhttp_resume(&parser_); }

ParserStatus pause() {
// llhttp can only pause by returning a paused status in user callbacks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do something to reconcile this behavior with http_parser? How do we ensure that nothing depends on this function having side effects for http_parser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The http_parser implementation of pause() uses the http parser method to pause (https://github.com/nodejs/http-parser/blob/ec8b5ee63f0e51191ea43bb0c6eac7bfbff3141d/http_parser.h#L438) and an OK. That's the same as returning pause from llhttp parser. I think it's already reconciled

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is an area to cleanup once llhttp is the only implementation.

In the current world I would consider something like:
ParserStatus maybePause(ParserStatus status);

The http_parser version would detect status == Paused and call its pause method and return Success, while the llhttp version would just return the status that was passed in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current world I would consider something like:

To me that seems strange because Paused is the only status that maybePause would ever call.

Technically returning HPE_PAUSED from http-parser will also cause the parser to pause, but I don't trust making the change because there's very little http-parser docs and I don't know if it changes how it handles errors. (All http_parser_pause() does is cause HPE_PAUSED to be returning from execute. we could just return it directly).

source/common/http/http1/parser_impl.h Show resolved Hide resolved
source/common/http/http1/parser_impl.h Show resolved Hide resolved
test/integration/base_integration_test.cc Show resolved Hide resolved
bazel/repository_locations.bzl Outdated Show resolved Hide resolved
Signed-off-by: Asra Ali <asraa@google.com>
@indutny
Copy link

indutny commented Apr 6, 2021

I would appreciate if you could upgrade llhttp to 5.1.0 or 4.1.0, which was just released: nodejs/llhttp#95

asraa added 2 commits April 10, 2021 17:32
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Contributor Author

asraa commented Apr 11, 2021

Updated, sorry for the delay! (This is an evening thing right now)

  • For unit test dupes: only mixed_conn_pool test used an actual parser (I added an ASSERT(false) in the default parser and ran all the http/... conn_pool/... to grab failures).
  • A latent codec fuzz bug needed fixing, I noted why it came up in the PR description.
  • llhttp version's bumped.
  • Let me know what you think about the lenient chunked length setting. I switched over to llhttp's strict header validation since Envoy no longer provides the option to disable that.

Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Contributor Author

asraa commented Apr 12, 2021

Still working on the asan issue (comment above) and a tsan one. TSan is from running existing parser, reproducible, so something must have went wrong there... all the issues are about allowing chunked... and run into data race at destruction

public:
Impl(llhttp_type_t type) {
llhttp_init(&parser_, type, &settings_);
llhttp_set_lenient_chunked_length(&parser_, 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What code in Envoy is generating the error response when both transfer-encoding: chunked and content-length is set?

@@ -0,0 +1,62 @@
actions {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What fixes are needed to make this corpus entry run successfully? I assume it's the changes to the fuzz harness?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does run successfully now, with the changes in the diff that close that reset stream when the response ends before the request does.

test/common/http/codec_impl_fuzz_test.cc Outdated Show resolved Hide resolved
Signed-off-by: Asra Ali <asraa@google.com>
@antoniovicente
Copy link
Contributor

I tried to do some debugging but the conclusion I got to was along the lines that the stall happens even if I disable the cache filter. This may be a real issue processing multiple requests on the same connection with the new parser. Is there a chance that some state is not being reset in the parser between requests?

Debugging the state of the llhttp parser isn't easy between the typescript to c transcoding and it being an external dep for the build. Would it be possible to add methods to dump info about the state of the parser or debug accessors which can be used to implement said debug methods in ParserImpl?

@moderation
Copy link
Contributor

asraa added 2 commits June 17, 2021 16:18
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@antoniovicente
Copy link
Contributor

Please see precheck format presubmit error.

/wait

021-06-17T20:30:38.8262526Z ERROR: ./docs/root/version_history/current.rst:82: Version history not in alphabetical order (crash support vs admission control): please check placement of line
2021-06-17T20:30:38.8264348Z  * admission control: added :ref:`admission control <envoy_v3_api_field_extensions.filters.http.admission_control.v3alpha.AdmissionControl.rps_threshold>` option that when average RPS of the sampling window is below this threshold, the filter will not throttle requests. Added :ref:`admission control <envoy_v3_api_field_extensions.filters.http.admission_control.v3alpha.AdmissionControl.max_rejection_probability>` option to set an upper limit on the probability of rejection.. 

Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@antoniovicente
Copy link
Contributor

cache filter integration test still failing

/wait

@moderation
Copy link
Contributor

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Jul 28, 2021
@yanavlasov yanavlasov added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels Jul 29, 2021
@antoniovicente antoniovicente removed their assignment Aug 9, 2021
@moderation
Copy link
Contributor

@yanavlasov
Copy link
Contributor

Obsolete. QUICHE H/1 code (Balsa) is currently being integrated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deps Approval required for changes to Envoy's external dependencies no stalebot Disables stalebot from closing an issue waiting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch from http-parser to llhttp
8 participants