Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Callbacks must not be invoked from within "resume_http_request()" #43

Closed
SvetlinZarev opened this issue Oct 30, 2020 · 5 comments · Fixed by envoyproxy/envoy#13840
Closed

Comments

@SvetlinZarev
Copy link

SvetlinZarev commented Oct 30, 2020

Basically all response callbacks are invoked from within resume_http_request() which results in multiple mutable borrows error:

[Envoy (Epoch 0)] [2020-10-30 11:44:15.052][30][critical][wasm] [external/envoy/source/extensions/common/wasm/context.cc:1098] wasm log connectivity_plugin csp_root_id: panicked at 'already borrowed: BorrowMutError', /Users/svetlin/git/oss/proxy-wasm-rust-sdk/src/dispatcher.rs:335:67

This happens when I try to resume the request from a Context::on_http_call_response() callback. This callback mutably borrows the self.http_streams member. Then when I resume the request via resume_http_request(), any subsequent callbacks such as on_http_call_response, on_done, and others, result in a second mutable borrows which ends up in a panic!(), thus crashing envoy.

PS: Failed on istio 1.5.10

@gbrail
Copy link
Contributor

gbrail commented Oct 30, 2020

I can confirm this and I have faced it myself. There are two possible ways to fix this.

One way would be to change proxy_wasm_cpp_host so that it does not deliver any WASM callbacks in a nested way.

The second way would be to change the way that this SDK handles the various "resume" calls so that it somehow marks the the callback needs resuming, exits the wrapper that handles the static lookup of the context hash table, and then calls resume outside that block so that it will not trigger a callback.

PiotrSikora added a commit to PiotrSikora/envoy that referenced this issue Oct 30, 2020
Fixes proxy-wasm/proxy-wasm-rust-sdk#43.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
@PiotrSikora
Copy link
Member

Thanks for the report! Reentrant calls are actually a bug in Envoy implementation, not the SDK.

Could you please verify that this fixes the issue for you? envoyproxy/envoy#13840

Do you have a minimal reproducer?

Note that examples/http_auth_random.rs uses resume_http_request(), and it seems to work fine with and without that fix.

@SvetlinZarev
Copy link
Author

SvetlinZarev commented Nov 3, 2020

I've tried to create a reproducible example but failed to so. I do not know if this is helpful, but here is the part of my plugin that causes the issue. This essentially my on_http_call_response()

    
    // Called directly from Context::on_http_call_response
    fn on_connection_configuration(&mut self, body_size: usize) {
        log::info!("> on_connection_configuration");

        let sc = match self.get_status_code() {
            None => {
                self.send_internal_server_error();
                return;
            }

            Some(sc) => sc,
        };

        if sc == HTTP_STATUS_CODE_NOT_FOUND {
            log::info!("Did not find connection configuration - nothing to configure");

            log::info!("Resuming request...");
            self.resume_http_request(); // >------------------> Results in a crash !!!
            log::info!("Request resumed...");
            return;
        }

        if sc != HTTP_STATUS_CODE_OK {
            log::error!("Unexpected response from the configuration service: {}", sc);
            self.send_internal_server_error();
            return;
        }

        self.set_http_request_header(HEADER_TO_DELETE, None);

        let configuration = self.extract_connection_configuration(body_size);
        log::info!("Connection configuration: {:#?}", configuration);

        match configuration {
            None => {
                log::error!("Failed to parse configuration data!! Aborting request");
                self.send_internal_server_error();
            }

            Some(configuration) => {
                log::info!("Configuring request...");

                let auth_headers = self.extract_auth_headers(&configuration);
                if let Some(auth_headers) = auth_headers {
                    for (k, v) in auth_headers {
                        log::info!("Setting request header: {}: {}", k, v);
                        self.set_http_request_header(k, Some(v));
                    }
                }

                let some_header = self.extract_location_id(&configuration);
                self.set_http_request_header(HEADER_SOME_HEADER, some_header);

                log::info!("Resuming request...");
                self.resume_http_request(); // >------------------> Works fine !!!
                log::info!("Request resumed...");
            }
        }
    }

There are two cases when I resume the request - the first one results in a crash due to invoking the response callbacks from withing the method. The other one works fine.

I've patched the sdk to avoid the borrowing issues and here is the log from the NON-working case:

nsions/common/wasm/context.cc:1109] wasm log csp_root_id: Borrowing streams: on_http_call_response
2020-11-03T15:58:53.232849Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: > on_http_call_response
2020-11-03T15:58:53.232904Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: > on_connection_configuration
2020-11-03T15:58:53.232913Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Received status code '404'
2020-11-03T15:58:53.232918Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Did not find connection configuration - skipping configuration
2020-11-03T15:58:53.232922Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Resuming request...
2020-11-03T15:58:53.232963Z	error	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1109] wasm log connectivity_plugin csp_root_id: Patched to avoid borrow: on_http_response_headers
2020-11-03T15:58:53.232998Z	error	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1109] wasm log connectivity_plugin csp_root_id: Patched to avoid borrow: on_http_response_body
2020-11-03T15:58:53.233015Z	error	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1109] wasm log connectivity_plugin csp_root_id: Patched to avoid borrow: on_done
2020-11-03T15:58:53.233037Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Request resumed..

And here is the log from the working case:

2020-11-03T16:01:11.298445Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: > on_http_call_response
2020-11-03T16:01:11.298552Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: > on_connection_configuration
2020-11-03T16:01:11.298561Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Received status code '200'
2020-11-03T16:01:11.299066Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Connection configuration: Some(....)
2020-11-03T16:01:11.299093Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Configuring request...
2020-11-03T16:01:11.299106Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Setting request header: ...
2020-11-03T16:01:11.299128Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Resuming request...
2020-11-03T16:01:11.299353Z	info	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1103] wasm log connectivity_plugin csp_root_id: Request resumed...
2020-11-03T16:01:11.299366Z	error	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1109] wasm log connectivity_plugin csp_root_id: Releasing streams: on_http_call_response
2020-11-03T16:01:11.461774Z	error	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1109] wasm log connectivity_plugin csp_root_id: Patched to avoid borrow: on_http_response_headers
2020-11-03T16:01:11.463100Z	error	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1109] wasm log connectivity_plugin csp_root_id: Patched to avoid borrow: on_http_response_body
2020-11-03T16:01:11.463144Z	error	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1109] wasm log connectivity_plugin csp_root_id: Patched to avoid borrow: on_http_response_body
2020-11-03T16:01:11.463158Z	error	envoy wasm	[external/envoy/source/extensions/common/wasm/context.cc:1109] wasm log connectivity_plugin csp_root_id: Patched to avoid borrow: on_done

Take a look at the Resuming request and the Request resumed logs. I cannot understand why the resume_http_request() behaves differently.

@gbrail
Copy link
Contributor

gbrail commented Nov 3, 2020

I think that this can be reproduced using http_auth_random if we make a few changes.

First of all, http_auth_random never fails authentication if there is a body (if you send it a POST) because it always returns "Continue" from the http_body callback, and Envoy calls the body callback even though the header callback returned "Pause." You can observe this by send http_auth_random a POST instead of a GET.

The best way to fix this, IMO, is to first change the SDK so that it is possible to return "StopAllIterationAndWatermark," which pauses all filter execution until "continueEncoding" is called, and then return that from the example. I have this change in a PR in this repo that would make this possible.

If we make that change, and then send an HTTP POST -- then it'll fail with the BorrowMutError half the time. (The other half it'll return an error.)

Sorry that all is complicated -- I can send a PR if it helps!

@PiotrSikora
Copy link
Member

@gbrail Thanks, that was really helpful! I somehow missed the fact that Envoy will start processing HTTP body even after receiving Pause from HTTP headers.

As mentioned before, the primary issue is the reentrant call from Envoy (see: envoyproxy/envoy#13840), but the behavior you've mentioned is also extremely problematic, so I'm going to force StopAllIterationAndWatermark when returning Pause from HTTP headers (see: proxy-wasm/proxy-wasm-cpp-host#95).

PiotrSikora added a commit to PiotrSikora/envoy that referenced this issue Nov 20, 2020
istio-testing pushed a commit to istio/envoy that referenced this issue Nov 20, 2020
* build: update rules_rust to allow Rustc in RBE (envoyproxy#13595)

Signed-off-by: Lizan Zhou <lizan@tetrate.io>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* fix macos v8 build (envoyproxy#13572)

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

* wasm: update proxy-wasm-cpp-host (envoyproxy#13606)

The PR updates proxy-wasm-cpp-host dependency for enhancing the capability and fixing a bug in WASM extensions.

The change consists of three PRs in proxy-wasm-cpp-host:

1. proxy-wasm/proxy-wasm-cpp-host#68 @PiotrSikora
2. proxy-wasm/proxy-wasm-cpp-host#65 @mathetake (me)
3. proxy-wasm/proxy-wasm-cpp-host#64 @mathetake (me)

The code change can be found at proxy-wasm/proxy-wasm-cpp-host@49ed20e...c5658d3 .

1 & 2 enhance WASM capability, and 3 fixes a bug in situations where users share vm_id for multiple filters. For details, please take a look at these original PRs.

Signed-off-by: mathetake <takeshi@tetrate.io>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* wasm: re-enable tests with precompiled modules. (envoyproxy#13583)

Fixes envoyproxy#12335.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* wasm: flip the meaning of the "repository" in envoy_wasm_cc_binary(). (envoyproxy#13621)

Change the meaning of the "repository" parameter to refer to an external
Bazel repository, instead of using "@envoy" in targets that are included
in the Envoy repository.

This aligns with other envoy_* rules.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* build: support ppc64le with wasm (envoyproxy#13657)

The build has only been tested with gn git sha 5da62d5 as
recommended by ppc64 maintainers of the v8 runtime.

Signed-off-by: Christopher M. Luciano <cmluciano@us.ibm.com>

* wasm: remove no longer needed Emscripten metadata. (envoyproxy#13667)

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* fix wasm compilation (envoyproxy#13765)

Signed-off-by: Asra Ali <asraa@google.com>

* wasm: strip only Custom Sections with precompiled Wasm modules. (envoyproxy#13775)

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* build: don't build shared libraries for zlib and zlib-ng. (envoyproxy#13652)

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* wasm: update V8 to v8.7.220.10. (envoyproxy#13568)

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* build: exclude wee8/out from inputs (envoyproxy#13866)

If you build without sandboxing for performance, the output files from
this custom build genrule contained timestamps which caused it to
rebuild every single build.

Signed-off-by: Keith Smiley <keithbsmiley@gmail.com>

* tls: fix detection of the upstream connection close event. (envoyproxy#13858)

Fixes envoyproxy#13856.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* wasm: Force stop iteration after local response is sent (envoyproxy#13930)

Resolves envoyproxy#13857

ref:
-proxy-wasm/proxy-wasm-rust-sdk#44
-proxy-wasm/proxy-wasm-cpp-host#88
-proxy-wasm/proxy-wasm-cpp-host#93

Signed-off-by: mathetake <takeshi@tetrate.io>
Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* wasm: fix order of callbacks for paused requests. (envoyproxy#13840)

Fixes proxy-wasm/proxy-wasm-rust-sdk#43.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

* wasm: fix network leak (envoyproxy#13836)

Signed-off-by: Kuat Yessenov <kuat@google.com>

Co-authored-by: Lizan Zhou <lizan@tetrate.io>
Co-authored-by: Rama Chavali <rama.rao@salesforce.com>
Co-authored-by: Takeshi Yoneda <yoneda.takeshi.md@alumni.tsukuba.ac.jp>
Co-authored-by: cmluciano <cmluciano@us.ibm.com>
Co-authored-by: asraa <asraa@google.com>
Co-authored-by: Keith Smiley <keithbsmiley@gmail.com>
Co-authored-by: Takeshi Yoneda <takeshi@tetrate.io>
Co-authored-by: Kuat <kyessenov@users.noreply.github.com>
rexengineering pushed a commit to rexengineering/istio-envoy that referenced this issue Oct 15, 2021
Fixes proxy-wasm/proxy-wasm-rust-sdk#43.

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants