Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return timing data in response header(s) #438

Merged
merged 11 commits into from
Jun 23, 2023
Merged

Conversation

onpaws
Copy link
Contributor

@onpaws onpaws commented Jun 15, 2023

Seeks to implement #357

  • Minimal implementation that seeks to measure "the interval between when Seafowl finished reading the HTTP request body, and when it started writing the HTTP response body".
  • Return that number inside X-Seafowl-Runtime (currently hardcoded)
  • Do we want the user to be able to customize (e.g. full control of header name and it's contents?) UPDATE 20230619: not at this time
  • Decide how that should be configurable e.g. via env var/seafowl.toml, or at the INFO/DEBUG level, or some other way
  • Add test for uncached flow
  • Add test for cached flow
  • Assuming we want to generalize this, consider which, if any existing standards [1][2] might make sense for Seafowl UPDATE 20230619: not intending to generalize in this PR, but in future maybe

[1] https://www.w3.org/TR/server-timing/
[2] https://arrow.apache.org/datafusion/user-guide/sql/explain.html

@@ -53,6 +53,7 @@ const BEARER_PREFIX: &str = "Bearer ";
// We have a very lax CORS on this, so we don't mind browsers
// caching it for as long as possible.
const CORS_MAXAGE: u32 = 86400;
const RUNTIME_HEADER: &str = "X-Seafowl-Runtime";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime sounds weird to me in this context, maybe something like:

Suggested change
const RUNTIME_HEADER: &str = "X-Seafowl-Runtime";
const QUERY_TIME_HEADER: &str = "X-Seafowl-Query-Time";

Also, there's the matter of CORS-safelisted headers that @milesrichardson raised in the original issue. I'm fine with having this info be in a separate header (i.e. not in Content-Type), but we'd probably need to specify Access-Control-Expose-Headers as well then.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unfortunate thing about needing to specify Access-Control-Expose-Headers is that there can only be one header Access-Control-Expose-Headers, with a value containing a comma-separated list of exposed headers. That means that if we have two areas of the codebase that each add their own header, then they need to somehow be aware of the current value when merging theirs into the list - which makes for messier code. Also, if someone is running a proxy in front of Seafowl, then they need to perform similar logic if they want to add their own headers to the list.

@gruuya
Copy link
Contributor

gruuya commented Jun 19, 2023

Fwiw, I don't think this should be configurable (not for now at least); there are more pertinent knobs that still aren't customizable (e.g. see #50), and I'm wary of bloating up the config with less important items.

I think just extending the existing tests should be sufficient to merge this.

@onpaws
Copy link
Contributor Author

onpaws commented Jun 19, 2023

Fwiw, I don't think this should be configurable (not for now at least); there are more pertinent knobs that still aren't customizable (e.g. see #50), and I'm wary of bloating up the config with less important items.

I think just extending the existing tests should be sufficient to merge this.

Thanks Marko, yes probably too soon to add new config options. In the future depending how we want to allow user customization e.g. once we return >1 timing measurement, it will presumably be time to consider.

I will switch to X-Seafowl-Query-Time, unless anyone objects, thanks!
In terms of answering "when should Seafowl return this header", it seems to me we have two options:

  1. Always
  2. Return only in certain conditions, e.g. if the user-agent passes e.g. Seafowl-Trace: true in the request

IIUC I think @mildbyte prefers option 1, and @milesrichardson prefers option 2. (at least, I had that impression based on MM chats.)

What do you think @gruuya?

- Adopt `X-Seafowl-Runtime` header
- TODO: make it configurable
- TODO: confirm that this is measuring what we want it to measure
- Decided to stick with one metric for now, so rename accordingly
- Improve safety by wrapping with Option
- Apparently old Rust versions could panic when
  time "went backwards" in the OS, hopefully
  using types will help safeguard against that.
- Optionally return a raw .elapsed(), or use
  formatted_elapsed() which is suitable for headers
  (Probably wasteful to return `ms` in every response)
- Add a Default attribute
- Accept linter changes
@milesrichardson
Copy link

milesrichardson commented Jun 20, 2023

I would prefer option 1 (always) too. But just note that it won't always be available to client code in a cross-origin browsing context unless the header is included in the CSV value of Access-Control-Expose-Headers.

But I'm hesitant to always return Access-Control-Expose-Headers. Perhaps that could be opt-in via a request flag. Or perhaps it's not a problem to always include it anyway, since any problem it would cause for someone running a proxy in front of Seafowl, would only be relevant when they wanted the header anyway - at which point they could add it to their own code where they specify exposed headers (that is, the relatively harmless failure case would be their own headers overriding a Seafowl header that's meant to be opt-in anyway).

I had originally said that the timing data shouldn't go into the Content-Type header alongside the schema. And it certainly feels like it would be overloading the semantic purpose of that header. But, after thinking more on it, I'm not convinced it's a problem to include timing data in addition to the schema. At first I thought it would be wrong because you don't want to cache it, but actually it makes sense to cache it, because it's timing how long it took for the origin Seafowl instance to execute the original query and return the result that was ultimately saved to the cache. So it's totally sensible for the timing data to be included in the cached response that's served by Cloudflare, or even from your browser, since neither of those are re-executing the query, and the time Seafowl originally spent executing the query is still a relevant metric.

So another question would be: what response headers does Cloudflare save in its cached objects? We know that it saves Content-Type, which is maybe a point in favor of including timing data there too. But I don't know what it does with other response headers (I assume it defaults to saving them?)

@gruuya @mildbyte What do you think about including the timing data in the Content-Type header along with the schema?

@onpaws
Copy link
Contributor Author

onpaws commented Jun 20, 2023

@gruuya

I think just extending the existing tests should be sufficient to merge this.

Is this what you had in mind? Wasn't sure if you meant literally extending some already existing test, or adding to the existing test suite.

If there's some recommended way to hit both cached_.../uncached_ paths from one single test, that seems nice to avoid what would likely otherwise be code duplication.
I noticed certain macros e.g. #[case::uncached_post("POST", "/q")] that suggest maybe we could use uncached_get and cached_get to test both paths from a single test? Not sure if I'm understanding that correctly though.
image

@gruuya
Copy link
Contributor

gruuya commented Jun 20, 2023

One thing I just learned is that apparently even default CORS safelisted headers have some prerequisites:

For Content-Type: needs to have a MIME type of its parsed value (ignoring parameters) of either application/x-www-form-urlencoded, multipart/form-data, or text/plain.

In our case we have application/json so it seems we won't gain any benefit from shoehorning the timing data into content-type anyway.

@milesrichardson given this, and the fact that @onpaws discovered some handy DataFusion-provided internal metrics (which we may want to supplement along with the request-to-response time as is done currently), I'd vote for going with a separate dedicated header (assuming that it is cached by CF).

@gruuya
Copy link
Contributor

gruuya commented Jun 20, 2023

@onpaws you could also simply extend the last test in that module; it hits both GET and POSTS endpoints. In addition to presence, I'd test that the value of the header is as expected (i.e. something that can be parsed as a float, between 0 and 1).

@mildbyte
Copy link
Contributor

Cloudflare seems to cache a lot of response headers, if not all of them. This screenshot is from one of our pages with the cache disabled in Firefox and with a Cloudflare cache hit, showing X-Cache and X-Varnish headers (that are added by our own Varnish) being returned by Cloudflare in its cached response:

image

@milesrichardson
Copy link

@gruuya

One thing I just learned is that apparently even default CORS safelisted headers have some prerequisites:

Read that a bit more carefully: those restrictions are for the request content-type header and do not apply to the response. (I made the same mistake the first two times I read it in the original issue 😸)

@gruuya
Copy link
Contributor

gruuya commented Jun 20, 2023

Oh damn, good catch :D

- Also add a test helper to testutils.rs
@onpaws
Copy link
Contributor Author

onpaws commented Jun 21, 2023

@onpaws you could also simply extend the last test in that module; it hits both GET and POSTS endpoints. In addition to presence, I'd test that the value of the header is as expected (i.e. something that can be parsed as a float, between 0 and 1).

Ended up doing this approach, cheers.
Now we append it to the existing test hopefully in the way you desired. While I was at it I renamed it to test_http_type_conversion_and_timing_header which is perhaps a bit long but hopefully makes things clear for future us/contributors.

Agreed checking the header is "as expected" is worthwhile and added a second test for that.
I wasn't initially sure the lines of responsibility/best ways to handle weird edge cases when parsing floats from strings. Along the way learned Rust core now uses Eisel-Lemire. Infinity and NaN scenarios were first a concern but IIUC it seems like .is_finite() handles both, which I believe should cover us. If I missed any cases please lmk.

Verified the tests work by temporarily commenting out the header injection code, and re-running, and got the expected result.

before/after

the "before" test run, expected to fail b/c I commented out the header code

running 2 tests
test frontend::http::tests::test_http_type_conversion_and_timing_header::case_2_uncached_post ... FAILED
test frontend::http::tests::test_http_type_conversion_and_timing_header::case_1_cached_get ... FAILED

failures:

---- frontend::http::tests::test_http_type_conversion_and_timing_header::case_2_uncached_post stdout ----
thread 'frontend::http::tests::test_http_type_conversion_and_timing_header::case_2_uncached_post' panicked at 'assertion failed: resp.headers().contains_key(QUERY_TIME_HEADER)', src/frontend/http.rs:1596:9

---- frontend::http::tests::test_http_type_conversion_and_timing_header::case_1_cached_get stdout ----
thread 'frontend::http::tests::test_http_type_conversion_and_timing_header::case_1_cached_get' panicked at 'assertion failed: resp.headers().contains_key(QUERY_TIME_HEADER)', src/frontend/http.rs:1596:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    frontend::http::tests::test_http_type_conversion_and_timing_header::case_1_cached_get
    frontend::http::tests::test_http_type_conversion_and_timing_header::case_2_uncached_post

test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 188 filtered out; finished in 0.04s

the "after" run

running 2 tests
test frontend::http::tests::test_http_type_conversion_and_timing_header::case_2_uncached_post ... ok
test frontend::http::tests::test_http_type_conversion_and_timing_header::case_1_cached_get ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 188 filtered out; finished in 0.04s

     Running unittests src/main.rs (target/debug/deps/seafowl-49a42c42648f96f5)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/main.rs (target/debug/deps/main-084a7a60e84333ec)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 56 filtered out; finished in 0.00s

Copy link

@Gelio Gelio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments from a fellow Rust enthusiast, feel free to ignore them, since they're mostly about the API of the newly added Timer

src/frontend/profile_utils.rs Outdated Show resolved Hide resolved
src/frontend/profile_utils.rs Outdated Show resolved Hide resolved
- Previously we wrapped with Option, per the warnings
  at https://doc.rust-lang.org/src/std/time.rs.html#86
- After code review perhaps the fact we still call
  .unwrap() obviates the benefit anyway, thus dropped.
@onpaws
Copy link
Contributor Author

onpaws commented Jun 22, 2023

CI passed and it seems we've probably reviewed this PR sufficiently. Any objections to merging?

@milesrichardson
Copy link

I haven't reviewed the code (and don't intend to), but once you merge it, can you update the docs in the marketing repo to reference the header?

@onpaws
Copy link
Contributor Author

onpaws commented Jun 22, 2023

I haven't reviewed the code (and don't intend to), but once you merge it, can you update the docs in the marketing repo to reference the header?

Sure! How do you feel about e.g. this
image

with some basic X-Seafowl-Query-Time example similar to the other ones on this page?

@onpaws
Copy link
Contributor Author

onpaws commented Jun 23, 2023

Thanks Marko! Merging now

@onpaws onpaws merged commit 72ab83f into main Jun 23, 2023
@onpaws onpaws deleted the experimental-timing-data branch June 23, 2023 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants