Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor cache parameters #225

Merged
merged 16 commits into from
Dec 3, 2021
Merged

Refactor cache parameters #225

merged 16 commits into from
Dec 3, 2021

Conversation

cecton
Copy link
Contributor

@cecton cecton commented Dec 1, 2021

Following our discussion, this PR removes the cache parameters from cli arguments and use environment variable instead. In the future when we know what and how we would like to expose these, we can make a more permanent decision. In the meantime, this will simplify the code as we don't have to carry around more things than schema, configuration and router.

Changes

  • Remove CLI parameters for cache control

  • Remove trait WithCaching

  • Use named cached query planner instead of trait: ApolloRouter should have good defaults by default

  • Move NaiveIntrospection's spawn_blocking to ApolloRouter's internal

  • Move cache pre-population to ApolloRouter's internal

  • Use async_trait for RouterFactory (simplifies mocking in test)

  • Cache pre-population ran in a background task to not block the ApolloRouter's creation

    We can't spawn a task for each and every query but we can start one task for all of them so the ApolloRouter creation is not slowed down by the cache pre-population.

Related to #189

…nternal's

The spawn_blocking was introduced in 3ad1827 to make
sure the NaiveIntrospection instantiation does not block the local
thread.
It needs to be an Arc because it is passed to a tokio::spawn at some
point which requires the future to own its values.
We can't spawn a task for each and every query but we can start one task
for all of them so the ApolloRouter creation is not slowed down by the
cache pre-population.
@cecton cecton marked this pull request as ready for review December 1, 2021 11:31
Copy link
Contributor

@garypen garypen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the fact that we are removing WithCaching. This is a nice simplification.

@cecton cecton self-assigned this Dec 1, 2021
.expect("NaiveIntrospection instantiation panicked")
};

// Start warming up the cache in a background task
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is smart, but we might lose the benefit of warming the cache up, since it could immediately get some queries that could have been precalculated. If we block on warming up, starting the router is slightly slower, but when reloading configuration there's no delay, we create the router before restarting the server

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 2f34fcb

for (query, operation, options) in
previous_router.query_planner.get_hot_keys().await
{
// We can ignore errors, since we are just warming up the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(very small nitpick) we have to ignore errors, because some of the queries that were previously in the cache might not work with the new schema

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I will fix this, this was copy-pasted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4b16837

@Geal
Copy link
Contributor

Geal commented Dec 1, 2021

this makes the code much simpler, nice :)

Copy link
Contributor

@o0Ignition0o o0Ignition0o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like where this is going, +1 for me.

A couple of nits / questions here and there.

We might wanna make sure we still get tracing and spans (I'm never going to be at ease until we have tests that cover that, especially when we mark functions sync / async and poke around async_trait).

// this one is ready.
//
// We might lose the benefit of warming the cache up, since it could immediately get some
// queries that could have been precalculated. If we block on warming up, starting the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand :/

Do we here mean we are trading of update responsiveness for overall throughput?

"Populate the new router's cache before it starts handling requests. We trade off update responsiveness in favor of request processing latency."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to this discussion: #225 (comment)

What I understand: there is no point of doing this job in a background task because the server is not stopped while the new router is being created. But @Geal thinks we might lose the benefit of warming up the cache. I didn't think about this too much to be honest. Feel free to update the comment to whatever is more clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this is still a requirement before merging. Otherwise I will merge now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand the message, but this by no means should block merging

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to explain a bit more: if we first warm up the cache, then switch to the new config, the next queries will benefit from the warmed up cache. While if we switch and warm up in background, the next queries might be blocked until the cache is primed, so there'll be a perf hit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I understand better too now.

Fixed in 37302eb

@@ -561,7 +561,7 @@ impl FederatedServer {
let state_machine = StateMachine::new(
server_factory,
Some(state_listener),
ApolloRouterFactory::new(self.plan_cache_limit, self.query_cache_limit),
ApolloRouterFactory::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

plan_cache_limit,
query_cache_limit,
}
pub fn new() -> Self {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a lot like Default, which is derived above. Do we have something in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right. I added new() to cover the parameters. Now they have gone, I think we could remove new() and just use default().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in f218f49

@o0Ignition0o
Copy link
Contributor

thread 'files::tests::basic_watch' panicked at 'assertion failed: futures::poll!(watch.next()).is_pending()', apollo-router/src/files.rs:79:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
``` let's keep a close eye on this, this shouldnt happen.

@cecton
Copy link
Contributor Author

cecton commented Dec 2, 2021

thread 'files::tests::basic_watch' panicked at 'assertion failed: futures::poll!(watch.next()).is_pending()', apollo-router/src/files.rs:79:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
``` let's keep a close eye on this, this shouldnt happen.

where do you see this?

@cecton
Copy link
Contributor Author

cecton commented Dec 2, 2021

We might wanna make sure we still get tracing and spans (I'm never going to be at ease until we have tests that cover that, especially when we mark functions sync / async and poke around async_trait).

I hear you and I feel the same way. Hopefully we will solve this soon.

image

Jaeger seems to be working alright. BUT I had a harder time to start the server with Jaeger. In the past jaeger was enabled by default but this has been removed somehow. Is this intentional?

@cecton
Copy link
Contributor Author

cecton commented Dec 2, 2021

Jaeger seems to be working alright. BUT I had a harder time to start the server with Jaeger. In the past jaeger was enabled by default but this has been removed somehow. Is this intentional?

Created #232 to address this

@o0Ignition0o
Copy link
Contributor

In the past jaeger was enabled by default but this has been removed somehow. Is this intentional?

I think the demo default binds to starstuff, and there is no jaeger receiver, this might be the reason.

#232 does the right thing imo, good job!

@o0Ignition0o
Copy link
Contributor

thread 'files::tests::basic_watch' panicked at 'assertion failed: futures::poll!(watch.next()).is_pending()', apollo-router/src/files.rs:79:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
``` let's keep a close eye on this, this shouldnt happen.

where do you see this?

https://app.circleci.com/pipelines/github/apollographql/router/594/workflows/61747ae2-f942-4d91-b581-a59d5d1583cc/jobs/3120 was the failing job, I guess I need to dig further.

@cecton cecton merged commit 31eb0d8 into main Dec 3, 2021
@cecton cecton deleted the cecton-refactor-cache-parameters branch December 3, 2021 08:37
tinnou pushed a commit to Netflix-Skunkworks/router that referenced this pull request Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants