-
Notifications
You must be signed in to change notification settings - Fork 272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor cache parameters #225
Conversation
…nternal's The spawn_blocking was introduced in 3ad1827 to make sure the NaiveIntrospection instantiation does not block the local thread.
It needs to be an Arc because it is passed to a tokio::spawn at some point which requires the future to own its values.
We can't spawn a task for each and every query but we can start one task for all of them so the ApolloRouter creation is not slowed down by the cache pre-population.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the fact that we are removing WithCaching. This is a nice simplification.
apollo-router/src/apollo_router.rs
Outdated
.expect("NaiveIntrospection instantiation panicked") | ||
}; | ||
|
||
// Start warming up the cache in a background task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is smart, but we might lose the benefit of warming the cache up, since it could immediately get some queries that could have been precalculated. If we block on warming up, starting the router is slightly slower, but when reloading configuration there's no delay, we create the router before restarting the server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 2f34fcb
apollo-router/src/apollo_router.rs
Outdated
for (query, operation, options) in | ||
previous_router.query_planner.get_hot_keys().await | ||
{ | ||
// We can ignore errors, since we are just warming up the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(very small nitpick) we have to ignore errors, because some of the queries that were previously in the cache might not work with the new schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I will fix this, this was copy-pasted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 4b16837
this makes the code much simpler, nice :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like where this is going, +1 for me.
A couple of nits / questions here and there.
We might wanna make sure we still get tracing and spans (I'm never going to be at ease until we have tests that cover that, especially when we mark functions sync / async and poke around async_trait).
apollo-router/src/apollo_router.rs
Outdated
// this one is ready. | ||
// | ||
// We might lose the benefit of warming the cache up, since it could immediately get some | ||
// queries that could have been precalculated. If we block on warming up, starting the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand :/
Do we here mean we are trading of update responsiveness for overall throughput?
"Populate the new router's cache before it starts handling requests. We trade off update responsiveness in favor of request processing latency."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to this discussion: #225 (comment)
What I understand: there is no point of doing this job in a background task because the server is not stopped while the new router is being created. But @Geal thinks we might lose the benefit of warming up the cache. I didn't think about this too much to be honest. Feel free to update the comment to whatever is more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if this is still a requirement before merging. Otherwise I will merge now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand the message, but this by no means should block merging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to explain a bit more: if we first warm up the cache, then switch to the new config, the next queries will benefit from the warmed up cache. While if we switch and warm up in background, the next queries might be blocked until the cache is primed, so there'll be a perf hit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I understand better too now.
Fixed in 37302eb
apollo-router/src/lib.rs
Outdated
@@ -561,7 +561,7 @@ impl FederatedServer { | |||
let state_machine = StateMachine::new( | |||
server_factory, | |||
Some(state_listener), | |||
ApolloRouterFactory::new(self.plan_cache_limit, self.query_cache_limit), | |||
ApolloRouterFactory::new(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
apollo-router/src/router_factory.rs
Outdated
plan_cache_limit, | ||
query_cache_limit, | ||
} | ||
pub fn new() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks a lot like Default, which is derived above. Do we have something in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are right. I added new() to cover the parameters. Now they have gone, I think we could remove new() and just use default().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in f218f49
|
where do you see this? |
I hear you and I feel the same way. Hopefully we will solve this soon. Jaeger seems to be working alright. BUT I had a harder time to start the server with Jaeger. In the past jaeger was enabled by default but this has been removed somehow. Is this intentional? |
Created #232 to address this |
I think the demo default binds to starstuff, and there is no jaeger receiver, this might be the reason. #232 does the right thing imo, good job! |
https://app.circleci.com/pipelines/github/apollographql/router/594/workflows/61747ae2-f942-4d91-b581-a59d5d1583cc/jobs/3120 was the failing job, I guess I need to dig further. |
Following our discussion, this PR removes the cache parameters from cli arguments and use environment variable instead. In the future when we know what and how we would like to expose these, we can make a more permanent decision. In the meantime, this will simplify the code as we don't have to carry around more things than schema, configuration and router.
Changes
Remove CLI parameters for cache control
Remove trait WithCaching
Use named cached query planner instead of trait:
ApolloRouter
should have good defaults by defaultMove NaiveIntrospection's spawn_blocking to ApolloRouter's internal
Move cache pre-population to ApolloRouter's internal
Use async_trait for RouterFactory (simplifies mocking in test)
Cache pre-population ran in a background task to not block the ApolloRouter's creation
We can't spawn a task for each and every query but we can start one task for all of them so the ApolloRouter creation is not slowed down by the cache pre-population.
Related to #189