Fetch charts with GET to benefit from browser cache and conditional requests #7032
Conversation
Are you sure a GET request can handle this? Why not use Redis to cache query results? At Airbnb the round trip to fetch data from Redis is only 600ms~800ms. I think we should use an ETag for dashboard metadata (like the dashboard layout, a huge JSON blob...).
Thanks, I'm aware of that problem. Here the decorator is using Redis for the server-side caching, but it's caching the HTTP response instead of the dataframe (saving the time that's spent in serialization). But I'm also using the native browser cache (through the
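The server-side idea described here — cache the serialized HTTP response rather than the dataframe — can be sketched with a minimal, framework-free decorator. This is an illustrative sketch, not Superset's actual `etag_cache` implementation; the in-memory dict stands in for Redis:

```python
import hashlib
from functools import wraps

cache = {}  # stands in for Redis: maps a cache key to (body, etag)

def etag_cache(key_fn):
    """Cache the serialized HTTP response and tag it with an ETag."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            key = key_fn(*args, **kwargs)
            if key in cache:
                return cache[key]  # hit: no query or serialization work
            body = f(*args, **kwargs)  # expensive: query + serialization
            etag = hashlib.md5(body.encode()).hexdigest()
            cache[key] = (body, etag)
            return body, etag
        return wrapper
    return decorator

@etag_cache(key_fn=lambda chart_id: f"chart-{chart_id}")
def render_chart(chart_id):
    # pretend this hits the database and serializes the result
    return '{"data": [1, 2, 3]}'
```

Because the cached value is the final response body plus its hash, a cache hit skips serialization entirely, and the stored ETag is what later enables conditional requests.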
Everywhere! :)
Ah, you're right. The filter box works when changed, since it does a
Yes, but it has only the
@graceguo-supercat I tested the interaction with filter boxes and it's not working. I'll work on fixing it.
@betodealmeida this seems pretty complicated on top of data requests that are already complicated. My first question, to gauge whether it's worthwhile, is: what are the speedups you are seeing for the existing approach vs your new approach? (ideally for multiple dashboards of varying size)
@williaster the speedup will greatly depend on how often the data changes, how big the payload is (bignum vs deck.gl varies significantly), the duration of the cache, and how slow the network is. I can (and have) run tests against the example dashboards, but I don't think they would be representative, since they don't cover all the real-life use cases. I think the question we should ask here is: "given that this is clearly an improvement, how can we make it bug free?"
@betodealmeida just to clarify step 4 (and per your diagram) you have:
yet in the code it still seems like we're caching the result set from the database, and thus I wonder if the diagram (and thus the After phase) should mention that there's an additional entry in the server cache, i.e., after step (4) the server-side cache would contain:
Note I'm not saying this is wrong, as I strongly believe that the database response should be cached given that it represents the bulk of the compute; I just wanted to get clarity on the logic.
@betodealmeida this is a big change that has the potential to impact many, many users. If you're unable to provide numbers that indicate that this is strictly an improvement (or at a minimum no regressions), I'm a bit reluctant to introduce this additional complexity. I think it's part of the expected work of a feature like this to demonstrate the effects with real-life examples. It seems like you should be able to use real Lyft dashboards, dashboards from the "example datasets", etc., and you can throttle network speed using dev tools. Another concern I have is the impact on the # of requests. We've needed to introduce domain sharding because of the large number of simultaneous requests made by larger dashboards, and this potentially DOUBLES that number, so I would want to see perf numbers that demonstrate no regressions for that case as well.
Not directly related, but on the topic of optimization around caching, it would be an extra win to make the caching call
@mistercrunch
@williaster I'm thinking about something else: on the server side, in python, making the |
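A sketch of what a non-blocking cache write could look like on the Python side: the response is returned immediately while the (slow) cache write happens in a background thread. `slow_set` and the in-memory dict are hypothetical stand-ins, not Superset code:

```python
import threading
import time

cache = {}

def slow_set(key, value):
    """Stand-in for a Redis write with network latency."""
    time.sleep(0.05)  # simulate the round trip
    cache[key] = value

def respond(key, body):
    # fire-and-forget: the cache write happens off the request path,
    # so the user isn't kept waiting on the cache backend
    threading.Thread(target=slow_set, args=(key, body), daemon=True).start()
    return body  # returned immediately, before the write finishes
```

The trade-off is that a crash between the response and the write loses the cache entry, which is usually acceptable for a cache.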
Sure, I will give the numbers for the example dashboards and some of the Lyft dashboards. My point was that, considering that this is a strict improvement, defining a threshold to accept the changes seems arbitrary to me. Of course I agree that if this causes a regression or no significant improvement we should not do it. Maybe it's not clear, but with this PR the client will always issue a smaller number of requests, and a percentage of those requests will receive body-less responses. Combined with the fact that we move the server cache closer to the user, there shouldn't be any regressions with this PR, unless I'm doing something stupid (which I've done in the past). Also, keep in mind that this PR is against the
Sorry, why do you think this would double the number of requests? The number of requests should be strictly equal or smaller: resources within the lifetime of the cache are no longer requested because the browser reads directly from its cache, and conditional requests are still a single request. The server will either return a normal response ( |
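The request accounting described here can be illustrated with a small sketch (hypothetical names, not `SupersetClient` code): within the `Expires` window a chart load costs zero network requests, and after expiry it costs exactly one, whether the server answers `304` or `200`:

```python
def fetch_chart(entry, now, server_etag):
    """Return (body, network_requests) for one chart load.

    `entry` is the browser's cached response for the chart, or None:
    a dict with "body", "etag", and "expires" (a timestamp).
    """
    if entry and now < entry["expires"]:
        return entry["body"], 0        # fresh: served from browser cache
    if entry and entry["etag"] == server_etag:
        return entry["body"], 1        # one conditional request -> 304, body reused
    return "fresh body", 1             # one request -> 200 with full payload

entry = {"body": "cached body", "etag": "abc", "expires": 100}
```

In no branch does a single chart load cost more than one request, which is the "strictly equal or smaller" claim.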
@john-bodley not sure if I understand your question. Currently we cache the dataframe in
Eventually we can remove the dataframe caching, but I'd rather do that in a separate PR. |
@graceguo-supercat I changed the code so that in the
This way the |
@betodealmeida my point was that I felt the picture and description don't accurately reflect what your code is actually doing, i.e., that there are actually two server-side cache keys associated with each chart. They're both using the same underlying cache, but the diagram and steps don't accurately reflect this. I agree with your logic but thought maybe the steps should be more explicit, e.g., After
@john-bodley you're right. I ended up describing in "after" the workflow once we remove the dataframe cache. |
thanks for the benchmarks! will let @john-bodley sign off since he had the last requested change. |
I was working on this tricky bug and ended up here as the cause for it. The issue is around the fact that the

The bug or symptom is that the "Genders by State" example started showing an extra (3rd) metric (

There's a lot going on related to this, but a key point here is that we have control-related logic like

Now another thing that compounds here is that

Now it's pretty clear that addressing any of these 3 things would fix my symptom, but point 2 on its own is worrisome. It can lead to intricate issues over time. Say I save a chart, and after that I add a new control to that vizType, in the context of the

The good news is I'm working on a refactor #7350 to help clean up all of the control / formData processing logic, which grew out of control over time. The assumption there is that all requests would make it through this logic, and I'm realizing that's not the case, at least since this PR. Ideas?
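One way to address the worrisome point — fields saved under an older control set leaking back into requests, like the extra metric described above — would be to prune a saved chart's form data down to the controls its current vizType actually declares. A hypothetical sketch (the registry and function names are illustrative, not Superset's):

```python
# Illustrative registry mapping a vizType to the controls it declares.
CONTROLS_BY_VIZ = {
    "dist_bar": {"metrics", "groupby", "row_limit"},
}

def sanitize_form_data(form_data):
    """Drop form_data keys the chart's current vizType doesn't declare."""
    allowed = CONTROLS_BY_VIZ.get(form_data.get("viz_type"), set())
    allowed = allowed | {"viz_type"}  # always keep the type itself
    return {k: v for k, v in form_data.items() if k in allowed}

# A saved chart carrying a stale "metric" field from an older control set:
saved = {"viz_type": "dist_bar", "metrics": ["sum__num"], "metric": "old"}
```

Running every request through a step like this (whatever its real shape) is exactly the kind of invariant the refactor would restore.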
@wraps(f)
def wrapper(*args, **kwargs):
    # check if the user can access the resource
    check_perms(*args, **kwargs)
I was digging around to try to figure out where the datasource access permission check is done nowadays, and found it here in the `etag_cache` decorator. I feel like it's not the right place for it.
I understand that this needs to happen prior to reading from the cache, but maybe it should be done in a prior decorator, or maybe both of these routines should be done inside a method instead of decorators, to avoid calling `get_viz` twice.
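The "prior decorator" suggestion could look roughly like this, with the permission check applied outside the cache decorator so it is guaranteed to run before any cached response is served (all names are illustrative stand-ins):

```python
from functools import wraps

calls = []  # records execution order for illustration

def check_perms_first(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        calls.append("perms")   # stand-in for the real permission check
        return f(*args, **kwargs)
    return wrapper

def etag_cache(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        calls.append("cache")   # stand-in for the cache lookup
        return f(*args, **kwargs)
    return wrapper

@check_perms_first   # outermost: runs before the cache is consulted
@etag_cache
def view():
    calls.append("view")
    return "payload"
```

Decorators apply bottom-up but execute top-down, so putting the permission check outermost keeps the cache decorator single-purpose while preserving the "check before read" requirement.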
form_data, slc = get_form_data(slice_id, use_slice_data=True)
datasource_type = slc.datasource.type
datasource_id = slc.datasource.id
viz_obj = get_viz(
I think `get_viz` gets called at least two times now (here and in the view itself).
Found some other issues here that I wanted to raise ^^^

Also I noticed that the big "merge" on

In the future,
@mistercrunch agreed, and after that
This is a small PR that does a lot. It changes the initial request for charts (in Explore or dashboards) to be done through a `GET` request, greatly improving the loading speed of dashboards. It also moves the caching to the HTTP layer, allowing us to benefit from `Expires` and `ETag` headers for conditional requests.

The problem

This diagram compares the current flow ("before") with the one implemented by this PR ("after"):

Before

Let's assume Superset is configured with a 1 hour cache, and also that the data changes on a longer period (daily, eg): … `POST` request with the payload. There are a few inefficiencies here: … `POST` requests.

After
1. The client makes a `GET` request with the chart id.
2. The server responds with an `Expires` header of 1 hour, and an `ETag` header which is a hash of the payload.
3. `SupersetClient` caches it also in the Cache interface.
4. Because of the `Expires` header and the use of `GET`, the data is read directly from the native browser cache.
5. Eventually, `Expires` is now in the past.
6. `SupersetClient` looks for a cached response in the Cache interface and, if one is found, extracts its `ETag`.
7. A conditional request is made with an `If-None-Match` header, containing the hash of the cached response (its `ETag`).
8. If the payload's `ETag` matches the `If-None-Match` header, the server returns a `304 Not Modified` response.
response.Notes
The `GET` request is done only the first time the chart is mounted. Forcing a refresh on dashboards and clicking "Run Query" in the Explore view perform `POST` requests, which bypass the cache and cache the new response. I tested the Explore view and dashboards with filters, and all further interactions are done with `POST`s.

Since we're caching the HTTP response, we need to verify that the user has permission to read the cached response. This is done by passing a `check_perms` function to the decorator that caches the responses.

The fetch API has no support for conditional requests with ETags, so we need to add explicit support in `SupersetClient`. I have a separate PR for that (see feat: add support for conditional requests apache-superset/superset-ui#119).

There is one small downside to this approach. While `Expires` is still valid, the browser will not perform any requests for cached charts unless the user explicitly refreshes a dashboard or clicks "Run Query" in the Explore view. If the data is bad, they will see bad data until it expires or they purposefully refresh the chart. In the current workflow, in theory, we can purge the cache in this case, since it lives only on the server side. This is a hypothetical scenario, and we could work around it by sending a notification to dashboards that one or more charts have bad data and should be refreshed.