Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix caching related issues #4316

Merged
merged 1 commit into from
Feb 7, 2018
Merged

Conversation

mistercrunch
Copy link
Member

@mistercrunch mistercrunch commented Jan 31, 2018

Moving to cache the output of the dataframe as opposed to the output of get_data. Caching the output of get_data doesn't work if the basis for the cache_key is the "query object". When changing controls like the moving average configuration, this doesn't affect the query object (same cache key), but it does affect the post processing that append in get_data. If we use the query object as the basis for the cache key, we need to cache the dataframe itself, and re-execute get_data against the cache.

This may not be 100% ready for merging as:

  • I'm unclear on whether the serialization of the dataframe (using df.to_dict) is 100% clean, there could be bugs where some things get lost through the serde process
  • FilterBox doesn't get cached anymore as it doesn't follow the common pattern of using `get_df

@john-bodley

Also the introduction of a rendered chartState seemed to have broken the cache tag and the row label in the UI.

@mistercrunch
Copy link
Member Author

It seems like we may want to re-think/cleanup the caching logic a bit.

@mistercrunch
Copy link
Member Author

For the record, I'm doing more work on this. Moving to pickling dataframes as other methods of serializations seem incomplete/risky.

@graceguo-supercat
Copy link

@mistercrunch Hi Max, do you mind if i create a another PR just to fix JS part? So this PR focus on cache issue?

@mistercrunch mistercrunch changed the title Fix caching issues [WiP] fix caching issues Feb 2, 2018
@mistercrunch mistercrunch force-pushed the cache_df branch 2 times, most recently from 78815f9 to 8b5757c Compare February 2, 2018 22:21
@mistercrunch
Copy link
Member Author

@john-bodley I think I've got it mostly nailed down now at this point. This turned out to be a much more complex headache then I thought originally. Especially in the area of caching properly multi-queries visualizations.

@@ -93,8 +93,8 @@
"react-sortable-hoc": "^0.6.7",
"react-split-pane": "^0.1.66",
"react-syntax-highlighter": "^5.7.0",
"react-virtualized": "^9.3.0",
"react-virtualized-select": "^2.4.0",
"react-virtualized": "9.3.0",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated but that fixed the build

@john-bodley
Copy link
Member

john-bodley commented Feb 5, 2018

@mistercrunch overall this LGTM. We'll probably have to test it extensively in our staging environment. Note there's a few flake8, pylint issues.

@mistercrunch mistercrunch force-pushed the cache_df branch 3 times, most recently from 75dddcd to af11c76 Compare February 6, 2018 18:05
@mistercrunch mistercrunch changed the title [WiP] fix caching issues Fix caching related issues Feb 6, 2018
self._any_cache_key = None
self._any_cached_dttm = None

self.run_extra_queries()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mistercrunch @john-bodley I'm finding that when a dashboard gets loaded with a filter box viz or a line chart with a time comparison, queries get executed before the dashboard template even gets rendered (resulting in a blank page that is slow to load because it's executing queries). I think this change is causing the issue.

When getting the dashboard data, we get the data for each slice which gets the viz.data and calls run_extra_queries on init and executes the queries. So the dashboard template doesn't get rendered until all of the queries run.

https://github.com/apache/incubator-superset/blob/master/superset/models/core.py#L173

I'm not quite sure how to resolve this because it sounds like this was specifically added here because it needs to run in the init. Any suggestions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow right. This should not be here. Let me take a shot at it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should do it: #4627 , mind trying it on you side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the quick fix! I'll test

michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.23.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.23.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants