Druid refresh metadata performance improvements #3527

Merged: 5 commits into apache:master from Mogball:mogball/feature/druid_tables, Sep 26, 2017


@Mogball (Contributor) commented Sep 25, 2017

  • Requests to Druid for datasource metadata are now issued in parallel
  • Minor SQLAlchemy (SQLA) tweaks in a few places
  • Added an option to scan Druid only for new tables, instead of refreshing everything
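The two behavioural changes above (parallel metadata requests and an "only new datasources" scan) can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the PR's code: `fetch_datasource_metadata` and `refresh_metadata` are hypothetical names, the Druid request is simulated with a sleep, and the sketch uses a thread pool, whereas the merged change used multiprocessing.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_datasource_metadata(name):
    """Hypothetical stand-in for one HTTP request to Druid for a datasource's metadata."""
    time.sleep(0.1)  # simulate network latency of a single request
    return {"datasource": name, "columns": []}


def refresh_metadata(cluster_datasources, known_datasources, only_new=False, max_workers=8):
    """Fetch metadata for Druid datasources concurrently (illustrative sketch).

    When only_new is True, datasources already registered locally are skipped,
    so only newly discovered datasources are requested.
    """
    names = [n for n in cluster_datasources if not (only_new and n in known_datasources)]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order, so results line up with `names`.
        results = list(executor.map(fetch_datasource_metadata, names))
    return dict(zip(names, results))


if __name__ == "__main__":
    start = time.perf_counter()
    meta = refresh_metadata(["wikipedia", "events", "clicks"], {"wikipedia"}, only_new=True)
    print(sorted(meta))  # ['clicks', 'events']
    # The two requests overlap, so elapsed time is close to one request's latency.
    print(f"{time.perf_counter() - start:.2f}s")
```

Filtering before fanning out keeps the "scan new only" path cheap: already-known datasources never generate a request at all.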

[Screenshot: scan_new_druid]

@coveralls commented Sep 25, 2017

Coverage increased (+0.007%) to 69.559% when pulling 80f0a7a on Mogball:mogball/feature/druid_tables into 3949d39 on apache:master.

@mistercrunch merged commit cf0b670 into apache:master on Sep 26, 2017
Review thread on this part of the diff:

datasource.merge_flag = merge_flag
session.flush()

# Prepare multithreaded execution
Contributor comment:

Nit: these are processes, not threads. Question: do you think this could be offloaded to Celery instead of using multiprocessing?

Member reply:

Not a bad idea, though for now Celery isn't a hard requirement, which helps people onboard without having to set up a message queue (MQ) and workers.
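The processes-versus-threads nit can be checked directly: a thread pool runs every task inside the parent process, while `multiprocessing.Pool` runs tasks in child processes. The sketch below is illustrative only (`_report_pid` and `worker_pids` are hypothetical helpers, standard library only); it records the PID each task runs under, and with `multiprocessing.pool.ThreadPool` the only PID returned is the parent's.

```python
import os
from multiprocessing.pool import ThreadPool


def _report_pid(_task_index):
    """Return the PID of whichever worker executes this task."""
    return os.getpid()


def worker_pids(pool_factory, tasks=8, workers=4):
    """Run `tasks` trivial jobs on a pool and collect the distinct worker PIDs."""
    with pool_factory(workers) as pool:
        return set(pool.map(_report_pid, range(tasks)))


if __name__ == "__main__":
    # Threads share the parent's process, so only the parent's PID comes back.
    print(worker_pids(ThreadPool) == {os.getpid()})  # True
```

Because `multiprocessing.Pool` exposes the same `map` interface, passing it as `pool_factory` would instead return child PIDs that differ from the parent's (on platforms that spawn rather than fork, keep that call under the `__main__` guard). For I/O-bound work such as HTTP metadata requests, a thread pool is often sufficient; this is a general observation, not a description of what the PR shipped.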

timifasubaa pushed a commit to timifasubaa/incubator-superset that referenced this pull request Oct 3, 2017
* parallelized refresh druid metadata

* fixed code style errors

* fixed code for python3

* added option to only scan for new druid datasources

* Increased code coverage
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
@mistercrunch added the labels 🏷️ bot and 🚢 0.20.1 on Feb 27, 2024 (the bot label is used by `supersetbot` to track which PRs were auto-tagged with release labels)
4 participants