Druid refresh metadata performance improvements #3527
Mogball
commented
Sep 25, 2017
- Requests to Druid for datasource metadata are now issued in parallel
- Some SQLA tweaks here and there
- Added an option to scan Druid only for new tables, instead of refreshing everything
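The first and third points can be sketched roughly as follows. This is only an illustration, not the code in this PR: `fetch_metadata`, `refresh_datasources`, and the `new_only` and `workers` parameters are hypothetical names, the Druid request is replaced by a stub, and a thread pool from the standard library is used instead of the process pool discussed in the review.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_metadata(name):
    """Hypothetical stand-in for one metadata request to Druid."""
    return {"datasource": name, "columns": ["__time", "count"]}


def refresh_datasources(remote_names, known_names, new_only=False, workers=4):
    """Fetch metadata for datasources in parallel.

    With new_only=True, names already present in known_names are skipped,
    so only newly discovered datasources are requested.
    """
    if new_only:
        targets = [n for n in remote_names if n not in known_names]
    else:
        targets = list(remote_names)
    if not targets:
        return {}
    # Issue the requests concurrently; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fetch_metadata, targets)
        return {meta["datasource"]: meta for meta in results}
```

Called as `refresh_datasources(["a", "b", "c"], {"a"}, new_only=True)`, the sketch requests only `b` and `c`; without `new_only` it refreshes all three.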
datasource.merge_flag = merge_flag
session.flush()

# Prepare multithreaded execution
Nit: these are processes, not threads. Q: Do you think this could be offloaded to Celery instead of multiprocessing?
Not a bad idea, though for now setting up Celery isn't a hard requirement, which helps people onboard without having to set up an MQ and workers...
* parallelized refresh druid metadata
* fixed code style errors
* fixed code for python3
* added option to only scan for new druid datasources
* Increased code coverage