Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sqllab): Inefficient SqlaTable query (#24343) #24344

Merged
merged 1 commit into from
Jun 12, 2023

Conversation

giftig
Copy link
Contributor

@giftig giftig commented Jun 9, 2023

Use a combination of lazyload and load_only to ensure sqlalchemy doesn't eagerly join to additional tables, which was happening in a very inefficient way for a large number of tables and resulting in this endpoint never returning (in addition to high db load)

SUMMARY

Makes the lookup for extra for tables listed by schema much more efficient, especially when there is a large number of tables present.

TESTING INSTRUCTIONS

  1. Populate a few thousand datasources for tables in a schema you want to list
  2. Go to SQL Lab and try to get the table list for your selected schema
  3. Observe the call times out as the API responds in a timely manner

More info on the associated issue: #24343

ADDITIONAL INFORMATION

@codecov
Copy link

codecov bot commented Jun 9, 2023

Codecov Report

Merging #24344 (7220a47) into master (0e3f1f6) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 7220a47 differs from pull request most recent head bb5d308. Consider uploading reports for the commit bb5d308 to get more accurate results

@@           Coverage Diff           @@
##           master   #24344   +/-   ##
=======================================
  Coverage   69.09%   69.09%           
=======================================
  Files        1903     1903           
  Lines       74608    74610    +2     
  Branches     8107     8107           
=======================================
+ Hits        51550    51552    +2     
  Misses      20947    20947           
  Partials     2111     2111           
Flag Coverage Δ
hive 53.79% <100.00%> (+<0.01%) ⬆️
mysql 79.39% <100.00%> (+<0.01%) ⬆️
postgres 79.47% <100.00%> (+<0.01%) ⬆️
presto 53.71% <100.00%> (+<0.01%) ⬆️
python 83.38% <100.00%> (+<0.01%) ⬆️
sqlite 77.99% <100.00%> (+<0.01%) ⬆️
unit 54.20% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/databases/commands/tables.py 100.00% <100.00%> (ø)
superset/views/core.py 76.26% <100.00%> (+0.01%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@john-bodley
Copy link
Member

john-bodley commented Jun 11, 2023

Thanks @giftig for the change. It seems this endpoint is deprecated (and slated for deletion in #24342) and thus I was wondering whether you could make said change to the replacement API endpoint?

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, big thanks for this perf improvement!

@villebro
Copy link
Member

villebro commented Jun 12, 2023

Thanks @giftig for the change. It seems this endpoint is deprecated (and slated for deletion in #24342) and thus I was wondering whether you could make said change to the replacement API endpoint?

@john-bodley as this is fixing a serious perf regression affecting master branch right now, I suggest we defer the API endpoint move to a follow-up PR.

Edit: Never mind, I read this too hastily; @giftig , can you apply the same fix to this code to ensure this doesn't break again when #24342 is merged?

extra_dict_by_name = {
table.name: table.extra_dict
for table in (
db.session.query(SqlaTable).filter(
SqlaTable.database_id == self._model.id,
SqlaTable.schema == self._schema_name,
)
).all()
}

Use a combination of lazyload and load_only to ensure sqlalchemy doesn't
eagerly join to additional tables, which was happening in a very
inefficient way for a large number of tables and resulting in this
endpoint never returning (in addition to high db load)
@villebro villebro merged commit 6d9df43 into apache:master Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants