Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlaTable query hangs on list tables API call with large number of tables #24343

Closed
3 tasks done
giftig opened this issue Jun 9, 2023 · 0 comments
Closed
3 tasks done
Labels
#bug Bug report

Comments

@giftig
Copy link
Contributor

giftig commented Jun 9, 2023

An inefficient SqlaTable query causes the table list endpoint to hang when you have a large number of tables in the schema.

How to reproduce the bug

  1. Populate a few thousand datasources for tables in a schema you want to list
  2. Go to SQL Lab and try to get the table list for your selected schema
  3. Observe the call times out as the API does not respond in a timely manner

Expected results

Efficient return of table list from this endpoint

Actual results

API call hangs

Environment

(please complete the following information):

  • browser type and version: N/A
  • superset version: 2.1.0

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

I've gotten the bottom of this issue and will raise a PR. Essentially the call to SqlaTable to retrieve associated table info and populate the extra field is extremely inefficient with a large number of queries, and that's because of the eager loading of the columns and metrics relationships: the direct query being called is relatively fine even with thousands of rows, but when I inspected with echo on, sqlalchemy was performing an additional ~41 queries, including several outer joins with these additional tables and several queries like WHERE table_id IN (...thousands of IDs).

This can be avoided by using the load_only and lazyload options on this query.

@giftig giftig added the #bug Bug report label Jun 9, 2023
giftig added a commit to giftig/superset that referenced this issue Jun 12, 2023
Use a combination of lazyload and load_only to ensure sqlalchemy doesn't
eagerly join to additional tables, which was happening in a very
inefficient way for a large number of tables and resulting in this
endpoint never returning (in addition to high db load)
@giftig giftig closed this as completed Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug Bug report
Projects
None yet
Development

No branches or pull requests

1 participant