Related: comment of @balancap: #6416 (comment) and issue #7380 where you get a warning about a not understood type (probably in another table).
Situation now: when a PandasSQLAlchemy object is created (https://github.com/pydata/pandas/blob/v0.14.0/pandas/io/sql.py#L777), a full MetaData object is created and reflected (this means that all tables in the database are inspected and their schemas are stored as sqlalchemy Table objects). This is done each time read_sql/read_sql_query/read_sql_table is called.
Consequence: this can trigger warnings that have nothing to do with your current query or the table you want to read, e.g. when one of the types in another table is not known to sqlalchemy (such as a postgis geometry column).
Possible solution:
I think the read_sql functions should never inspect all tables, only the specified table (and read_sql_query should not even inspect that table: with a query this information is not used, it is only needed for read_sql_table).
This can maybe be achieved by using the only keyword in meta.reflect(engine, only=...).
For the OO API interface, we can discuss what the default should be (inspect all tables or not).
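To make the `only` idea concrete, here is a minimal sketch of restricted reflection using an in-memory SQLite database. The helper name `reflect_only` and the table names `wanted` and `unrelated` are made up for illustration and are not part of pandas. Note that sqlalchemy may still pull in additional tables if the requested table has foreign keys pointing at them.

```python
from sqlalchemy import MetaData, create_engine, text


def reflect_only(engine, table_names):
    """Reflect just the named tables instead of the whole database.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    meta = MetaData()
    # `only` limits reflection to the listed table names.
    meta.reflect(engine, only=list(table_names))
    return meta


# In-memory SQLite database with two tables (names are assumptions).
engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE wanted (id INTEGER PRIMARY KEY, value TEXT)"))
    conn.execute(text("CREATE TABLE unrelated (id INTEGER PRIMARY KEY, payload TEXT)"))

meta = reflect_only(engine, ["wanted"])
reflected_names = sorted(meta.tables)  # the other table is never inspected
```

Because `unrelated` is never inspected, a column type in it that sqlalchemy does not understand would not produce a warning in this setup.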
Sounds good. As for the OO API, if we take a cue from HDFStore it should not inspect all tables. HDFStore.keys() can run slow for > 10 keys, so it is not run on instantiation.
In the case of HDFStore, the keys can be inspected once and cached, because only one user at a time can open the file for writing. Pandas doesn't cache them, but it's possible. (I'll mention that @nkeim has implemented some code for this; maybe others have too.) But for SQL tables, with multi-user access as a full feature and a common use case, any relevant inspection should be done at query-execution time, not at connection time. Maybe the only keyword can make this fast.
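As a rough sketch of what inspecting at query-execution time could look like, the class below does no reflection when it is constructed and reflects only the requested table on each read. The class name `LazyReflectingReader` and its `read_table` method are hypothetical, not an existing pandas or sqlalchemy API, and the example uses an in-memory SQLite database.

```python
from sqlalchemy import MetaData, Table, create_engine, text


class LazyReflectingReader:
    """Hypothetical reader that defers schema inspection to query time."""

    def __init__(self, engine):
        # No reflection here: construction stays cheap.
        self.engine = engine

    def read_table(self, table_name):
        # Reflect only this table, with a fresh MetaData on every call,
        # so schema changes made by other users are picked up.
        meta = MetaData()
        table = Table(table_name, meta, autoload_with=self.engine)
        with self.engine.connect() as conn:
            result = conn.execute(table.select())
            columns = list(result.keys())
            rows = [tuple(row) for row in result.fetchall()]
        return columns, rows


engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE events (id INTEGER PRIMARY KEY, value TEXT)"))
    conn.execute(text("INSERT INTO events (id, value) VALUES (1, 'a')"))

reader = LazyReflectingReader(engine)

# The schema changes after the reader was created.
with engine.begin() as conn:
    conn.execute(text("ALTER TABLE events ADD COLUMN note TEXT"))

columns, rows = reader.read_table("events")
```

The trade-off is one reflection round trip per read, in exchange for never holding a stale or database-wide schema snapshot.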
@mangecoeur @danielballan @hayd