Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

to_sql if_exists argument with SQL server and other schema #7422

Closed
aergener opened this issue Jun 10, 2014 · 20 comments
Closed

to_sql if_exists argument with SQL server and other schema #7422

aergener opened this issue Jun 10, 2014 · 20 comments
Labels
API Design IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@aergener
Copy link

The if_exists argument of the to_sql function doesn't check all schema for the table while checking if it exists. Furthermore, it inserts to the default schema, causing somewhat contradictory behavior.

For example, while using SQL Server with my default schema set to test, to_sql inserts the table into test.table_name. However, trying this again, with if_exists='replace', to_sql finds no table of the name dbo.table_name, and then tries to create test.table_name, causing an error.

Details :

http://stackoverflow.com/questions/24126883/pandas-dataframe-to-sql-function-if-exists-parameter-not-working

@jorisvandenbossche
Copy link
Member

@aergener How did you connect to SQL server (how do you set the default schema)?

It is a bit strange that when checking for the table it looks in dbo, but when writing it used the sepcified test schema.

@jorisvandenbossche jorisvandenbossche added this to the 0.14.1 milestone Jun 10, 2014
@aergener
Copy link
Author

@jorisvandenbossche I connected to SQL Server using sqlalchemy.create_engine. The default schema is a user setting within SQL Server itself (not specified while connecting in python).

@jorisvandenbossche
Copy link
Member

Can you check engine.has_table('foobar')? (instead of the pd.io.sql.has_table('foobar', engine) which gave a false False).

@aergener
Copy link
Author

When the table is in the test schema:

>>> engine.has_table('foobar')
Out[9]: False

When creating a table in SQL Server if a schema is not specifically designated, the table will be created in the user's default schema. Perhaps this issue could be fixed if pandas specified a schema for sqlalchemy to use that the user could potentially override?

@jorisvandenbossche
Copy link
Member

I cannot reproduce this with postgresql (set the default schema to something else than the default 'public', but both the checking if the table exists or writing the data uses the same schema I set as default). So I am thinking it is possibly a bug in the sqlalchemy interface to SQL server (just guessing, I don't use SQL server). As engine.has_table('foobar') should look in the default schema I think.

@jorisvandenbossche jorisvandenbossche changed the title to_sql if_exists argument to_sql if_exists argument with SQL server and other schema Jun 11, 2014
@jorisvandenbossche
Copy link
Member

@aergener I posted on the sqlalchemy mailing list: https://groups.google.com/forum/?fromgroups#!topic/sqlalchemy/Ktu06z9x97c Can you try what is said there?

So add echo='debug' to create_engine( ), then then look in the logs (those are printed) when you use the to_sql function. There should somewhere be a query to get the default schema name.

@aergener
Copy link
Author

@jorisvandenbossche Cool, I'll test this out then get back to you.

As an aside, I was wondering if you have thought about adding better datatype support to pandas. Specifically, looking at pandas.io.sql.PandasSQLTable._sqlalchemy_type all strings in pandas end up as text fields in SQL. I'm not sure about other flavors, but in SQL Server working with text fields is a pain, so it would be nice to have something like string_repr option in to_sql. Do you have an email I can contact you at? I would like to help with this, as well as get involved with pandas development in general, but I'm not sure how I should go about it, and you seem like the SQL guy around here...

@aergener
Copy link
Author

@jorisvandenbossche It looks like it's not picking up anything for the default_schema_name and defaulting to dbo. Notice how there's no row after 2014-06-12 20:56:15,461 DEBUG sqlalchemy.engine.base.Engine Col ('default_schema_name',)

2014-06-12 20:56:15,459 INFO sqlalchemy.engine.base.Engine SELECT user_name()
2014-06-12 20:56:15,459 INFO sqlalchemy.engine.base.Engine ()
2014-06-12 20:56:15,460 DEBUG sqlalchemy.engine.base.Engine Col ('',)
2014-06-12 20:56:15,460 DEBUG sqlalchemy.engine.base.Engine Row (u'aergener', )
2014-06-12 20:56:15,460 INFO sqlalchemy.engine.base.Engine 
            SELECT default_schema_name FROM
            sys.database_principals
            WHERE name = ?
            AND type = 'S'

2014-06-12 20:56:15,460 INFO sqlalchemy.engine.base.Engine (u'aergener',)
2014-06-12 20:56:15,461 DEBUG sqlalchemy.engine.base.Engine Col ('default_schema_name',)
2014-06-12 20:56:15,467 INFO sqlalchemy.engine.base.Engine SELECT [TABLES_1].[TABLE_NAME] 
FROM [INFORMATION_SCHEMA].[TABLES] AS [TABLES_1] 
WHERE [TABLES_1].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max)) AND [TABLES_1].[TABLE_TYPE] = ? ORDER BY [TABLES_1].[TABLE_NAME]
2014-06-12 20:56:15,467 INFO sqlalchemy.engine.base.Engine (u'dbo', u'BASE TABLE')

@jorisvandenbossche
Copy link
Member

@aergener Hmm I don't know SQL Server, so don't know how it works with users and schema's. Maybe you follow up further at the sqlalchemy mailing thread?

@aldanor
Copy link
Contributor

aldanor commented Jun 13, 2014

Sounds related to #7441 -- @jorisvandenbossche, looks like we'll need to be able to pass schema/meta manually to read_sql / to_sql functions after all? :)

@aergener Try this?

engine = create_engine(..., echo='debug')
meta = MetaData()
meta.reflect(bind=engine, schema='test')

@jorisvandenbossche
Copy link
Member

@aldanor It is certainly related, but not necesarily the same issue, as 1) this just works with postgresql (if you are a user with another default schema, it just reads and writes from/to that schema), and 2) engine.has_table() will still look in the wrong place.

I mean, having a schema argument might be a way to circumvent this issue here, but this case should actually work out of the box? (although I don't know SQL Server, so can't say for sure)

UPDATE: my second point can of course be solved with the engine.has_table( .. , schema=..) if we have a schema argument.

@aldanor
Copy link
Contributor

aldanor commented Jun 13, 2014

@jorisvandenbossche Can't say much about this case, don't know much about SQL Server either. In Oracle, I had to set the schema equal to original table owner when calling MetaData#reflect for meta to be able to reflect anything.

Regarding the original question, found a discussion on schemas in sqlalchemy with SQL Server here: https://groups.google.com/forum/#!topic/sqlalchemy/VZQ_SLprNQA -- again, it boils down to passing schema manually when all else fails.

Looks like there's many ways "default schema" could fail, quoting the link above:

Default Schema Name can be empty if the user logs in via an AD Group.
(in fact a Group user cannot be assigned a default schema name before
SQL Server 2012...).

@jorisvandenbossche
Copy link
Member

@aergener If you want to get more involved, very welcome! I am following up a bit the sql issues, but I am by no mean an sql expert (just using postgresql a bit myself). So the experience of other people with other database flavors, more advanced use cases, other database set-ups with users and privileges, etc is very useful.
Some tips to get started:

  • File an issue for the enhancement ideas you have (eg about the better datatype support), so we can start discussing it. Try to give some example case where this would be useful, a possible interface, ..
  • Try to run the sql test suite with SQL Server (for now it is only tested on travis with postgresql, mysql (with pymysql) and sqlite, SQL Server is not available there). But it would be very useful to extend this testing coverage. For this you should make a class similiar to TestPostgreSQLAlchemy in https://github.com/pydata/pandas/blob/master/pandas/io/tests/test_sql.py#L1112. This could be a good starter.
  • Look through the issue with the SQL label and look if you want to try to tackle one (or non-sql issue also welcome of course!) If you hav any questions about it, just ask in the relevant issue.

See here for more general advice on getting started with pandas development: http://pandas.pydata.org/developers.html and https://github.com/pydata/pandas/wiki/Contributing. If you have any questions, just ask here or at the pydata mailing list (https://groups.google.com/forum/#!forum/pydata). Questions you have are probably also relevant to others.

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.15.0, 0.14.1 Jul 3, 2014
@JoergRittinger
Copy link
Contributor

I have also problems with the missing schema information in to_sql when ussing if_exists='append'.
Within sql.PandasSQLAlchemy the functions get_table and has_table are called with the same parameter name starting from the to_sql function.
In the dialect I use the tables in metadata look like this (which will be used in get_table):

>>> self.meta.tables.keys()
[ ..., 'schema_name.table_name', ...]

On the other hand the has_table function of my engine expects table and schema name as two arguments:

def has_table(self, connection, table_name, schema=None):
    ...

So when I call:

pandas_sql.to_sql(
        frame=dataframe,
        name='table_name',
        if_exists='append')

get_table will return None. If I call it with:

pandas_sql.to_sql(
        frame=dataframe,
        name='schema_name.table_name',
        if_exists='append')

get_table will return the table but has_table will return False.

Am I correct that this is kind of the same problem as you discussed?

@maxgrenderjones
Copy link
Contributor

They're related, and down to the fact that has_table and get_table do different things.

has_table does a SQL query to determine if the table exists, using your own engine's way of doing that query. get_table returns the table out of a dictionary of tables that have been loaded by reflection from the database (so it doesn't trigger a SQL query). As a result has_table being True does not imply that you'll get anything back when you call get_table. This therefore results in strange effects if the methods are called in ways that they don't support (see #7826 and #7815).

imho the entry methods (e.g. to_sql) should take a schema name, but then pandas will need to somehow be aware that reflection returns schema_name.table_name under some circumstances but not others (can SQLAlchemy tell us this?). In other words, get_table may need to get much more intelligence than a simple dictionary lookup.

@JoergRittinger
Copy link
Contributor

What do think about replacing the two functions in PandasSQLAlchemy:

def has_table(self, name):
    return self.engine.has_table(name)

def get_table(self, table_name):
    return self.meta.tables.get(table_name)

with:

def has_table(self, table_name, schema_name=None):
    return self.engine.has_table(table_name, schema_name)

def get_table(self, table_name, schema_name=None):
    if schema_name:
        for table in self.metadata.tables.itervalues():
            if table.name == table_name and table.schema == schema_name:
                return table
    else:
        return self.meta.tables.get(table_name)

of course at many other points schema_name would have to be added to function call

@jorisvandenbossche
Copy link
Member

Hey @aergener, an implementation of schema support is here: #7952. Could you try this out to see if this fixes your problem (if you specify the schema)?

@aergener
Copy link
Author

@jorisvandenbossche I will try this out and get back to you

@jorisvandenbossche
Copy link
Member

@aergener Thanks! BTW, see #7957, an issue on the string representation (String/Text). You also mentioned something about that above, maybe you can give your experience theren in the issue.

@jorisvandenbossche
Copy link
Member

This should be solved now #7952 is merged. @aergener if the problem still exists, you can always reopen the issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

No branches or pull requests

6 participants