Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG/ENH: to_sql fails to detect tables via if_exists on mysql with capitalised table names #7815

Closed
maxgrenderjones opened this issue Jul 21, 2014 · 9 comments · Fixed by #8180
Labels
IO SQL to_sql, read_sql, read_sql_query
Milestone

Comments

@maxgrenderjones
Copy link
Contributor

Demonstration code, running on windows (so case insensitive) with pandas 0.14.0

engine=create_engine('mysql://{username}:{password}@{host}/{database}?charset=utf8'.format(**db))
metadata=MetaData(bind=engine)
table=Table('foo', metadata, Column('id', Integer, primary_key=True), Column('language', Unicode(255)), Column('animal', Unicode(255)))
data=DataFrame.from_dict({1: {'language': 'python', 'animal': 'snake'}, 2: {'language': 'perl', 'animal': 'camel'}}, orient='index')
data.to_sql('Foo', engine, if_exists='replace')
data.to_sql('Foo', engine, if_exists='replace')

Fails with:

OperationalError                          Traceback (most recent call last)
<ipython-input-48-9d53c43e59f6> in <module>()
      1 data.to_sql('Foo', engine, if_exists='replace')
----> 2 data.to_sql('Foo', engine, if_exists='replace')

C:\Anaconda\lib\site-packages\pandas\core\generic.pyc in to_sql(self, name, con, flavor, if_exists, index, index_label)
    948         sql.to_sql(
    949             self, name, con, flavor=flavor, if_exists=if_exists, index=index,
--> 950             index_label=index_label)
    951 
    952     def to_pickle(self, path):

C:\Anaconda\lib\site-packages\pandas\io\sql.pyc in to_sql(frame, name, con, flavor, if_exists, index, index_label)
    438 
    439     pandas_sql.to_sql(frame, name, if_exists=if_exists, index=index,
--> 440                       index_label=index_label)
    441 
    442 

C:\Anaconda\lib\site-packages\pandas\io\sql.pyc in to_sql(self, frame, name, if_exists, index, index_label)
    812         table = PandasSQLTable(
    813             name, self, frame=frame, index=index, if_exists=if_exists,
--> 814             index_label=index_label)
    815         table.insert()
    816 

C:\Anaconda\lib\site-packages\pandas\io\sql.pyc in __init__(self, name, pandas_sql_engine, frame, index, if_exists, prefix, index_label)
    530             else:
    531                 self.table = self._create_table_statement()
--> 532                 self.create()
    533         else:
    534             # no data provided, read-only mode

C:\Anaconda\lib\site-packages\pandas\io\sql.pyc in create(self)
    546 
    547     def create(self):
--> 548         self.table.create()
    549 
    550     def insert_statement(self):

C:\Anaconda\lib\site-packages\sqlalchemy\sql\schema.pyc in create(self, bind, checkfirst)
    646         bind._run_visitor(ddl.SchemaGenerator,
    647                             self,
--> 648                             checkfirst=checkfirst)
    649 
    650     def drop(self, bind=None, checkfirst=False):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _run_visitor(self, visitorcallable, element, connection, **kwargs)
   1545                                     connection=None, **kwargs):
   1546         with self._optional_conn_ctx_manager(connection) as conn:
-> 1547             conn._run_visitor(visitorcallable, element, **kwargs)
   1548 
   1549     class _trans_ctx(object):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _run_visitor(self, visitorcallable, element, **kwargs)
   1192     def _run_visitor(self, visitorcallable, element, **kwargs):
   1193         visitorcallable(self.dialect, self,
-> 1194                             **kwargs).traverse_single(element)
   1195 
   1196 

C:\Anaconda\lib\site-packages\sqlalchemy\sql\visitors.pyc in traverse_single(self, obj, **kw)
    117             meth = getattr(v, "visit_%s" % obj.__visit_name__, None)
    118             if meth:
--> 119                 return meth(obj, **kw)
    120 
    121     def iterate(self, obj):

C:\Anaconda\lib\site-packages\sqlalchemy\sql\ddl.pyc in visit_table(self, table, create_ok)
    726                 self.traverse_single(column.default)
    727 
--> 728         self.connection.execute(CreateTable(table))
    729 
    730         if hasattr(table, 'indexes'):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in execute(self, object, *multiparams, **params)
    718                                 type(object))
    719         else:
--> 720             return meth(self, multiparams, params)
    721 
    722     def _execute_function(self, func, multiparams, params):

C:\Anaconda\lib\site-packages\sqlalchemy\sql\ddl.pyc in _execute_on_connection(self, connection, multiparams, params)
     65 
     66     def _execute_on_connection(self, connection, multiparams, params):
---> 67         return connection._execute_ddl(self, multiparams, params)
     68 
     69     def execute(self, bind=None, target=None):

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _execute_ddl(self, ddl, multiparams, params)
    772             compiled,
    773             None,
--> 774             compiled
    775         )
    776         if self._has_events or self.engine._has_events:

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _execute_context(self, dialect, constructor, statement, parameters, *args)
    945                                 parameters,
    946                                 cursor,
--> 947                                 context)
    948 
    949         if self._has_events or self.engine._has_events:

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1106                                         self.dialect.dbapi.Error,
   1107                                         connection_invalidated=self._is_disconnect),
-> 1108                                     exc_info
   1109                                 )
   1110 

C:\Anaconda\lib\site-packages\sqlalchemy\util\compat.pyc in raise_from_cause(exception, exc_info)
    183             exc_info = sys.exc_info()
    184         exc_type, exc_value, exc_tb = exc_info
--> 185         reraise(type(exception), exception, tb=exc_tb)
    186 
    187 if py3k:

C:\Anaconda\lib\site-packages\sqlalchemy\engine\base.pyc in _execute_context(self, dialect, constructor, statement, parameters, *args)
    938                                      statement,
    939                                      parameters,
--> 940                                      context)
    941         except Exception as e:
    942             self._handle_dbapi_exception(

C:\Anaconda\lib\site-packages\sqlalchemy\engine\default.pyc in do_execute(self, cursor, statement, parameters, context)
    433 
    434     def do_execute(self, cursor, statement, parameters, context=None):
--> 435         cursor.execute(statement, parameters)
    436 
    437     def do_execute_no_params(self, cursor, statement, context=None):

C:\Anaconda\lib\site-packages\MySQLdb\cursors.pyc in execute(self, query, args)
    203             del tb
    204             self.messages.append((exc, value))
--> 205             self.errorhandler(self, exc, value)
    206         self._executed = query
    207         if not self._defer_warnings: self._warning_check()

C:\Anaconda\lib\site-packages\MySQLdb\connections.pyc in defaulterrorhandler(***failed resolving arguments***)
     34     del cursor
     35     del connection
---> 36     raise errorclass, errorvalue
     37 
     38 re_numeric_part = re.compile(r"^(\d+)")

OperationalError: (OperationalError) (1050, "Table 'foo' already exists") '\nCREATE TABLE `Foo` (\n\t`index` INTEGER, \n\tanimal TEXT, \n\tlanguage TEXT\n)\n\n' ()

I haven't poked around the code to see if this is a SQLAlchemy issue or a pandas issue, but I'd hope it shouldn't be too difficult to fix

@maxgrenderjones
Copy link
Contributor Author

So this specific error has been fixed in 0.14.1 as

    def has_table(self, name):
        if self.meta.tables.get(name) is not None:
            return True
        else:
            return False

was changed in 0.14.1 to

def has_table(self, name):
        return self.engine.has_table(name)

However in general there seems to be an assumption that you can get a table, so it now fails at:

C:\Anaconda\lib\site-packages\pandas\io\sql.pyc in drop_table(self, table_name)
    854     def drop_table(self, table_name):
    855         if self.engine.has_table(table_name):
--> 856             self.meta.reflect(only=[table_name])
    857             self.get_table(table_name).drop()
    858             self.meta.clear()

because it can't find the table.

From the SQLAlchemy docs,

Case Sensitivity and Table Reflection

MySQL has inconsistent support for case-sensitive identifier names, basing support on specific details of the underlying operating system. However, it has been observed that no matter what case sensitivity behavior is present, the names of tables in foreign key declarations are always received from the database as all-lower case, making it impossible to accurately reflect a schema where inter-related tables use mixed-case identifier names.

Therefore it is strongly advised that table names be declared as all lower case both within SQLAlchemy as well as on the MySQL database itself, especially if database reflection features are to be used.

Rather than mark this CANTFIX WONTFIX, could we fetch the correct table name by changing

    def get_table(self, table_name):
        return self.meta.tables.get(table_name)

to search through metadata.tables.keys() with case insensitive matching on mysql if self.meta.tables.get(table_name) returns None?

To answer my own question, from the mysql docs it seems like you'd have to know the value of lower_case_table_names before knowing whether this was a safe thing to do:

If set to 0, table names are stored as specified and comparisons are case sensitive. If set to 1, table names are stored in lowercase on disk and comparisons are not case sensitive. If set to 2, table names are stored as given but compared in lowercase. This option also applies to database names and table aliases. For additional information, see Section 9.2.2, “Identifier Case Sensitivity”.

You should not set this variable to 0 if you are running MySQL on a system that has case-insensitive file names (such as Windows or Mac OS X). If you set this variable to 0 on such a system and access MyISAM tablenames using different lettercases, index corruption may result. On Windows the default value is 1. On Mac OS X, the default value is 2.

Seems fairly involved! Seems like it would be easier to write 'DROP TABLE %s' :)

Thoughts on what direction to take this?

@maxgrenderjones
Copy link
Contributor Author

How about this. If the table exists and the engine is mysql but self.meta.tables.get(name) is None, then we can be fairly sure it's a case sensitivity problem. At this stage we can just iterate through meta.tables.keys() to find the right version of name to pull out of the table without needing to fuss about lower_case_table_names. (Does anyone know/feel we can assume based on the above that this always be name.lower()?)

@maxgrenderjones maxgrenderjones changed the title to_sql if_exists fail on mysql with capitalised table names BUG/ENH: to_sql fails to detect tables via if_exists on mysql with capitalised table names Jul 22, 2014
@maxgrenderjones
Copy link
Contributor Author

While I think about it, here's my proposed code

def drop_table(self, table_name):
    if self.engine.has_table(table_name):
        if self.engine.dialect=='mysql' and self.get_table(table_name) is None and self.get_table(table_name.lower()) is not None:
            table_name=table_name.lower()
        self.meta.reflect(only=[table_name])
        self.get_table(table_name).drop()
        self.meta.clear()

However, that just fixes it in drop_table. A similar issue will arise with read_sql if I change it as proposed in #7826 to use get_table to detect existence of a table (which doesn't run a query) vs has_table which does, and there may well be other places. Think we need to take a call on whether this is a feature or a bug?

@jorisvandenbossche
Copy link
Member

Hmm, what a mess this case sensitivity .. :-) But thanks for reporting and digging into this, @maxgrenderjones

Your fix looks reasonable (in the sense it looks like solving the problem), but on the other hand, if we have to do this for every where for some flavor something is not working, it can become tedious (that was the reason for using sqlalchemy).
Would it also be possible that this is a bug in sqlalchemy reflect only keyword? As this should handle this case problem? (as has_table does handle it correctly, you could assume the same from reflect?)

@maxgrenderjones
Copy link
Contributor Author

Agree, it's a bit of a nightmare, so I'm just hoping that the issue is that we're not using SQLAlchemy correctly, rather than that even after SQLAlchemy-ing our code, we still have loads of edge cases.

You could probably call it a bug that reflect's only keyword is case sensitive even when mysql is being case insensitive e.g.

>>> model.Engine
Engine(mysql+oursql://username:***@hostname/database)
>>> meta=MetaData(bind=model.Engine)
>>> meta.reflect(only=['OnlineTransactions'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\sqlalchemy\sql\schema.py", line 3246, in reflect
    (bind.engine.url, s, ', '.join(missing)))
sqlalchemy.exc.InvalidRequestError: Could not reflect: requested table(s) not available in mysql+oursql://username:***@hostname/database: (OnlineTransactions)
>>> meta.reflect(only=['onlinetransactions'])
>>>

but I can't believe that the difference between engine.has_table (used in our has_table) and meta.tables (used in our get_table) isn't going to need solving, as it's exactly what's at stake with #7422, and probably also #7441). It may be that we need a series of (potentially database specific) heuristics to ensure that if has_table returns True, then get_table returns something. Furthermore, it's not like if you get rid of the only keyword the problem goes away:

>>> meta.reflect()
>>> 'onlinetransactions' in meta.tables
True
>>> 'OnlineTransactions' in meta.tables
False

(There's a partial solution on #7422, but if we're going to fix some of it, would be good to fix the whole thing while we're at it)

@jorisvandenbossche
Copy link
Member

Yes, the difference between has_table and get_table will need to get solved when implementing schema/meta arguments (#7441/#7422). Are you interested to look into this?

@jorisvandenbossche
Copy link
Member

@maxgrenderjones I started implementing schema support, see #7952. I don't think this already solves this issue, but if you could try it out, that would certainly be helpful.

@jorisvandenbossche jorisvandenbossche added this to the 0.15.0 milestone Aug 7, 2014
@jorisvandenbossche
Copy link
Member

@maxgrenderjones Coming back to this. I don't feel like putting in the fix you proposed above, as this seems a bit too specifi. Furthermore, eg on linux you can have a difference between 'Foo' and 'foo' (it depends on your mysql settings, and the default values differ between operating system). And I also don't want to do more on this issue as SQLAlchemy does, relying on them that they do the best as possible wihtout getting lost in this rabbit hole.

Further, for this specific case, with MySQL on Windows, all tables names are converted to lower case, so you really shouldn't use upper case.

But, I was thinking, what we can do, is trying to detect such a case like this and raise a warning if possible. Eg in the case that the user provided table name (here 'Foo') does not match the actually created table name (in this case 'foo'). If such a case is detected, we could raise a warning like "The table name 'Foo' you provided is not used exactly in the database, possibly due to case sensitivity issues. Consider using lower case table names"

See also discussion in sqlalchemy issue tracker: https://bitbucket.org/zzzeek/sqlalchemy/issue/3181/metadatareflect-with-only-keyword-fails

@maxgrenderjones
Copy link
Contributor Author

Thanks @jorisvandenbossche, seems a fair resolution of a problem with no clean answers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants