Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: bunch of fixes based on the old tests #6987

Merged

Conversation

jorisvandenbossche
Copy link
Member

Some first fixes based on running the old sql tests (from 0.13.1 tag)

  • fixed get_schema (and adapted is signature back to how it was for 0.13.1). For this I added back the old function, as this functionality was not possible with the refactored class based approach (for those you always have to provide a connection object or engine, this was not necessary for get_schema)
  • table_exists seems to be renamed to has_table? I don't know why (and we should also decide if we expose this as a public function), but added it back for now as an alias.

All tests from the old test suite are now running with the new code (for sqlite, not yet tested for mysql), except for the following issues:

  • tquery had a retry and cur arguments (which were at least used in the tests), but not that simple to put back with the new implementation
  • OperationalError is changed to DatabaseError

@jreback I also changed the DeprecationWarnings to FutureWarnings (I am right this is the canonical way to raise deprecation warnings in pandas?) and catched them in the tests. I don't know if this solves #6984 for the sql warnings?

@jreback
Copy link
Contributor

jreback commented Apr 27, 2014

look at the numpy master build ; its going to by default show all DeprecationWarnings

I think that we use FutureWarning (but numpy uses DeprecationWarning) odd

https://travis-ci.org/pydata/pandas/jobs/23890714 (not done yet)

@jreback
Copy link
Contributor

jreback commented Apr 27, 2014

I think it did get rid of 1-2 warnings (1 left for sql, buy you said ok)

@jorisvandenbossche
Copy link
Member Author

@jreback Yes, the one warning that is left is the warning of not using sqlalchemy (user warning, no deprecation warning). If we keep it, we should catch that too.

I think we use FutureWarning because DeprecationWarning is not printed by default for the user? Seems like a good reason to use it.

@jorisvandenbossche
Copy link
Member Author

@danielballan While getting the last tests of the old test suite running (the ones that were using tquery/uquery), I stumbled on the following issue:

Previously, execute had a cur keyword (https://github.com/pydata/pandas/blob/v0.13.1/pandas/io/sql.py#L32), while now this is removed: https://github.com/pydata/pandas/blob/master/pandas/io/sql.py#L90 (this happened in your commit 74d091f) (it says 'deprecated', but it is actually just removed).
This has following consequence: previously you could use execute to do consecutive inserts via the same cursor. This was used in the tests (https://github.com/pydata/pandas/blob/v0.13.1/pandas/io/tests/test_sql.py#L73, actually using tquery, but this boils down to execute), and this now fails as this behaviour is no longer possible (executing on a cursor instead of connection).

Was there a reason to remove it? (of course, if you want to do multiple inserts, you should just write your dataframe, but is was possible before)

@danielballan
Copy link
Contributor

I don't know. Actually, most of those commit messages in 74d091f are not mine. I remember that @mangecoeur did a major rebase on the re-write PR, crediting me (and others) with an arbitrary subset of the commits to give me credit for a share of the effort. There were parallel efforts between @hayd and @jtratner as well, I think. Those commits could be any of ours -- maybe one of them remembers the reasoning.

I notice the commit message, "IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized." It seems this was overlooked. Let me see if it would be hard to change it back without breaking the new stuff....

@jorisvandenbossche
Copy link
Member Author

@danielballan Great if you could look at it.

Can you also have a look at the third commit here (jorisvandenbossche@326c973). For tquery/uquery, I just removed the current implementation and put back the one from 0.13.1, because: a) we deprecate it, so no need to add it to the sqlalchemy mode, and b) the new implementation was not fully backwards compatible (signature changed a bit) for the legacy mode.

@jreback
Copy link
Contributor

jreback commented Apr 30, 2014

FYI, travis just changed to support 3.4 and broke all of our builds...fix coming in a few

@hayd
Copy link
Contributor

hayd commented Apr 30, 2014

has_table is the name of the function from sqlalchemy, I expect this is the reason for the change. We could alias and depreciate table_exists?

It seems strange to delete the cur kwarg, IIRC the old sql code was purely using cursors (if you passes a connection it just unwrapped the cursor) and being able to pass a cursor is part of the DBAPI standard (?). As mentioned all my code was completely rewritten or perhaps squashed, so I'm not sure here... I think should bring it back / depreciate in case it's being used in the wild.

@jreback
Copy link
Contributor

jreback commented Apr 30, 2014

FYI travis is good to go now

- adapt signature of get_schema to what it was before (for backwards compatibility), possibility to call function without connection object
- fix get_schema for sqlalchemy mode
- add tests
As tquery and uquery are deprecated, it is not needed to adapt them to the new sqlalchemy interface + the adapted version changed the signature slightly so it was not fully backwards compatible.
Therefore: put back the old code and removed new implementatation in the classes.
Consequence: the functions are not available for the new sqlalchemy engines, but as they are deprecated this is not needed.
@jorisvandenbossche
Copy link
Member Author

Added the old tests back for now (to ensure they are passing). OK?
Things I had to change:

  • changed MySQLdb to pymysql
  • changed sqlite3.OperationalError to pd.io.sql.DatabaseError (in test_tquery and test_uquery)
  • in test_execute with mymysql converted the row values to python types (with numpy's tolist()), as pymysql errored on the numpy types.
  • test_write_row_by_row are skipped for now (they depend on the issue of execute being able to handle cursor objects as dicussed above)

@jorisvandenbossche
Copy link
Member Author

@danielballan Do you have time to look at the execute cursor issue? For that, the test_write_row_by_row tests should then pass (jorisvandenbossche@7f327ea#diff-052a926a9b2c33a84e27d51c8fad3466R1162), so the SkipTest should be removed then.

@jorisvandenbossche
Copy link
Member Author

@jreback the tests are passing now, except for the ones testing for FutureWarnings (I thought they did pass before, and locally they also pass). Do you have any idea why they are not passing? Because the FutureWarnings are really in there.

I think this is almost ready. @jreback @danielballan @hayd @mangecoeur Somebody able to review? (maybe easier to look at the changes for each commit seperately, as they are not all connected)

@jreback jreback added this to the 0.14.0 milestone May 5, 2014
except Exception as e:
excName = e.__class__.__name__
if excName == 'OperationalError': # pragma: no cover
print('Failed to commit, may need to restart interpreter')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this raise? (and not print)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not raise I think, as it has to be able to 'retry' a couple of lines lower.

Now I don't fully grasp this code, I just copied it from 0.13.1 version to retain backwards compatibility (Wes put this in 2 years ago), so I thought: no real need to clean this up, as we will remove it anyway after a deprecation cycle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok....that's fine...just checking, prob right not to touch it!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I can make a warning of it instead of a print statement

@jreback
Copy link
Contributor

jreback commented May 5, 2014

@jorisvandenbossche

I did this to test the deprecations of float indexers; I couldn't get it work otherwise (turn the warnings into exceptions, assert they raise), turn it back off

https://github.com/pydata/pandas/blob/master/pandas/tests/test_indexing.py#L3255

I think their is some general turning off of FutureWarnings somewhere which prevents it from working properly

- removed meta keyword from has_table (was not used)
- added check for if_exists
- added table_exists back as alias for has_table
- added frame_query back as alias for read_sql (and deprecated it)
- changed way to delegate read_sql/table without using has_table for backwards compatibility (because this gave an error when uqing a mysql connection but not specifying the flavor)

Things that I had to change to the tests to get them running:
- changed MySQLdb to pymysql
- changed `sqlite3.OperationalError` to `pd.io.sql.DatabaseError` (in `test_tquery` and `test_uquery`)
- test_write_row_by_row are skipped for now (they depend on the issue of `execute` being able to handle cursor objects).
- in `test_execute` with pymysql converted the row values to python types (with numpy's `tolist()`), as pymysql errored on the numpy types.
@jorisvandenbossche
Copy link
Member Author

@hayd @danielballan any comments? If not I will merge tomorrow to further work on finishing the sql tasks for 0.14.

@danielballan
Copy link
Contributor

I can't find why we changed from MySQLdb to pymysql. Shouldn't matter, I guess, but what's the reasoning?

@danielballan
Copy link
Contributor

@jorisvandenbossche I worked on the execute cursor issue. Haven't tested, and I have to put it down for a moment, but you can get it here if you are eager to review and pull it in so this can merge.

Edit: Make that here. Minor fix.

@jorisvandenbossche
Copy link
Member Author

@danielballan thanks for the fix for execute. I will look at it tomorrow.

About MySQLdb vs pymysql. I don't know exactly, but it was removed from the requirements by @jreback here: 043d165 (see also comment in 4th task point in #6292). Apparantly the tests were failing for some reason. And @mangecoeur changed it from MySQLdb to pymysql in the big refactor here: 2e3f83d (maybe that is the reason the tests were not working anymore, as in requirements it was still MySQLdb).

BTW, it does matter in some way, as one of the tests were failing when I used pymysql (pymysql could not handle a numpy.int64 type, I had to add tolist here: https://github.com/pydata/pandas/pull/6987/files#diff-052a926a9b2c33a84e27d51c8fad3466R1473), and I assume it ran correctly before with MySQLdb.

@danielballan
Copy link
Contributor

Without knowing @mangecoeur's reasons, I'd say it doesn't make sense to move from MySQLdb to pymysql, especially if we're only going to remove it entirely in the next release. We should find out the reasoning before we put in any effort though....

@jreback
Copy link
Contributor

jreback commented May 7, 2014

we could certainly add MySQLdb on a build....easy enough

why don't you try to change one of the builds (it will just compile it inplace), see if it works, then I can build a wheel for it, e.g. change ci/requirements-2.7.txt (and replace pymysql)

@jorisvandenbossche
Copy link
Member Author

@danielballan OK, I picked your fixed, tests are passing now. Thanks!

I am going to merge this.
But for the MySQLdb, also for sqlalchemy we have to test it with mysql, so we are not going to drop tests with mysql. So it can be interesting to test multiple drivers if there are differences there.

jorisvandenbossche added a commit that referenced this pull request May 8, 2014
SQL: bunch of fixes based on the old tests
@jorisvandenbossche jorisvandenbossche merged commit 7fc7ad7 into pandas-dev:master May 8, 2014
@jorisvandenbossche jorisvandenbossche deleted the sql-legacy-tests branch May 8, 2014 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO SQL to_sql, read_sql, read_sql_query Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants