ENH #6416 improve performance on SQL insert #6417

Closed
wants to merge 19 commits into from

Conversation

mangecoeur
Contributor

Empirically, this significantly improves write performance by letting SQLAlchemy use the driver's executemany method when it is available.
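For readers unfamiliar with the distinction, here is a minimal sketch of the driver-level difference, using the standard-library sqlite3 module rather than SQLAlchemy or the actual pandas code in this PR. The table name `demo`, its columns, and the helper function names are hypothetical and chosen only for illustration; any speed-up will depend on the driver and database.

```python
import sqlite3


def make_conn():
    # In-memory database with a hypothetical two-column table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo (a INTEGER, b TEXT)")
    return conn


def insert_row_by_row(conn, rows):
    # One execute() call per row: the statement is dispatched once per tuple.
    for row in rows:
        conn.execute("INSERT INTO demo (a, b) VALUES (?, ?)", row)
    conn.commit()


def insert_batched(conn, rows):
    # A single executemany() call hands the whole sequence to the driver,
    # which is the DBAPI method the description above refers to.
    conn.executemany("INSERT INTO demo (a, b) VALUES (?, ?)", rows)
    conn.commit()


rows = [(i, str(i)) for i in range(1000)]

conn_a = make_conn()
insert_row_by_row(conn_a, rows)

conn_b = make_conn()
insert_batched(conn_b, rows)

# Both approaches should leave identical table contents.
result_a = conn_a.execute("SELECT a, b FROM demo ORDER BY a").fetchall()
result_b = conn_b.execute("SELECT a, b FROM demo ORDER BY a").fetchall()
```

Both paths store the same data; the batched form simply gives the driver the chance to process all rows in one call.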

danielballan and others added 19 commits January 23, 2014 15:47
TST Import sqlalchemy on Travis.

DOC add docstrings to read sql

ENH read_sql connects via Connection, Engine, file path, or :memory: string

CLN Separate legacy code into new file, and fallback so that all old tests pass.

TST to use sqlachemy syntax in tests

CLN sql into classes, legacy passes

FIX few engine vs con calls

CLN pep8 cleanup

add postgres support for pandas.io.sql.get_schema

WIP: cleaup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon

TODO: renamed _engine_read_table, need to think of a better name.
TODO: clean up get_conneciton function

ENH: cleanup of SQL io

TODO: check that legacy mode works
TODO: run tests

correctly enabled coerce_float option

Cleanup and bug-fixing mainly on legacy mode sql.
IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized.
TODO: tests and doc

Added Test coverage for basic functionality using in-memory SQLite database

Simplified API by automatically distinguishing between engine and connection. Added warnings
Initial draft of doc updates

minor doc updates

Added tests and reduced code repetition. Updated Docs. Added test coverage for legacy names

Documentation updates, more tests

Added depreciation warnings for legacy names.

Updated docs and test doc build

ENH #4163 - finalized tests and docs, ready for wider use…

TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3

TST Import sqlalchemy on Travis.

DOC add docstrings to read sql

ENH read_sql connects via Connection, Engine, file path, or :memory: string

CLN Separate legacy code into new file, and fallback so that all old tests pass.

ENH #4163 added version added coment

ENH #4163 added depreciation warning for tquery and uquery

ENH #4163 Documentation and tests
…e date options. Updated optional dependancies

Added columns optional arg to read_table, removed failing legacy tests.

Added columns to doc

ENH #4163 Fixed class renaming, expanded docs

ENH #4163 Fixed tests in legacy mode
TST Import sqlalchemy on Travis.

DOC add docstrings to read sql

ENH read_sql connects via Connection, Engine, file path, or :memory: string

CLN Separate legacy code into new file, and fallback so that all old tests pass.

TST to use sqlachemy syntax in tests

CLN sql into classes, legacy passes

FIX few engine vs con calls

CLN pep8 cleanup

add postgres support for pandas.io.sql.get_schema

WIP: cleaup of sql io module - imported correct SQLALCHEMY type, delete redundant PandasSQLWithCon

TODO: renamed _engine_read_table, need to think of a better name.
TODO: clean up get_conneciton function

ENH: cleanup of SQL io

TODO: check that legacy mode works
TODO: run tests

correctly enabled coerce_float option

Cleanup and bug-fixing mainly on legacy mode sql.
IMPORTANT - changed legacy to require connection rather than cursor. This is still not yet finalized.
TODO: tests and doc

Added Test coverage for basic functionality using in-memory SQLite database

Simplified API by automatically distinguishing between engine and connection. Added warnings
Initial draft of doc updates

minor doc updates

Added tests and reduced code repetition. Updated Docs. Added test coverage for legacy names

Documentation updates, more tests

Added depreciation warnings for legacy names.

Updated docs and test doc build

ENH #4163 - finalized tests and docs, ready for wider use…

TST added sqlalchemy to TravisCI build dep for py 2.7 and 3.3

TST Import sqlalchemy on Travis.

DOC add docstrings to read sql

ENH read_sql connects via Connection, Engine, file path, or :memory: string

CLN Separate legacy code into new file, and fallback so that all old tests pass.

ENH #4163 added version added coment

ENH #4163 added depreciation warning for tquery and uquery

ENH #4163 Documentation and tests
…e date options. Updated optional dependancies

Added columns optional arg to read_table, removed failing legacy tests.

Added columns to doc

ENH #4163 Fixed class renaming, expanded docs

ENH #4163 Fixed tests in legacy mode
Conflicts:
	ci/requirements-2.6.txt
	doc/source/io.rst
	pandas/io/sql.py
	pandas/io/tests/test_sql.py
TEST: add postgresql to travis
One base class for tests with sqlalchemy backend. So test classes
for mysql and postgresql don't have to overwrite tests that are
different for sqlite.
Conflicts:
	doc/source/io.rst
	pandas/io/sql.py
	pandas/io/tests/test_sql.py
@jorisvandenbossche
Member

@mangecoeur You again have a problem with the commits (messed-up history) .. :-(

A first piece of advice: don't work on master, but create a feature branch for every PR, e.g. in this case:

git checkout -b sql-perf

Secondly (but you should do this first, otherwise the history of the branch will also be messed up), to clean up your history, maybe you can do the following (not fully sure this will work; otherwise @jreback can maybe give some advice):

  • save your changes somewhere else, for example in another branch: git checkout -b sql-temp
  • go back to master: git checkout master
  • remove all the latest commits you see here in the PR, e.g. git reset --hard HEAD~20 to go 20 commits back
  • fetch the latest upstream: git fetch upstream
  • rebase your master onto upstream: git rebase upstream/master
  • now create a feature branch: git checkout -b sql-perf
  • now you can try to cherry-pick your latest changes (saved in the branch sql-temp) onto your new, clean feature branch: git cherry-pick <commit> (with <commit> being the hash of the particular commit you want to restore)
  • then you can push this branch to GitHub: git push origin sql-perf and create a PR from that.

@mangecoeur
Contributor Author

@jorisvandenbossche yeah i just realized my history is a mess. Am in process of dumping my repo and starting fresh from upstream master.

@jorisvandenbossche
Member

If you follow my steps above, you should normally be able to easily recover your new commits and clean up the history without dumping your whole repo.

But in any case, when you have a fresh repo, don't work on master.

@mangecoeur
Contributor Author

I know i shouldn't, was just in too much of a rush to git properly :P anyway, ended up nuking my entire fork and starting again, life is too short for merge conflicts! new PR: #6420

@jorisvandenbossche
Member

Looks much better!

There seem to be some more commits in the above list that are not yet in pandas itself?

Closing this PR then.

@mangecoeur
Contributor Author

I think a lot of those commits got squashed when the main SQLAlchemy work got merged. That merge history got mangled anyway since it was a bit of a mess, with a lot of conflicts that I think jreback took care of.

On 20 Feb 2014, at 14:29, Joris Van den Bossche notifications@github.com wrote:

Looks much better!

There seem to be some more commits in the above list that are not yet in pandas itself?



@jorisvandenbossche
Member

@mangecoeur Not yet the latest two I think: 'Added interval type' and 'Minor name change'.

@mangecoeur
Contributor Author

@jorisvandenbossche good point, they are actually included in #6420 now. Tests for timeinterval are still needed. The name change will probably get overridden by the API design later on anyway.
