Add SQL Support for ADBC Drivers #53869

WillAyd · 2023-06-26T20:27:24Z

No description provided.

pandas/io/sql.py

WillAyd · 2023-06-26T22:54:56Z

pandas/tests/io/test_sql.py

    with tm.assert_produces_warning(UserWarning, match="the 'timedelta'"):
        df.to_sql("test_arrow", conn, if_exists="replace", index=False)


 @pytest.mark.db
 @pytest.mark.parametrize("conn", all_connectable)
 def test_dataframe_to_sql_arrow_dtypes_missing(conn, request, nulls_fixture):
+    if conn == "postgresql_adbc_conn":
+        request.node.add_marker(
+            pytest.skip("int8/datetime not implemented yet in adbc driver")


There is no such thing as an 8-bit integer in postgres, so this test wouldn't round trip in its current form. I think this being an int8 is just a test detail and not anything we actually care for. Can probably bump to int16, though that separately begs the question of how we want to add/extend the ADBC types

What's the canonical way to add ADBC types?

Are you referring to the mapping of an arrow type to a database-specific type?

That could in theory occur in the driver or could be handled by pandas itself. Since timestamp is a primitive in most databases I think that will be handled by the driver (plan on looking at this today/tomorrow myself for postgres)

Int8 is a little trickier. From the databases I've used I am not aware of 8 bit integers being all that common, so either the driver implements some compatibility to another type (logically int16) or leaves it to the application to put its data into a supported type. My guess is the ADBC drivers may want to start explicitly and force end users to upcast to int16 if that's what they want, but @lidavidm knows best

If there's a strong convention (like timestamp strings in SQLite) I'd like to support it natively, otherwise I think it would be best for the client to be explicit about what it wants (and if there's something the driver can do to help, we can do it - e.g. I think it's reasonable to optionally ingest int8 as SMALLINT in the driver, so long as you aren't concerned about perfect roundtripping)

Well we kinda do this already for numpy types -> sqlalchemy types so not opposed to adding that in pandas. I was just curious what the API for defining these is like.

int8 support was added in apache/arrow-adbc#858 and working on timestamp in apache/arrow-adbc#861; guessing these can make it into the next ADBC release and we can just start support at that as a minimum version to keep things easy
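
For readers following the thread, the client-side option discussed above (the application upcasts int8 to int16 before handing data to a driver that has no 8-bit integer type) could be sketched roughly as below. This is only an illustration and not code from this PR: `upcast_int8` is a hypothetical helper name, and the sample frame is made up.

```python
import numpy as np
import pandas as pd


def upcast_int8(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of ``df`` with every int8 column widened to int16.

    Hypothetical helper for illustration; it is not part of pandas or ADBC.
    """
    int8_cols = df.select_dtypes(include=["int8"]).columns
    # Widening int8 -> int16 is lossless, so the values are unchanged.
    return df.astype({col: "int16" for col in int8_cols})


# Made-up sample data: one int8 column and one untouched column.
df = pd.DataFrame(
    {
        "small": np.array([-128, 0, 127], dtype=np.int8),
        "other": np.array([1.5, 2.5, 3.5]),
    }
)
widened = upcast_int8(df)
```

The widened frame can then be passed to `to_sql`. The values survive, but the column would come back as a wider integer type, which is the "perfect roundtripping" caveat mentioned above.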

mroeschke · 2023-06-27T17:42:31Z

pandas/io/sql.py

+            stmt = f"SELECT * FROM {table_name}"
+
+        with self.con.cursor() as cur:
+            return cur(stmt).fetch_arrow_table().to_pandas()


It would be nice to at minimum support dtype_backend and return arrow backed types

I think this should always just return arrow backed types. Related to the other conversation around kwargs, I am unsure of the best way to handle this. If we raise for non-default arguments this wouldn't work; alternatively we could exempt the dtype_backend argument from raising for non-default arguments, but it is arguably heavy-handed to require end users to specify that when they are already using the ADBC driver

You'd have to add types_mapper=pd.ArrowDtype for this to work.

Not sure how I'd feel about arrow backed only, this makes sense but we went in a different way for other readers...

Ah didn't realize that. Thanks for the heads up

xref #51846 for long-term

mroeschke

Adding the new library in install.rst would be good too

WillAyd · 2023-06-27T17:52:27Z

AFAICT the pyarrow drivers required pyarrow>=8.0.0 since they use the RecordBatchReader object, but our min version is still pinned at 6.0.0. @phofl I know you've upgraded us in the past with pyarrow - is it too soon to make the jump to 8.0?

lidavidm · 2023-06-27T18:50:05Z

We could likely relax Python and PyArrow minimum versions further if needed

phofl · 2023-06-27T19:01:01Z

Technically we could upgrade for 2.1, but this depends on pdep 10 as well

WillAyd · 2023-06-27T19:52:34Z

What is the link between this and PDEP 10? The ADBC dbapi requires pyarrow to use but I don't think that impacts how we need to manage that on our end? Or am I overlooking something simple?

phofl · 2023-06-27T20:14:32Z

PDEP10 has some language around minimum supported Arrow versions

jorisvandenbossche · 2023-06-28T07:26:27Z

But note that the PDEP-10 text is even more conservative, so based on that we would not be bumping to pyarrow 8 as a minimum version any time soon (only next year).

AFAICT the pyarrow drivers required pyarrow>=8.0.0 since they use the RecordBatchReader object, but our min version is still pinned at 6.0.0.

IMO there is really no problem in requiring a more recent pyarrow version for certain new features, such as pyarrow 8 for using adbc in our SQL IO, while the general minimum required version is lower.

pandas/io/sql.py

WillAyd · 2023-06-28T22:12:56Z

IMO there is really no problem in requiring a more recent pyarrow version for certain new features, such as pyarrow 8 for using adbc in our SQL IO, while the general minimum required version is lower.

Yea I agree. That definitely adds complexity to test compat/xfailing, but I think can be reasonably implemented in another pass at this
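
To make the idea concrete, a feature-specific minimum version (higher than the library-wide minimum) could be enforced with a check along these lines. This is a hedged sketch using only the standard library; the helper names `meets_minimum` and `require_for_feature` are hypothetical and this is not how pandas implements its optional-dependency handling.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def _parse(ver: str) -> tuple[int, ...]:
    """Parse the leading numeric components of a version string."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at suffixes such as "dev5"
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter with zeros."""
    a, b = _parse(installed), _parse(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def require_for_feature(
    package: str, minimum: str, feature: str, installed: str | None = None
) -> str:
    """Raise ImportError unless ``package`` is new enough for ``feature``."""
    if installed is None:
        try:
            installed = version(package)
        except PackageNotFoundError as err:
            raise ImportError(f"{feature} requires {package}>={minimum}") from err
    if not meets_minimum(installed, minimum):
        raise ImportError(
            f"{feature} requires {package}>={minimum}, found {installed}"
        )
    return installed
```

A call such as `require_for_feature("pyarrow", "8.0.0", "ADBC support")` would then only run on the ADBC code path, leaving the general minimum version untouched.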

WillAyd · 2023-11-13T17:23:19Z

doc/source/whatsnew/v2.2.0.rst

+   with pg_dbapi.connect(uri) as conn:
+       df2 = pd.read_sql("pandas_table", conn)
+
+The Arrow type system offers a wider array of types that can more closely match


I think this is important enough of a point to make in the changelog, but should also probably put in io.rst. Just wasn't sure how to best structure that yet

+1 to include in io.rst. I think it might be good to add a separate section in io.rst to talk about type mapping (if there isn't one already) and also include sqlalchemy type mapping

WillAyd · 2023-11-14T16:17:07Z

Alright all green and I think I've addressed comments. Let me know if anything is missing

pyproject.toml

doc/source/user_guide/io.rst

pandas/io/sql.py

pandas/tests/io/test_sql.py

mroeschke

A few comments otherwise looks good

WillAyd · 2023-11-17T20:04:40Z

OK all green and I think I addressed your feedback @mroeschke

mroeschke · 2023-11-18T01:45:38Z

doc/source/getting_started/install.rst

@@ -335,7 +335,8 @@ lxml                      4.9.2              xml             XML parser for read
 SQL databases
 ^^^^^^^^^^^^^

-Installable with ``pip install "pandas[postgresql, mysql, sql-other]"``.
+Traditional drivers are installable with ``pip install "pandas[postgresql, mysql, sql-other]"``.  ADBC drivers


I think this is no longer accurate since you added them to the postgresql and sql-other extras in the pyproject.toml

mroeschke · 2023-11-18T01:45:53Z

doc/source/getting_started/install.rst

@@ -345,6 +346,8 @@ SQLAlchemy                2.0.0              postgresql,     SQL support for dat
                                             sql-other
 psycopg2                  2.9.6              postgresql      PostgreSQL engine for sqlalchemy
 pymysql                   1.0.2              mysql           MySQL engine for sqlalchemy
+adbc-driver-postgresql    0.8.0                              ADBC Driver for PostgreSQL


this is in the postgresql extra now

mroeschke · 2023-11-18T01:46:01Z

doc/source/getting_started/install.rst

@@ -345,6 +346,8 @@ SQLAlchemy                2.0.0              postgresql,     SQL support for dat
                                             sql-other
 psycopg2                  2.9.6              postgresql      PostgreSQL engine for sqlalchemy
 pymysql                   1.0.2              mysql           MySQL engine for sqlalchemy
+adbc-driver-postgresql    0.8.0                              ADBC Driver for PostgreSQL
+adbc-driver-sqlite        0.8.0                              ADBC Driver for SQLite


this is in the sql-other extra now

mroeschke

Small comments otherwise LGTM. Any other thoughts @jorisvandenbossche?

mroeschke · 2023-11-22T18:20:01Z

Nice! Thanks @WillAyd. Can have follow up PRs if needed

WillAyd added 5 commits June 26, 2023 12:07

close to complete implementation

4f2b760

working implementation for postgres

a4ebbb5

sqlite implementation

b2cd149

Added ADBC to CI

512bd00

Doc updates

f49115c

WillAyd requested a review from mroeschke as a code owner June 26, 2023 20:27

Whatsnew update

a8512b5

WillAyd commented Jun 26, 2023

WillAyd mentioned this pull request Jun 26, 2023

Document Comparison to pandas? apache/arrow-adbc#812

Closed

WillAyd added 2 commits June 26, 2023 15:23

Better optional dependency import

c1c68ef

min versions fix

3d7fb15

WillAyd commented Jun 26, 2023

WillAyd added 3 commits June 26, 2023 22:10

import updates

1093bc8

docstring fix

926e567

Merge remote-tracking branch 'upstream/main' into adbc-integration

093dd86

mroeschke reviewed Jun 27, 2023

doc fixup

fcc21a8

mroeschke added the IO SQL to_sql, read_sql, read_sql_query label Jun 27, 2023

jorisvandenbossche reviewed Jun 28, 2023

WillAyd added 3 commits July 14, 2023 12:22

Merge remote-tracking branch 'upstream/main' into adbc-integration

88642f7

Updates for 0.6.0

156096d

fix sqlite name escaping

dd26edb

WillAyd added 3 commits November 13, 2023 10:37

Merge branch 'main' into adbc-integration

150e267

fixes

1e77f2b

more documentation

97ed24f

WillAyd commented Nov 13, 2023

WillAyd added 3 commits November 13, 2023 13:20

doc spacing

7dc07da

doc target fix

52ee8d3

pyarrow warning compat

1de8488