Fix common sql DbApiHook fetch_all_handler #25430
- if cursor.returns_rows:
+ if cursor.description is not None:
      return cursor.fetchall()
This doesn’t make sense. `description` only returns some information about the cursor and has nothing to do with whether the cursor returns data or not.
According to PEP 249, whether a cursor returns information can be checked by `if cursor.rowcount is not None and cursor.rowcount >= 0`.
@uranusjr This doesn't look true to me, I am using the following as reference:
- https://peps.python.org/pep-0249/#description
- https://docs.sqlalchemy.org/en/14/core/connections.html?highlight=returns_#sqlalchemy.engine.CursorResult.returns_rows
Also:
>>> import pymssql
>>> c = pymssql.connect(host, login, password)
>>> cur = c.cursor()
>>> cur.execute("SELECT SUSER_SNAME();")
>>> cur.rowcount
-1
>>> cur.description
(('', 1, None, None, None, None, None),)
>>> cur.execute("PRINT('1');")
>>> cur.rowcount
-1
>>> cur.description
Edit: I have the same behaviour with sqlite3 and jaydebeapi.
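The sqlite3 case mentioned above can be reproduced with the standard library alone. This is a minimal sketch, assuming an in-memory database and a hypothetical table `t`; it only shows what this one driver reports, and other drivers may set `rowcount` differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL returns no rows: description is None and rowcount is the -1 sentinel.
cur.execute("CREATE TABLE t (x INTEGER)")
ddl = (cur.description, cur.rowcount)

# INSERT returns no rows either, but rowcount reports one affected row.
cur.execute("INSERT INTO t VALUES (1)")
insert = (cur.description, cur.rowcount)

# SELECT returns rows: description is populated while rowcount stays -1.
cur.execute("SELECT x FROM t")
select = (cur.description, cur.rowcount)

# A SELECT that matches zero rows still has a description.
cur.execute("SELECT x FROM t WHERE x > 100")
empty_select = (cur.description, cur.fetchall())

conn.close()

for label, value in [("ddl", ddl), ("insert", insert),
                     ("select", select), ("empty_select", empty_select)]:
    print(label, value)
```

With this driver, `rowcount` is non-negative only for the statement that returns no rows, which matches the pymssql session above.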
I am not sure how pymssql does things, but according to PEP 249, `description` does not offer the same functionality as SQLAlchemy’s `returns_rows`. If `rowcount` does not either, you need to find another way that actually has a backing standard. Since DbApiHook should work for all standard-compliant databases, we can’t rely on individual database behaviours, but must refer to the standard.
@uranusjr Totally agree on the all standard-compliant part. However, this would then mean that sqlalchemy's documentation is wrong, since for `returns_rows` it only mentions `description`. Do you have an example where `description` is not None and no rows were returned?
@uranusjr I find sqlalchemy notes on rowcount: https://docs.sqlalchemy.org/en/14/core/connections.html?highlight=returns_#sqlalchemy.engine.CursorResult.rowcount very interesting
Quoting sqlalchemy:
- about `CursorResult.returns_rows`: "Overall, the value of CursorResult.returns_rows should always be synonymous with whether or not the DBAPI cursor had a .description attribute, indicating the presence of result columns, noting that a cursor that returns zero rows still has a .description if a row-returning statement was emitted."
- about `rowcount` (aforementioned link): "Statements that use RETURNING may not return a correct rowcount."
- about `rowcount` (aforementioned link): "Contrary to what the Python DBAPI says, it does not return the number of rows available from the results of a SELECT statement as DBAPIs cannot support this functionality when rows are unbuffered."
Quoting PEP-249:
- about `cursor.rowcount` (https://peps.python.org/pep-0249/#id48): "The term number of affected rows generally refers to the number of rows deleted, updated or inserted by the last statement run on the database cursor."
- about `cursor.description` (https://peps.python.org/pep-0249/#description): "This attribute will be None for operations that do not return rows or if the cursor has not had an operation invoked via the .execute*() method yet."
about cursor.description https://peps.python.org/pep-0249/#description
"This attribute will be None for operations that do not return rows
or if the cursor has not had an operation
invoked via the [.execute*()](https://peps.python.org/pep-0249/#id14) method yet."
This is indeed part of the standard, so I do not see why we should not base the decision on that, @uranusjr. It's quite explicitly stated in the PEP that `description` is only present when there are some rows potentially to be returned (and it can be 0 rows as well).
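To make the rule quoted from PEP 249 concrete, here is a minimal sketch of a handler that fetches only when `description` is set. The name `fetch_all_if_rows`, the table `t`, and the statement list are hypothetical; the handler mirrors the two lines under review but is not claimed to be the exact code in the PR. The same cursor is reused for every statement, so the sketch also shows `description` being reset on each `execute`.

```python
import sqlite3
from typing import Any, List, Optional


def fetch_all_if_rows(cursor: Any) -> Optional[List[Any]]:
    """Hypothetical handler: fetch only when the statement produced a result set."""
    # PEP 249: description is None for operations that do not return rows.
    if cursor.description is not None:
        return cursor.fetchall()
    return None


conn = sqlite3.connect(":memory:")
cur = conn.cursor()

statements = [
    "CREATE TABLE t (x INTEGER)",
    "INSERT INTO t VALUES (1), (2)",
    "SELECT x FROM t ORDER BY x",
    "SELECT x FROM t WHERE x > 100",
]

# One cursor for all statements; each execute() refreshes description.
results = []
for sql in statements:
    cur.execute(sql)
    results.append(fetch_all_if_rows(cur))

conn.close()
print(results)
```

The zero-row SELECT yields an empty list rather than `None`, which keeps "no result set" distinguishable from "empty result set".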
What I particularly do not like about `rowcount` is this note about -1:
The attribute is -1 in case no [.execute*()](https://peps.python.org/pep-0249/#id14)
has been performed on the cursor or the rowcount of the last operation
is cannot be determined by the interface. [[7]](https://peps.python.org/pep-0249/#id46)
Note
Future versions of the DB API specification could redefine the latter case to have the object return None instead of -1.
I think the note alone indicates that we should avoid it, and `rowcount` gives us absolutely no more guarantees than `description`:
This attribute will be None for operations that do not return rows
or if the cursor has not had an operation
invoked via the [.execute*()](https://peps.python.org/pep-0249/#id14) method yet."
This is the same, only less ambiguous IMHO.
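The two candidate checks can be compared side by side. This is a sketch against sqlite3 only; the helper names `by_rowcount` and `by_description` and the table `t` are hypothetical, and a driver that buffers results could report a different `rowcount`.

```python
import sqlite3


def by_rowcount(cursor) -> bool:
    # The rowcount-based check proposed earlier in the review.
    return cursor.rowcount is not None and cursor.rowcount >= 0


def by_description(cursor) -> bool:
    # The description-based check used in this PR.
    return cursor.description is not None


conn = sqlite3.connect(":memory:")

# A cursor that has never executed anything: rowcount is -1, description is None.
fresh = conn.cursor()
fresh_state = (fresh.rowcount, fresh.description)

cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")

cur.execute("INSERT INTO t VALUES (1)")
insert_checks = (by_rowcount(cur), by_description(cur))

cur.execute("SELECT x FROM t")
select_checks = (by_rowcount(cur), by_description(cur))

conn.close()
print(fresh_state, insert_checks, select_checks)
```

With this driver the rowcount check answers yes for the INSERT and no for the SELECT, the opposite of what a fetch handler needs.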
Force-pushed from d4ff834 to 1ad0e8b
First of all, I made one more error here. So if I correctly understand we have to determine how to figure out if
Here's the implementation for Postgres and seems we could use description. And here are some experiments:
I think the problem with
Pep and sqlalchemy states Regarding I don't think that the fact that we are using the same cursor in the run for loop causes an issue. Sqlalchemy's implementation of Sqlalchemy's underlying implementation compliancy with dbapi2 should be the one to follow. @kazanzhy thanks for the experiments, as I am saying I think that if you use the same cursor like we do in the run loop you'll get correct description. If you could check with pgsql, I have already seen this for other drivers in the past as well.
@FanatoniQ I don't get it. You're saying that
But in this PR you're changing
The sqlalchemy's implementation of The fact is that in If you look at my duplicated issue, it explains why the tests passed when they should have failed: only sqlalchemy has the I hope this is clear. I don't see why we would go another route to be dbapi2 compliant than to follow sqlalchemy:
Given what we have, this change is good. I kind of wonder whether the more fundamental problem here is how `fetch_all_handler` is designed to be used in the first place, but it’s likely difficult to rewrite things.
Force-pushed from 1ad0e8b to fb1c513
@potiuk @uranusjr @kazanzhy I force pushed so that the cursor values in the tests are not misleading: We should be good now 😉
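One way mocked cursor values can mislead is sketched below with `unittest.mock`. This is an assumption about how such tests might be set up, not something taken from the PR: a `MagicMock` creates any attribute on access, so a SQLAlchemy-only attribute such as `returns_rows` appears to exist and is truthy, while a plain DBAPI cursor (sqlite3 here) does not have it.

```python
import sqlite3
from unittest import mock

# A MagicMock invents attributes on access, so the check "passes" on a mock.
fake_cursor = mock.MagicMock()
mock_has_attr = bool(fake_cursor.returns_rows)

# A real DBAPI cursor has no returns_rows attribute at all.
conn = sqlite3.connect(":memory:")
real_cursor = conn.cursor()
real_cursor.execute("SELECT 1")
real_has_attr = hasattr(real_cursor, "returns_rows")
conn.close()

# Setting description explicitly makes the mock behave like a PEP 249 cursor
# after a statement that returns no rows.
fake_cursor.description = None
mock_returns_rows = fake_cursor.description is not None

print(mock_has_attr, real_has_attr, mock_returns_rows)
```

Giving the mock an explicit `description` (or building it with a `spec`) keeps a test from passing on an attribute that real drivers lack.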
Yep. I definitely want to merge that one before the next provider's wave :)
🎉
This PR fixes `fetch_all_handler`, mentioned in issue #25388 and possibly linked to #25412.
Ref: detailed explanation on the issue: #25429