-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Implementation of client side statements that return #1046
Conversation
🤖 I detect that the PR title and the commit message differ and there's only one commit. To use the PR title for the commit history, you can use Github's automerge feature with squashing, or use -- conventional-commit-lint bot |
parsed_statement.client_side_statement_type | ||
== ClientSideStatementType.SHOW_COMMIT_TIMESTAMP | ||
): | ||
if connection.is_closed: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this to the start of this function. I think that we want to do this for any type of statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -23,6 +23,12 @@ | |||
RE_BEGIN = re.compile(r"^\s*(BEGIN|START)(TRANSACTION)?", re.IGNORECASE) | |||
RE_COMMIT = re.compile(r"^\s*(COMMIT)(TRANSACTION)?", re.IGNORECASE) | |||
RE_ROLLBACK = re.compile(r"^\s*(ROLLBACK)(TRANSACTION)?", re.IGNORECASE) | |||
RE_SHOW_COMMIT_TIMESTAMP = re.compile( | |||
r"^\s*(SHOW VARIABLE COMMIT_TIMESTAMP)", re.IGNORECASE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this regex accept show variable commit timestamp
? And show\nvariable\ncommit\ntimestamp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully not because commit_timestamp
is one token :-) But as to spacing elsewhere, good point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed regex for spacing
bool: True if Spanner transaction started, False otherwise. | ||
""" | ||
def inside_transaction(self): | ||
"""Deprecated property which won't be supported in future versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
"""Deprecated property which won't be supported in future versions. | |
"""Deprecated: This property may be removed in a future release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is deprecated, should it be tagged @deprecated
?
(Python does support stacking decorators. Order matters; they are applied in the reverse of the order in which they are listed. @deprecated
deprecates a function and @property
turns a function into a property, so probably the deprecation should be listed last so that it's evaluated first, on the function rather than its derived property.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Added @deprecated
@@ -405,13 +397,15 @@ def begin(self): | |||
Marks the transaction as started. | |||
|
|||
:raises: :class:`InterfaceError`: if this connection is closed. | |||
:raises: :class:`OperationalError`: if there is an existing transaction that has begin or is running | |||
:raises: :class:`OperationalError`: if there is an existing transaction | |||
that has begin or is running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I know that it's not a change in this PR, but I just noticed it now):
that has begin or is running | |
that has has been started |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/system/test_dbapi.py
Outdated
@@ -152,6 +153,87 @@ def test_begin_client_side(self, shared_instance, dbapi_database): | |||
conn3.close() | |||
assert got_rows == [updated_row] | |||
|
|||
def test_commit_timestamp_client_side(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
def test_commit_timestamp_client_side(self): | |
def test_commit_timestamp_client_side_transaction(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/system/test_dbapi.py
Outdated
self._cursor.execute("SELECT * FROM contacts") | ||
self._cursor.execute("SELECT * FROM contacts") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that this test should run two selects, but the results of both should also be consumed and verified that they contain what we expect.
Also, it would be good to verify that we can see the read timestamp already after executing the first query, and that it stays the same after the second query and after the commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
connection.rollback() | ||
return None | ||
if ( | ||
parsed_statement.client_side_statement_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are going to be lots of these if
s and they'll all be long due to the long variable name, you might consider doing statement_type = parsed_statement.client_side_statement_type
at the top of this block of if
s. Then you can use the shorter name throughout here.
You could also do something like Type = ClientSideStatementType
. Aliasing a class is a little less common, though. And, I think, probably not necessary to get the line length consistently under the length limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related, very minor thing:
In Python, everything is interpreted and there's no JIT compiler doing type inference, so every time you do a.b
, the Python interpreter has to actually interpret that logic at runtime. Specifically -- go look up the type of a
, look up and run code associated with any overrides on a
's .
operator (for example properties are implemented under the hood as an overload to the dot operator that checks whether the name of the field that you're looking up is the same as the name of a property on the class and, if so, replaces that lookup with a call to the underlying getter(/setter(/deleter)) function), then if still applicable go look up b
in a
's member dictionary.
As a result:
a.b
a.b
a.b
a.b
(... lots of times ...)
is actually slightly-but-measurably slower than
c = a.b
c
c
c
c
(... lots of times ...)
In most other languages, unless a pointer dereference is required (which is still cheap), nested member-variable access is zero cost at runtime; all of this resolution is sorted out at compile time or by a JIT compiler.
The performance difference is not usually enough (and Python code is usually not performance-sensitive enough) that it matters. But it gently nudges Python to tend to use simple variables rather than complex nested structures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defined a variable statement_type
. Thanks for the explanation, was not aware of it. Will try to remember and take care of this in future code
@@ -23,6 +23,12 @@ | |||
RE_BEGIN = re.compile(r"^\s*(BEGIN|START)(TRANSACTION)?", re.IGNORECASE) | |||
RE_COMMIT = re.compile(r"^\s*(COMMIT)(TRANSACTION)?", re.IGNORECASE) | |||
RE_ROLLBACK = re.compile(r"^\s*(ROLLBACK)(TRANSACTION)?", re.IGNORECASE) | |||
RE_SHOW_COMMIT_TIMESTAMP = re.compile( | |||
r"^\s*(SHOW VARIABLE COMMIT_TIMESTAMP)", re.IGNORECASE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully not because commit_timestamp
is one token :-) But as to spacing elsewhere, good point.
bool: True if Spanner transaction started, False otherwise. | ||
""" | ||
def inside_transaction(self): | ||
"""Deprecated property which won't be supported in future versions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is deprecated, should it be tagged @deprecated
?
(Python does support stacking decorators. Order matters; they are applied in the reverse of the order in which they are listed. @deprecated
deprecates a function and @property
turns a function into a property, so probably the deprecation should be listed last so that it's evaluated first, on the function rather than its derived property.)
google/cloud/spanner_v1/snapshot.py
Outdated
self._read_request_count += 1 | ||
self._execute_sql_count += 1 | ||
|
||
if self._read_only and not transaction_id_set: | ||
peek = next(iterator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this advancing the iterator by one?
It looks to me like line 485 is effectively rewinding the iterator? So this does implement the peek
idiom (assuming no exception is thrown between here and there), though I'd agree that it's confusing.
Reading the implementation of _restart_on_unavailable
, though: Do I read correctly that it's implemented by buffering the full resultset in memory? I'm curious if you know why we do that rather than yield
ing results as they arrive and, in case of ServiceUnavailable
, just replay to the point where we left off and keep streaming? The semantics would be slightly different, so there might be a good reason for this choice.
(I started reading that function because I was wondering if this was a custom iterator and if we could add a peek()
method to it. It looks to me like the current algorithm would in fact support peek()
just fine (because it buffers all results in memory anyway), but it's not a custom iterator so it wouldn't be trivial to add such a method. The more-itertools library contains a peekable
wrapper that would do this for us, though that would add a library dependency.)
google/cloud/spanner_v1/snapshot.py
Outdated
self._read_request_count += 1 | ||
self._execute_sql_count += 1 | ||
|
||
if self._read_only and not transaction_id_set: | ||
peek = next(iterator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this fail if the query returns no results? (IIRC next()
with no second/default argument throws a StopIteration
exception or something if there's nothing left to return?) Should it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There would be one result (PartialResultSet object) where metadata
field would be present and values
would be an empty list
return _get_streamed_result_set( | ||
ClientSideStatementType.SHOW_COMMIT_TIMESTAMP.name, | ||
TypeCode.TIMESTAMP, | ||
connection._transaction.committed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if _transaction
is None
? Or if the transaction has not yet committed? Will this then return a result set containing a single row/column containing a NULL
? Or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made the change to return an empty result set for both cases
return _get_streamed_result_set( | ||
ClientSideStatementType.SHOW_READ_TIMESTAMP.name, | ||
TypeCode.TIMESTAMP, | ||
connection._snapshot._transaction_read_timestamp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here as above? What happens is _snapshot
is None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same response as above
"This method is non-operational as transaction has not been started at " "client." | ||
) | ||
SPANNER_TRANSACTION_NOT_STARTED_WARNING = ( | ||
"This method is non-operational as transaction has not been started at " "spanner." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should make a difference between these two in any warnings that we return to the user. Whether a transaction has been started on Spanner or not, is essentially an implementation detail.
(Note: Internally, we can make this difference. But I don't think we should communicate it like that to a user.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this warning as not needed now
made atleast one call to Spanner. Property client_transaction_started | ||
would always be true if this is true as transaction has to start first | ||
at clientside than at Spanner | ||
def ddl_statements(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a new property? If so, any specific reason that we are adding this in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intellij was giving a suggestion so created it. Let me know if I should revert it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would in that case at least make it internal (so let it start with an underscore). We should be as conservative as possible when it comes to adding public methods and properties to the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the property as there is no different in accessing a property or a field in python when they both start with underscore and Intellij will give the same warning
if not self._client_transaction_started: | ||
warnings.warn( | ||
CLIENT_TRANSACTION_NOT_STARTED_WARNING, UserWarning, stacklevel=2 | ||
) | ||
return | ||
if not self._spanner_transaction_started: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above: I don't think we want to make this difference, but also this should be supported:
begin;
rollback;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the warning and added the system test test_begin_and_rollback
to test the mentioned use case
tests/system/test_dbapi.py
Outdated
assert len(got_rows[0]) == 1 | ||
assert len(self._cursor.description) == 1 | ||
assert self._cursor.description[0].name == "SHOW_READ_TIMESTAMP" | ||
assert isinstance(got_rows[0][0], DatetimeWithNanoseconds) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add another query to this test, and then verify that the next query gives us a new read timestamp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
||
CONNECTION_CLOSED_ERROR = "This connection is closed" | ||
TRANSACTION_NOT_STARTED_WARNING = ( | ||
"This method is non-operational as transaction has not been started." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
"This method is non-operational as transaction has not been started." | |
"This method is non-operational as a transaction has not been started." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -35,7 +36,7 @@ | |||
|
|||
|
|||
CLIENT_TRANSACTION_NOT_STARTED_WARNING = ( | |||
"This method is non-operational as transaction has not started" | |||
"This method is non-operational as transaction has not been started." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
"This method is non-operational as transaction has not been started." | |
"This method is non-operational as a transaction has not been started." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Implementation of SHOW_COMMIT_TIMESTAMP and SHOW_READ_TIMESTAMP client statements