Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for cancel query by adding synchronous readers. #1246

Merged
merged 5 commits into from
Sep 20, 2021

Conversation

smartguest
Copy link
Contributor

@smartguest smartguest commented Sep 17, 2021

Due to an ongoing known issue that affects the asynchronous reader in SqlParser, it cannot cancel any long running query that isn't expected to end shortly or is an infinite loop scenario.

Since this is a common issue among many users and does not reproduce on SQL Server Management Studio due to it not using the asynchronous reader. I have replaced the problematic async reader with the synchronous version.

Queries still execute the same so far without any major changes, but infinite running queries are now cancelled properly when run from the ADS side. I have tested this with a select top query and an infinite loop query scenario so far.

This addresses: microsoft/azuredatastudio#3231 and #1244

@alanrenmsft
Copy link
Contributor

I am not sure whether this is the correct fix. async/cancellation token is the right approach. instead of changing to sync data reader, have you tested why is the cancellation not working previously?

@smartguest
Copy link
Contributor Author

smartguest commented Sep 18, 2021

@alanrenmsft @Charles-Gagnon Yes, I have tested the issue extensively, debugging the code, the execution always stops at that line to call cancel on the dbcommand or token (but its likely cancel is being stuck on dbcommand) without any warning or error whatsoever, and there is no error returned, it seems the query on the server, continues to execute while the call to cancel hangs forever, no lines after it execute.

According to the reports on this bug in SqlClient:

dotnet/SqlClient#44 (comment)
dotnet/SqlClient#44 (comment)

SqlClient async reader is currently unable to handle any infinite loop scenarios due to a race condition occurring between the cancellation and closing in SqlCommand.

as @saurabh500 says:

"In this case the query is executed on the server, but it will never complete as it is an infinite loop.
In the client, after sending a query for execution, the client waits for an acknowledgement from the server by locking the TdsParserStateObject and then waiting for a network packet.
How cancellation works is, the client is responsible for making sure that it completes the existing requests and then sends a cancellation request to the server. Since the existing request is never going to complete, the cancellation will not go through either.
The hang happens in case of queries like above which are never expected to complete, or if you try to cancel an operations while streaming endless data from SqlServer (like XEvents) and try to close the reader before cancelling.

This is essentially a race condition between Reader.Close and Reader.Cancel"

@Wraith2 says that:

At the moment a lock is taken in EndExecuteNonQuery and then in TdsParserStateObject.Cancel the lock is attempted and cancellation isn't allowed if it can't be acquired.

The only workaround so far as suggested by @cheenamalhotra
dotnet/SqlClient#44 (comment)

Is to either use Timeout or use Sync APIs to cancel.

From what I have read in the bug, SSMS also uses sync readers to do its operations apparently.

dotnet/SqlClient#1065 This is the current bug page on SqlClient, as of the moment its still not fully resolved, and there's no option to reliably cancel a connection other than closing the connection which we want to avoid.

Copy link
Contributor

@Charles-Gagnon Charles-Gagnon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a fix in sqlclient is merged in now (dotnet/SqlClient#956) but given our experiences with taking brand new sqlclient releases doing this fix for now seems reasonable.

This shouldn't impact performance as the queries are spun on their own thread anyways and aren't awaited from what I can see.

@smartguest
Copy link
Contributor Author

smartguest commented Sep 20, 2021

@Charles-Gagnon I'm happy to hear that SqlClient has fixed this issue. Once we put the latest version into our code, I will revert this PR.

@alanrenmsft
Copy link
Contributor

how do we cancel the query execution now? just ignore it? does it mean we have a thread that never stops execution?(or the execution timeout will be used in the case?)

also, we have to make sure when a query is canceled it won't send anything back to ADS.

@Wraith2
Copy link

Wraith2 commented Sep 20, 2021

given our experiences with taking brand new sqlclient releases

That sounds like bad experiences, what happened and was it preventable from the SqlClient side?

@smartguest
Copy link
Contributor Author

smartguest commented Sep 20, 2021

We can still call cancel on the sync reader @alanrenmsft without any issue when the cancellationtoken is cancelled as there is a small callback in the function that calls cancel on the DbCommand, and it seems to work as expected with the query cancelling. It's just that the async reader does not respond at all to this command to cancel. This just makes the code work given the current design of how cancellation is done in ADS and STS. The behavior so far when I tested things out appears to be similar to that of SSMS.

Specifically this part doesn't work with sync reader:

cancellationToken.Register(() => dbCommand?.Cancel()); on Line 398

@Charles-Gagnon
Copy link
Contributor

@alanrenmsft What do you mean? The cancellation token will work as expected now since we're using the sync functions - that's the fix here (the cancellation was being ignored previously as the bugs that Alex linked to said)

If a cancel isn't requested then yes, the query can execute infinitely. It should generally be up to the caller to handle that and add an appropriate timeout if needed though I would say.

@Wraith2 - There was a fix earlier (dotnet/SqlClient#796) that we updated sqlclient for. But the initial release with this fix ended up having another bug in it (dotnet/efcore#19842 I believe). This issue was fixed quickly but we just happened to pick it up before that issue was noticed/fixed and so ended up releasing with that version and immediately had a bunch of customers hit the issue. So something like that is hard to make completely preventable - you can always add more tests (and ones were) but given that it's hard to test what you don't know is broken I think this was just a case of bad timing and the perfect storm of events.

@smartguest smartguest merged commit 41d44d8 into main Sep 20, 2021
@smartguest smartguest deleted the alex/fixforcancelquery branch September 20, 2021 17:26
nofield pushed a commit that referenced this pull request Jul 19, 2022
* added WIP sync execute method

* added syncexecuteonce

* replaced executeonce with synchronous reading

* removed batch.cs change

* restore space
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants