Execution Timeout Expired Error (258, ReadSniSyncOverAsync) #647
Comments
Have you tried increasing the command timeout? |
Thanks, @ericstj. No, I haven't tried increasing the command timeout; it is currently 3 s. When a query does time out, I get a stack like the one below, which seems different from the stack above. Do you mean the stack above is also a SQL execution timeout? By the way, New Relic reports the SELECT query's duration as 83.4 ms, though I'm not sure whether that is the execution time.
|
@frankyuan, thanks for bringing this to our attention. Can you please let me know if:
Thank you |
Sorry for the delayed reply. Here is more detailed information (by the way, not every cluster has the above error, just one cluster):
|
@frankyuan thanks for the response. Can you provide sample repro code? If sample repro is not possible (without sample app it is really difficult to understand what is happening), can you share the connection string properties please? |
The SqlConnectionString looks like this: Data Source=xxx;Initial Catalog=xx;User ID=xx;Password=xx;Min Pool Size=5;Max Pool Size=100;Connect Timeout=8, with CommandTimeout=3. As I can't reproduce the issue in my local environment, I'm afraid I can't provide a sample repro for now. I'm still trying to reproduce it, and if I find a way I will update here. |
It seems only the SQL mentioned in the description throws the exception. Sometimes there is another, similar exception, like the one below, when executing the same SQL.
|
@frankyuan any update on the issue on your side? Were you able to repro the issue? If not, can you kindly share your code, or similar code, with me? I need the query you run and the settings you have on your machine, please. On another note, can you also test with increased connection and command timeouts? |
I'm experiencing a similar issue
.NET target: Core 3.1 |
We are also having a similar issue to @Svisstack's. Microsoft.Data.SqlClient 2.0.1, .NET Core 3.1, running in a Windows container and connecting to a SQL Azure Hyperscale DB. We have a multi-threaded service with, let's say, about 30 parallel threads writing data to the same database. There is one transaction per thread. The service runs fine most of the time, but from time to time some threads start to get the exceptions below and get stuck on them. Within one occurrence the failing command is the same for all threads, but it is usually a different command from one occurrence to the next. Quite often it is a command that sends table-valued parameters, but we have also seen it fail on a very basic SELECT with a single parameter. There is no activity on the DB side; connections are in the awaiting command state. From our observations, when this exception is thrown, all threads are blocked on it and never recover.
When we noticed this exception, it seems that only some threads were affected, and after approximately 30 minutes the issue disappeared.
I would like to provide a repro, but this nondeterministic behavior is quite hard to reproduce. Interestingly, we have 6 instances of this service, each connecting to a different DB with a different number of threads, but we noticed the issue on only 2 instances. One is the above-mentioned one with 30-35 threads; the other is much smaller, with only 3-4 threads. |
Sorry for getting back to you late. I haven't been able to reproduce this issue so far. The only clue is that the result contained more than 3000 records; after I changed the logic to fetch fewer records, the exception went away. |
@frankyuan thanks for the update. @Svisstack, @lukas-navratil can any of you guys provide a sample repro application please? |
I don't have this possibility, unfortunately.
|
@JRahnama we are seeing the same issues. We will keep an eye on it and see if we can replicate it at will. At the moment it is very random:
|
@twurm can you give us a repro application? |
@JRahnama we are working on a repro app. The event doesn't happen all the time and happens in different areas of the application, so we are trying to figure that out. |
same experiences
|
@frankyuan, @twurm, @mischaherbrand, @Svisstack, @lukas-navratil while I was trying to build a scenario to reproduce the issue, I came up with sample code in which the first attempt to read from the pool threw the very same error message whenever the SqlCommand was not wrapped in a using block. The rest of the recycled connections worked fine until the pool was renewed, either by reaching its max pool size or by a fatal error that cleared the pool, and then the first connection out of the pool had the very same issue again. As a workaround, can you make sure all your SqlConnection, SqlCommand and SqlDataReader objects are properly wrapped in using blocks and see if the issue still comes up? That will also help me understand whether I am on the right track with the repro. Thanks everybody |
Can't confirm that currently, too much code on my side, but I will keep this in mind.
|
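A minimal sketch of the using-block pattern requested above, in C#. It is written against the provider-neutral System.Data.Common base classes so that it compiles without any particular SqlClient package. `QueryHelper`, `ReadAll` and the `openConnection` factory are hypothetical names for illustration, not code posted by anyone in this thread, and the factory is assumed to return a connection that is already open.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;

public static class QueryHelper
{
    // Runs a query and copies every row into memory.
    // Connection, command and reader are each wrapped in a using block,
    // so all three are disposed even if ExecuteReader or Read throws.
    public static List<object[]> ReadAll(
        Func<DbConnection> openConnection, // hypothetical factory, returns an OPEN connection
        string sql,
        int commandTimeoutSeconds)
    {
        var rows = new List<object[]>();

        using (DbConnection connection = openConnection())
        using (DbCommand command = connection.CreateCommand())
        {
            command.CommandText = sql;
            command.CommandTimeout = commandTimeoutSeconds; // seconds

            using (DbDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    var values = new object[reader.FieldCount];
                    reader.GetValues(values);
                    rows.Add(values);
                }
            }
        }

        return rows;
    }
}
```

With Microsoft.Data.SqlClient, the factory would typically create a SqlConnection, call Open() on it and return it, since SqlConnection derives from DbConnection. Whether this removes error 258 in a given application is exactly what the question above is trying to establish.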
@JRahnama , If I didn't miss something, all |
@JRahnama , I found one command that wasn't wrapped in using block. I will deploy the fix and try to monitor if it helped or not. |
@JRahnama, it seems that in the 8 days since we deployed the version with a using block added around the SqlCommand, the Win32Exception 258 hasn't been thrown. However, exception 121 was thrown several times. I can't tell whether the service would have recovered from that, as we implemented an automatic restart of the service when it's stuck on the same SqlException for too long. I'm also not sure it's related to this issue, but it's a heads-up that this error still happens in our environment, as I reported above.
|
@JRahnama, unfortunately, our app got stuck today on this issue again:
|
I am getting this as well. In my case it's from within an EFC DbContext. ASP.NET Core 3.1 (and its bundled EFC and SqlClient). I also use EFC's DbContext pool: One of my colleagues insisted this is coming from SQL login timeouts (you can use that as a hint, but I would not rely on it). My connection string:
|
I would like to share my experience with this. I used to get exactly the same exception and stack trace when running a query. It was odd because the execution plan always looked highly optimized, so it did not seem to be a performance problem, yet the exception occurred often. We had no idea what to do. Some time ago we started rebuilding indexes every week, and the exception has not been raised since. My conclusion is that SQL Server was sometimes busy with something or overloaded, and that for some reason rebuilding the indexes reduces these overloads. |
For us, increasing the threadpool threads eliminated this issue. We're still on linux and are not seeing this (anymore) |
We also started seeing unknown error 258 after migrating from Windows to Linux App Services, but strangely only on one of our environments.
@MichelZ Out of curiosity, what did you increase the minimum threads to? |
We went aggressive/overkill, but we haven't seen any negative side effects so far.
|
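For readers who have not tuned this before, here is a hedged sketch of raising the thread pool minimum with ThreadPool.SetMinThreads. The values used in the comment above are not reproduced here; `ThreadPoolTuning`, `RaiseMinWorkerThreads` and the number 64 in the usage line are hypothetical.

```csharp
using System;
using System.Threading;

public static class ThreadPoolTuning
{
    // Raises the minimum number of worker threads to `desiredWorkers`
    // and leaves the I/O completion thread minimum unchanged.
    // Returns true if the minimum is already high enough or was raised.
    public static bool RaiseMinWorkerThreads(int desiredWorkers)
    {
        ThreadPool.GetMinThreads(out int currentWorkers, out int currentIo);

        if (desiredWorkers <= currentWorkers)
        {
            return true; // never lower an existing minimum
        }

        // SetMinThreads returns false if the value is rejected,
        // for example when it exceeds the configured maximum.
        return ThreadPool.SetMinThreads(desiredWorkers, currentIo);
    }
}
```

A call such as `ThreadPoolTuning.RaiseMinWorkerThreads(64)` would normally go at application startup, before the first database call. The right number depends on the workload, and as later comments note, this reduced the error for some people without eliminating it.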
Thought I'd add that when we investigated this a year ago, we felt that analysis from @bennil, above, married up with what we were seeing. Not sure I'm comfortable with your solution @MichelZ even though you haven't suffered any negative consequences. Great it is working for you though. |
Thanks @co7e for bringing this matter up again. My insights after watching and analyzing this for years are the following: SqlClient is not thread safe on Linux (I saw it on Intel and AMD). Reducing parallel work or adjusting thread execution might help but does not solve the problem. Sooner or later the exception will hit you. Surprisingly, Microsoft hasn't cared for years, even though a huge amount of enterprise software is running on this DB access lottery. The bug is hard to reproduce, as all threading issues are. I have some ideas for how to construct test code for further investigation, but this is not a charity project... |
Your stack trace doesn't contain |
In our case the problem has now disappeared as mysteriously as it appeared. The only change we made was to migrate the SQL Server to a different virtual machine. We took some packet captures by logging in with ssh on the Linux Web App. A packet containing a SQL query, registered in the tcpdump with size 2130 bytes and data length 2064, was being "partially acked". After the large packet was sent out, the ACK came back for only the first 1398 bytes, causing a retransmission of the last 666 bytes. In response to this retransmission came another ACK for only the first 1398 bytes, causing another retransmission of the last 666 bytes, and so on, until eventually the connection timed out. Rather than a threading issue, this potentially looks like some kind of network layer problem. However, we can no longer reproduce it, so it's difficult to investigate more deeply. As @dazinator already suggested, our particular problem was probably something entirely different from what this ticket is about. The 258 timeout error is fairly general: it just means that some kind of wait operation timed out, and it can have multiple different underlying causes. Sorry for the noise. |
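The figures in that capture are consistent with each other, which a short C# sketch can check. `PartialAckMath` and its method are hypothetical helpers; the numbers are the ones quoted in the comment above, and reading the leftover 66 bytes as protocol headers is an assumption, since the comment does not say.

```csharp
using System;

public static class PartialAckMath
{
    // Payload bytes the sender must still retransmit after a partial ACK.
    public static int UnackedBytes(int payloadLength, int ackedBytes)
    {
        if (ackedBytes < 0 || ackedBytes > payloadLength)
        {
            throw new ArgumentOutOfRangeException(nameof(ackedBytes));
        }

        return payloadLength - ackedBytes;
    }
}
```

`PartialAckMath.UnackedBytes(2064, 1398)` gives 666, the size of the segment that kept being retransmitted. The difference between the captured size and the data length, 2130 - 2064 = 66 bytes, would then be per-packet header overhead under the assumption above.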
We have also experienced the same issue. The solution, for us, was to limit the packet size in the Connection string by appending See https://learn.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlconnection.packetsize?view=dotnet-plat-ext-8.0 for more info on SqlConnection.PacketSize property. |
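As an illustration of that kind of change, the sketch below adds a `Packet Size` keyword to an existing connection string using the provider-neutral DbConnectionStringBuilder. The value the commenter actually used is not shown in this thread, so the 4096 in the usage line is an arbitrary placeholder; `ConnectionStringTuning` and `WithPacketSize` are hypothetical names.

```csharp
using System;
using System.Data.Common;

public static class ConnectionStringTuning
{
    // Returns a copy of the connection string with "Packet Size" set.
    // Valid sizes are defined by the SqlClient documentation linked above;
    // this helper only rejects non-positive values.
    public static string WithPacketSize(string connectionString, int packetSizeBytes)
    {
        if (packetSizeBytes <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(packetSizeBytes));
        }

        var builder = new DbConnectionStringBuilder
        {
            ConnectionString = connectionString
        };

        // Adds the keyword, or overwrites it if it is already present.
        builder["Packet Size"] = packetSizeBytes;

        return builder.ConnectionString;
    }
}
```

For example, `ConnectionStringTuning.WithPacketSize("Data Source=xxx;Initial Catalog=xx", 4096)` returns the same settings plus the packet size keyword. The builder may normalize the casing of existing keywords, which is harmless because connection string keywords are not case sensitive.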
https://www.codeproject.com/Tips/5378079/Challenges-in-Migrating-ASP-NET-Apps-to-Containers Maybe after all it relates to MARS? (DB connection string: MultipleActiveResultSets=True) |
Not in mine or many other cases. The MARS issue was separate as far as I know. I use Note: After moving the work to a separate dedicated process (and therefore removing contention on the application thread pool in the process) I no longer see this issue. The new process starts on the same machine, runs the same queries, and terminates as expected. Also, not only are we running this logic in a dedicated process, we have 10 jobs to run; rather than all being run in the application at the same time as before, they are now 10 separate dedicated processes that start and then terminate. So this has resulted in 11 separate processes, each with its own thread pool and its own connection. So it seems to be to do with:-
|
I also wondered if Azure SQL has some sort of connection "rejection" mechanism, such that when it scales up or CPU usage is high, it can reject queries with a 258 "immediately", and that perhaps when a connection is in this state, it doesn't clear. A bit like a rate limiting feature. However, this is pure speculation. I don't have the networking skills to trace what packets are being sent when this occurs, as our apps run in Docker on Linux Azure VMs, and things get pretty complicated pretty fast. |
We are closing this issue as the error is very generic and applies to several valid failure scenarios that might have different underlying causes including network issues and thread starvation. There is no real fix for this. |
@jaq316 Have you seen any more instances of this issue after applying this change? We see it randomly, but it causes a lot of issues and latency when it does happen. We've tried everything, but nothing seems to work. I can see the queries completing in the DB, because some of our stored procs log their steps, but the result never seems to come back to the app and the timeout occurs. |
Hoping this can help someone. I haven't been able to fully diagnose the root cause of my timeouts but here is what I observed. We received this timeout error in app insights along with some other app log information. I was able to correlate DB logs from the stored proc that was executed to match the timeout that was reported in app insights. The timestamp on the logs was ~15 seconds prior to the timeout which lines up because our timeout is set to 15 seconds, so the stored proc executed successfully but the result never returned to the app triggering the timeout. This kind of points to what @jaq316 reported, possibly getting stuck in an ACK loop and eventually timing out but I haven't been able to prove it out because it's only occurring in our prod environment and we haven't been able to fully probe and run a tcpdump to capture the traffic. |
@mkieloch-352 ... We have not experienced this since applying this change. |
Call `Dispose()` method on SQL command to avoid resource leaks. Try approach from the following comment: dotnet/SqlClient#647 (comment) Part of #117
I'm also having this same problem in my Web API, but my problem originates from Docker, I can guarantee that. Look at this stack trace:
Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Unknown error 258
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.InitializeReaderAsync(AsyncEnumerator enumerator, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()
at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ToListAsync[TSource](IQueryable`1 source, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ToListAsync[TSource](IQueryable`1 source, CancellationToken cancellationToken)
at App.Core.Services.Process.SearchFor(Query query, InfoSession session)
My Web API runs on .NET 8.x with the latest version of Entity Framework. In my searches, both Dapper users and Entity Framework users are hitting this problem, so the cause is something more internal. Running a local test against the production database without Docker, everything works fine. I ran the same query generated by Entity Framework directly in the database and it completed in less than 1 second, as expected. Later I ran the API through Docker and did the same thing again, and I received the timeout error "Unknown error 258". |
This issue should have its own bug bounty program |
Have you tried increasing the MinThreads as mentioned multiple times in this thread? ( |
Yes, it was one of the first things I tried. It reduced the number of times this occurred, but did not solve the problem. |
Describe the bug
When executing SQL such as
SELECT FieldA, FieldB FROM A INNER JOIN C ON A.FieldId = C.FieldId UNION SELECT FieldA, FieldD FROM A INNER JOIN D ON A.FieldId = D.FieldId
the error below is thrown. It does not happen every time; only a small fraction of queries have this issue.
To reproduce
Sorry, I currently can't reproduce this in my local environment, so I can't provide more detail.
Expected behavior
SQL should execute successfully every time.
Further technical details
Microsoft.Data.SqlClient version: 1.1.3
.NET target: Core 3.1
Operating system: Docker container
What I found/tried
https://stackoverflow.com/questions/57270245/sql-server-dbcommand-timeout-with-net-core-container-under-load
DapperLib/Dapper#1435