
Poor INSERT performance with ExecuteNonQuery() and many parameters #974

Closed
KristianWedberg opened this issue Mar 9, 2021 · 86 comments · Fixed by #1041
Labels
📈 Performance Issues that are targeted to performance improvements.

Comments

@KristianWedberg

Description

When inserting rows with ExecuteNonQuery() and placeholder parameters, performance
(measured as inserted parameters per second) with the Microsoft.Data.SqlClient v2.1.1
and System.Data.SqlClient providers is progressively impacted from about 50+ parameters
per statement, both for single rows and for multirow batches.

Inserting with 2048 parameters in the statement is up to 7 times slower than with
128 parameters.

The issue is not present with the System.Data.Odbc provider (or with non-SQL Server
providers & databases): SqlClient is up to 25 times slower than the ODBC provider
when using 2048 parameters per statement.

I've tested with Windows 10, .NET5.0 and a local SQL Server 2019 database, varying the
number of columns from 1 to 1024, and number of rows in each batch from 1 to 1000.

Note: The 'waviness' at 1000 and 2000 parameters is just an effect of the 1000/1024 and 2000/2048
numbers being separated in the graph - see the write-up for details:

[Chart: Average Throughput]

The target table is un-indexed and the inserts use SQL-92 syntax:

INSERT INTO tablename (column-a, [column-b, ...])
VALUES (@r0_c0, [@r0_c1, ...]),
       (@r1_c0, [@r1_c1, ...]),
       ...
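
For concreteness, a minimal sketch of the pattern under test (illustrative names, not the benchmark's actual code): the statement text and the SqlParameter list grow together, and everything is sent in a single ExecuteNonQuery() call. SQL Server caps a single request at 2100 parameters.

```csharp
using System.Linq;
using System.Text;
using Microsoft.Data.SqlClient;

static class MultiRowInsert
{
    // Builds and executes one INSERT carrying rows.Length * columns placeholder
    // parameters. Column names d0..dN match the profiled queries later in the thread.
    public static int InsertBatch(SqlConnection connection, string table, int[][] rows)
    {
        int columns = rows[0].Length;
        var sql = new StringBuilder()
            .Append($"INSERT INTO {table} (")
            .Append(string.Join(",", Enumerable.Range(0, columns).Select(c => $"d{c}")))
            .Append(") VALUES ");

        using var command = connection.CreateCommand();
        for (int r = 0; r < rows.Length; r++)
        {
            sql.Append(r == 0 ? "(" : ",(");
            for (int c = 0; c < columns; c++)
            {
                string name = $"@R{r}_C{c}";
                sql.Append(c == 0 ? name : "," + name);
                command.Parameters.AddWithValue(name, rows[r][c]);
            }
            sql.Append(')');
        }
        command.CommandText = sql.ToString();
        return command.ExecuteNonQuery();
    }
}
```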

While table-valued parameters and bulk inserts don't have this issue, they won't
help when inserting a single row or just a few very wide rows at a time, so it would be
really useful to have this addressed (or to identify any gremlins on my part!).
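
For contrast, the non-portable alternative mentioned above looks roughly like this (a sketch with illustrative names):

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

static class BulkAlternative
{
    // SqlBulkCopy sidesteps the parameterized-INSERT path entirely, but is
    // SQL Server-specific and gains little for a single wide row.
    public static void BulkInsert(SqlConnection connection, string table, DataTable rows)
    {
        using var bulk = new SqlBulkCopy(connection) { DestinationTableName = table };
        bulk.WriteToServer(rows);
    }
}
```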

Please see the full write-up with charts, tables, BenchmarkDotNet info etc. at:

https://github.com/KristianWedberg/sql-batch-insert-performance

Reproduce

Fully runnable source (using BenchmarkDotNet) and instructions at:

https://github.com/KristianWedberg/sql-batch-insert-performance

Expected behavior

Using more parameters (beyond 128) in an insert row or batch should increase throughput,
measured as the number of parameters inserted per second, just like it does with
System.Data.Odbc and other non-SqlClient providers.

Technical details

  • Microsoft.Data.SqlClient Version="2.1.1"
  • System.Data.Odbc Version="5.0.0"
  • System.Data.SqlClient Version="4.8.2"
  • OS=Windows 10.0.19042
  • .NET Core SDK=5.0.103
@Wraith2
Contributor

Wraith2 commented Mar 9, 2021

What is the real world use case for using so many parameters?

@KristianWedberg
Author

KristianWedberg commented Mar 9, 2021

I've personally come across this with:

  • Wide rows to capture different stages of a process (think lots of dates and associated data). One real estate company used 450 columns for this.
    • 'Type 3 slowly changing dimensions' is a similar use case
  • Wide rows for denormalized data marts
  • Multi-row batches to improve performance
    • While table valued parameters and bulk inserts can handle this use case for SQL Server, those techniques are not portable, which is a big drawback

@Wraith2
Contributor

Wraith2 commented Mar 9, 2021

Multi-row batches to improve performance
While table valued parameters and bulk inserts can handle this use case for SQL Server, those techniques are not portable, which is a big draw-back

This is likely to be addressed in .net 6 for most major providers with dotnet/runtime#28633

I'll try to take a look at this when I can get some time but I make no promises.

@KristianWedberg
Author

Any help on this issue would be greatly appreciated! As a specific example, I discovered this when benchmarking our ETL library since we use it for (portable) batch inserts. For now, we'll have to advise clients to either switch to the ODBC provider, or use the non-portable bulk insert component. Not the end of the world, but certainly less than ideal.

@cheenamalhotra
Member

Hi @KristianWedberg

Thanks for the comparisons. We'll take a look and see if some improvements can be made.

@cheenamalhotra cheenamalhotra added the 📈 Performance Issues that are targeted to performance improvements. label Mar 9, 2021
@roji
Member

roji commented Mar 9, 2021

This is likely to be addressed in .net 6 for most major providers with dotnet/runtime#28633

This is an important point - one of the major incentives for having this new batching API was to remove the need to send so many parameters (which regardless of perf imposed an upper limit, since no more than 2100 parameters can be sent).
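
For reference, a minimal sketch of the batching API shape that later shipped in .NET 6's System.Data.Common (not SqlClient's eventual implementation; table and column names are assumptions, and parameters are created through a scratch DbCommand because .NET 6's DbBatchCommand exposes no CreateParameter):

```csharp
using System.Collections.Generic;
using System.Data.Common;

static class BatchingSketch
{
    // Each row becomes its own small DbBatchCommand; the whole batch goes to the
    // server in one round trip, keeping per-statement parameter counts low.
    public static int InsertViaBatch(DbConnection connection, IReadOnlyList<(int C0, int C1)> rows)
    {
        using DbBatch batch = connection.CreateBatch();          // throws if the provider lacks batch support
        using DbCommand scratch = connection.CreateCommand();    // used only to create parameters

        foreach ((int c0, int c1) in rows)
        {
            DbBatchCommand cmd = batch.CreateBatchCommand();
            cmd.CommandText = "INSERT INTO t (d0, d1) VALUES (@p0, @p1)";

            DbParameter p0 = scratch.CreateParameter();
            p0.ParameterName = "@p0";
            p0.Value = c0;
            cmd.Parameters.Add(p0);

            DbParameter p1 = scratch.CreateParameter();
            p1.ParameterName = "@p1";
            p1.Value = c1;
            cmd.Parameters.Add(p1);

            batch.BatchCommands.Add(cmd);
        }
        return batch.ExecuteNonQuery();
    }
}
```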

@KristianWedberg
Author

@roji I'm very much looking forward to the upcoming .NET6 batch API. That said:

  • It's still many years before we can let go of all <.NET6 frameworks
  • The batch API won't help with wide single row inserts

@roji
Member

roji commented Mar 9, 2021

It's still many years before we can let go of all <.NET6 frameworks

This is very true, though it's probably going to be possible (and advisable) to expose the batching API even for older TFMs - just without implementing the ADO.NET base classes. This would make the API accessible regardless of which TFM you're using (possibly even .NET Framework); that's my plan for Npgsql/PostgreSQL, in any case.

The batch API won't help with wide single row inserts

This is true - and at that point it's also a question of how common it is to have such a large number of columns etc.

@Wraith2
Contributor

Wraith2 commented Mar 9, 2021

This is very true, though it's probably going to be possible (and advisable) to expose the batching API even for older TFMs

The implementation I've got for SqlClient 3.1/5.0 provides versions of the System.Data base classes needed for this, so in theory it can be backported to any TFM the library supports.
One thing I hadn't considered is whether multiple drivers would need to live in the same process; I'll give that some thought and see if I can make the support bits internal.

Of course we've still got to do the work to get the feature into 6.0 and then find the time to get it into SqlClient with encryption support, we might want to light a fire under that @roji 😁

@Wraith2
Contributor

Wraith2 commented Mar 9, 2021

Just had a quick look. You're using System.Data.SqlClient. Can you try it with Microsoft.Data.SqlClient from this repository and see if the numbers are any better to start with?

@KristianWedberg
Author

KristianWedberg commented Mar 10, 2021

@Wraith2 I'm already using both Microsoft.Data.SqlClient (the "Microsoft" or "MicrosoftSync" results) and System.Data.SqlClient ("System" or "SystemSync" results) in the code and in the results. Their performance is identical.

@Wraith2
Contributor

Wraith2 commented Mar 10, 2021

Ah, I see it now. Had to do some fiddling to profile it and broke something. My mistake.

I've found some interesting areas that I was already aware of but were a way down my list of things to investigate. It's mostly string composition and array-resizing overhead. As the parameter count goes up, it spends more time allocating and re-allocating to grow, and there's not a lot the caller can do about that. It can be reworked to avoid intermediate strings, and to pre-size string builders and target arrays in a few places.

Two things stand out that you could do for the benchmark. Can you keep pre-boxed ints around to use as parameter values? Their boxing and unboxing is a big source of GC activity. I know that isn't reflective of real-world data, but in real-world scenarios, if you have strongly typed input data of limited or frequently known ranges (bool, small ints), keeping cached boxed versions of some values around and using those will gain you some perf.

If possible, reusing SqlParameter instances would be good. They're quite large and you're using a lot of them.
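
A sketch of both suggestions (assumed helper names and cache range, not code from the benchmark): cache boxed values for a small known range, build the SqlParameter list once, and only swap Values between executions.

```csharp
using System;
using System.Linq;
using Microsoft.Data.SqlClient;

static class ParameterReuse
{
    // One boxed object per small int; the 0..255 range is an assumption.
    private static readonly object[] BoxedInts =
        Enumerable.Range(0, 256).Select(i => (object)i).ToArray();

    public static object Box(int value) =>
        (uint)value < (uint)BoxedInts.Length ? BoxedInts[value] : value; // rare values box normally

    // Parameters are created once per command shape, then refreshed in place
    // between ExecuteNonQuery() calls: no new SqlParameter allocations per batch.
    public static void RefreshValues(SqlCommand command, int[] row)
    {
        for (int i = 0; i < row.Length; i++)
            command.Parameters[i].Value = Box(row[i]);
    }
}
```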

@KristianWedberg
Author

@Wraith2 I've added (and uploaded the code for) using pre-boxed int values to the test.

Here's the original test (boxing of int values, only 3 batches run):

| provider | RPS | CPR | NOS | Mean | Error | StdDev | Throughput [params/s] | Throughput [rows/s] |
|---|---|---|---|---|---|---|---|---|
| Microsoft | 32 | 64 | 3 | 126.91 ms | 9.770 ms | 0.536 ms | 48411 | 756 |
| System | 32 | 64 | 3 | 126.96 ms | 6.023 ms | 0.330 ms | 48392 | 756 |
| Odbc | 32 | 64 | 89 | 161.43 ms | 11.006 ms | 0.603 ms | 1129092 | 17642 |

Running the same test with pre-boxed int values instead makes no difference at all:

| provider | RPS | CPR | NOS | Mean | Error | StdDev | Throughput [params/s] | Throughput [rows/s] |
|---|---|---|---|---|---|---|---|---|
| Microsoft | 32 | 64 | 3 | 126.8 ms | 23.63 ms | 1.30 ms | 48469 | 757 |
| System | 32 | 64 | 3 | 125.7 ms | 20.04 ms | 1.10 ms | 48861 | 763 |
| Odbc | 32 | 64 | 89 | 156.0 ms | 22.80 ms | 1.25 ms | 1168274 | 18254 |

SqlParameter instances are already reused across all batches in a test case. For the slowest test cases, however, this was only 3 batches. Increasing _scaleNumberOfStatements by a factor of 10 means the cost of SqlParameter creation is instead amortized over 30 batches. This, together with pre-boxed int values, still makes no difference:

| provider | RPS | CPR | NOS | Mean | Error | StdDev | Throughput [params/s] | Throughput [rows/s] |
|---|---|---|---|---|---|---|---|---|
| Microsoft | 32 | 64 | 30 | 1.260 s | 0.1338 s | 0.0073 s | 48746 | 762 |
| System | 32 | 64 | 30 | 1.259 s | 0.1037 s | 0.0057 s | 48807 | 763 |
| Odbc | 32 | 64 | 896 | 1.549 s | 0.1823 s | 0.0100 s | 1184652 | 18510 |

So creating SqlParameters and setting their values is not the bottleneck.

For profiling work, you might set _rowsPerStatements and _columnsPerRows to a single value each and comment out the other providers to get a single test case, and increase _scaleNumberOfStatements sufficiently to get long enough run times.

@Wraith2
Contributor

Wraith2 commented Mar 10, 2021

That just means the individual benches aren't running for long enough to be affected by memory issues, which is a good thing in general, but if you want to run these tasks for a long time, pre-boxing and reusing things will save you some memory. There being no throughput difference isn't surprising; I was just making observations. The current limiting factors are parallelism and buffer resizing, as I said.

@roji
Member

roji commented Mar 10, 2021

@Wraith2

Of course we've still got to do the work to get the feature into 6.0 and then find the time to get it into SqlClient with encryption support, we might want to light a fire under that @roji 😁

You're absolutely right... I do have a few other things that need to be done first, but I'll do my best to finally finish this ASAP.

@Wraith2
Contributor

Wraith2 commented Mar 12, 2021

I made the code execute about 20% faster and it made no difference to throughput. The code isn't the bottleneck; the network is. ODBC just appears to be more efficient about how the parameters are sent, somehow. Since I can't change what needs to be put on the wire, or how long it takes to send and receive the messages over the network, there is no clear way to improve performance here.

@roji
Member

roji commented Mar 12, 2021

@Wraith2 interesting - if this is actually a wire protocol data change, it may be worth comparing what they're doing with Wireshark. If SqlClient is using TDS in an inefficient way, that might be another thing to track somewhere...

@Wraith2
Contributor

Wraith2 commented Mar 12, 2021

Sure. One of many, many things to investigate at some point.
I'll keep the branch with my changes around and try to PR them at some point, because someone may appreciate the additional speed in some scenario; it just doesn't help in this one.

My advice on this one would be to get closer to the SQL Server and attempt parallel imports.

@cheenamalhotra
Member

You can also try to compare how queries execute on the server side by comparing SQL Server Profiler traces. That will tell you whether parameter metadata caching is adding benefit on the client side or not.

@KristianWedberg
Author

KristianWedberg commented Mar 12, 2021

@Wraith2 When I run this single test case (with _scaleNumberOfStatements = 40_000 and .AddJob(Job.Dry), again with a local database), I get the following CPU utilization:

| provider | RPS | CPR | NOS | Mean | Throughput [params/s] | Client CPU | SQL Server CPU |
|---|---|---|---|---|---|---|---|
| Microsoft | 32 | 64 | 616 | 26.39 s | 47812 | 1% | 30% |
| Odbc | 32 | 64 | 17921 | 33.12 s | 1108118 | 15% | 30% |

Certainly not proven, but it looks like:

  • In both cases the SQL Server process is likely bottlenecked on a single thread (I have 4 cores total).
  • The Microsoft provider is (as you said) not a CPU bottleneck, since the client side only consumes 1%.
  • The ODBC test case is performing 29 times more work than the Microsoft test case, so it makes perfect
    sense that it has ~15 times higher client side CPU utilization.

The question is - why does the SQL Server process consume so much CPU when it receives these statements
from the Microsoft.Data.SqlClient provider with 2048 parameters (and starts to be performance limited at 50+ parameters)?

Can this be raised with the SQL Server team?

@Wraith2
Contributor

Wraith2 commented Mar 12, 2021

I can provide you with the dev nuget package for my build if you want to try it out and verify the perf difference for yourself.

@KristianWedberg
Author

Some more tests:

  • System.Data.SqlClient 4.4.0 (from 2017) and the latest Microsoft.Data.SqlClient 2.1.2 both showed the same issue.
  • SQL Server 2012 and SQL Server 2016 both showed the same issue.

So it's not a recent issue.

@Wraith2
Contributor

Wraith2 commented Mar 13, 2021

For shits and giggles you can give this a try and see that it changes nothing if you like.
Microsoft.Data.SqlClient.3.0.0-dev.zip (rename to .nuget)

@KristianWedberg
Author

KristianWedberg commented Mar 13, 2021

@Wraith2 Thanks. As expected by now, Microsoft.Data.SqlClient.3.0.0-dev.zip made no difference whatsoever.

For shits and giggles

Lovely image though :-)

@KristianWedberg
Author

KristianWedberg commented Mar 13, 2021

Profiling shows that query plans (including operator values) look identical: T-SQL Insert <-- Table Insert <-- Constant Scan

The SqlClient and Odbc queries look identical, except for SqlClient using named instead of positional parameters (here with only 3 columns and 2 rows). The parameters being named might well be the reason for the performance difference.

SqlClient:

exec sp_executesql N'INSERT INTO insertbenchmark_2_3_MicrosoftSync (d0,d1,d2) 
    VALUES (@R0_C0,@R0_C1,@R0_C2),(@R1_C0,@R1_C1,@R1_C2)',N'@R0_C0 int,@R0_C1 int,@R0_C2 int,@R1_C0 int,@R1_C1 int,@R1_C2 int',
    @R0_C0=0,@R0_C1=1,@R0_C2=2,@R1_C0=3,@R1_C1=4,@R1_C2=5

Odbc:

exec sp_executesql N'INSERT INTO insertbenchmark_2_3_OdbcSync      (d0,d1,d2) 
    VALUES (@P1,@P2,@P3),(@P4,@P5,@P6)',N'@P1 int,@P2 int,@P3 int,@P4 int,@P5 int,@P6 int',
    0,1,2,3,4,5

Interestingly, when profiling with many parameters (in this case 32 * 64 = 2048), with SqlClient the server reports CPU consumption, physical writes and duration for every query, while with Odbc the server only reports CPU consumption and duration for the first query, and no physical writes for any query.

Server statistics per INSERT using SqlClient:

| CPU (ms) | Logical Reads | Physical Writes | Duration (ms) |
|---|---|---|---|
| 94 | 96 | 2 | 95 |
| 47 | 42 | 1 | 40 |
| 47 | 42 | 1 | 42 |
| 47 | 42 | 1 | 43 |
| 46 | 43 | 1 | 43 |
| 47 | 42 | 1 | 45 |
| 32 | 42 | 1 | 42 |

Server statistics per INSERT using Odbc:

| CPU (ms) | Logical Reads | Physical Writes | Duration (ms) |
|---|---|---|---|
| 63 | 96 | 0 | 55 |
| 0 | 42 | 0 | 0 |
| 0 | 43 | 0 | 0 |
| 0 | 42 | 0 | 0 |
| 0 | 42 | 0 | 0 |
| 0 | 43 | 0 | 0 |
| 0 | 42 | 0 | 0 |

So possibly the named parameters make SQL Server consume significant CPU on every query, whereas with positional parameters
that work is either not needed or a cached copy is used.

I don't know how to troubleshoot this further into SQL Server, do any of you have a handle on that?

@Wraith2
Contributor

Wraith2 commented Mar 13, 2021

I think you'll have to open a support case for that.

Once we get to the point where batching is possible, would you be interested in trying out the preview builds for it? This is a use case that should be able to benefit from it, and it would be useful to have some idea how it will react.

@Wraith2
Contributor

Wraith2 commented Mar 16, 2021

exec sp_executesql N'INSERT INTO insertbenchmark_2_3_MicrosoftSync (d0,d1,d2) 
    VALUES (@R0_C0,@R0_C1,@R0_C2),(@R1_C0,@R1_C1,@R1_C2)',N'@R0_C0 int,@R0_C1 int,@R0_C2 int,@R1_C0 int,@R1_C1 int,@R1_C2 int',
    @R0_C0=0,@R0_C1=1,@R0_C2=2,@R1_C0=3,@R1_C1=4,@R1_C2=5

If you just changed the last line of @name=value pairs to a plain value list, without changing the VALUES clauses, would it still work? The question is whether the names matter when they aren't specified in the value list. If it's still valid syntax (indicating @p\d is not special) then I could quirk BuildParamList to just not output the names if an AppContext flag were set.

@ErikEJ
Contributor

ErikEJ commented Mar 16, 2021

@Wraith2 Based on my findings, I think that would be an interesting experiment.

@David-Engel
Contributor

For example I'm pretty sure I don't want to change the value of the reset connection header flag because it's important (and yet odbc doesn't use it? should it?)

ODBC would only have it if the driver manager handed out a previously pooled connection and this was the first query to be executed.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

Both samples were run the same way from the example app provided above. Table setup was done, and then I started capturing at a breakpoint and stopped immediately after the query succeeded. The command objects were newed up each time. Looking at a longer-running trace, the reset behaviour seems to follow what you said. So scratch that one. Next: no metadata?

@cheenamalhotra
Member

cheenamalhotra commented Mar 18, 2021

As per the MS-TDS spec, the NoMetaData flag in an RPC request is described as:

fNoMetaData
The server sends NoMetaData only if fNoMetadata is set to 1 in the request (see COLMETADATA, section 2.2.7.4)

Whereas COLMETADATA (where NoMetaData would be received in the response) does not apply to an RPC server response, so it's basically a no-op for SqlClient. You can verify the response packets too.

It certainly feels confusing. Why does this option even exist in the RPC request, then? Any thoughts @v-chojas?

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

Manually decoded packets with annotations, download and diff to review.

The differences are:

  1. ODBC uses the PLP string format, not inline; this accounts for the length differences.
  2. SqlClient sent the reset connection flag in the header - already discussed and probably not important.
  3. SqlClient sent the NoMetadata flag; this is unconditionally set if the command object is in Text mode.

sqlcient hex.txt
odbc hex.txt

@v-chojas

I wouldn't expect to see any difference with 16 parameters, but try looking at one with 512 parameters. That will cause both the SQL and parameter declaration list to be long enough not to fit in a regular nvarchar (4000 chars/8000 bytes). The expression to calculate the length of the parameter list in chars is 8*n + (n-10)*(n>10) + (n-100)*(n>100) + (n-1000)*(n>1000) - 1, which exceeds 4000 when n exceeds 411.
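
As a sanity check of that estimate, a throwaway sketch that builds the actual ODBC-style declaration string ("@P1 int,@P2 int,...") and reports where it first outgrows nvarchar's 4000-char limit (the exact crossover may differ slightly from the closed-form expression above):

```csharp
using System;
using System.Linq;

class DeclarationListLength
{
    static void Main()
    {
        for (int n = 1; n <= 2100; n++) // 2100 is the server's parameter cap
        {
            // "@P1 int,@P2 int,...,@Pn int" as sp_executesql receives it
            int length = string.Join(",",
                Enumerable.Range(1, n).Select(i => $"@P{i} int")).Length;
            if (length > 4000)
            {
                Console.WriteLine($"Declaration list first exceeds 4000 chars at n = {n} ({length} chars).");
                break;
            }
        }
    }
}
```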

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

What difference are you expecting? It's going to take a long time to manually break down a large request, which is why I chose to use the short version to identify differences. If there's a particular thing you're looking for here, hints would be good.

If the only difference is string representation, I doubt that's something that can be fixed in the driver. ODBC is using a more verbose format and is sending smaller packets, so it really wouldn't make sense for it to be quicker than SqlClient.

@KristianWedberg
Author

Is there any way to profile the SQL Server side to see what it's spending all that CPU time on when statements with many parameters come in (as #974 (comment) shows, that's a very clear difference between SqlClient and Odbc in how the server behaves)?

@v-chojas

It should switch over to a different type because regular nvarchar is limited to 4000 chars. If you post hexdumps of the packets (hex+ASCII) I can tell what's happening pretty easily. What type it's using and how it's sending it (PLP chunks can vary in size, but ODBC will send the parameter declaration list in one chunk) would give a clue about the perf difference.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

sql traces.zip

@v-chojas

ODBC sends SQL and parameter declaration list as nvarchar(max):

0b70   37 00 2c 00 40 00 50 00 32 00 30 00 34 00 38 00   7.,.@.P.2.0.4.8.
0b80   29 00 00 00 00 00 00 00*E7 FF FF 09 04 D0 00 00*   )...............
0b90   fe ff ff ff ff ff ff ff 64 04 00 00 40 00 50 00   ........d...@.P.
0ba0   31 00 20 00 69 00 6e 00 74 00 2c 00 40 00 50 00   1. .i.n.t.,.@.P.

SqlClient sends SQL and parameter declaration list as ntext:

1d70   2c 00 40 00 50 00 32 00 30 00 34 00 38 00 29 00   ,.@.P.2.0.4.8.).
1d80   00 00*63 58 A7 00 00 09 04 d0 00 00*58 a7 00 00   ..cX........X...
1d90   40 00 50 00 31 00 20 00 69 00 6e 00 74 00 2c 00   @.P.1. .i.n.t.,.
1da0   40 00 50 00 32 00 20 00 69 00 6e 00 74 00 2c 00   @.P.2. .i.n.t.,.

ntext is an old, deprecated datatype; there is probably an internal conversion on the server side, which causes the performance difference.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

I was trying to work out what 63 was and where it was being written in the driver. I'll have to see if I can unknot the spaghetti code around RPC construction enough to identify, and possibly change, the data type. If I get it working I'll post another build with some AppContext switches to try out.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

@KristianWedberg ok, give this one a try. All RPC parameters are sent as varchar(length) or varchar(max) depending on the size; netcore only at the moment. There are two AppContext switches you can use: Switch.Microsoft.Data.SqlClient.UseODBCParameterFormat, which you've already tried, and Switch.Microsoft.Data.SqlClient.OmitNoMetadata, because I'd written it in before we hit this string type info. I don't think the metadata flag is going to do anything, but I am interested to see how performance reacts to both settings, for parameter names and the nvarchar type.

[edit] removed link

@David-Engel
Contributor

David-Engel commented Mar 18, 2021

@Wraith2 I got an exception when parameters hit 512. This was with only AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseODBCParameterFormat", true); and also with that plus AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.OmitNoMetadata", true);

// **************************
// Benchmark: BatchInsert.InsertAsync: ShortRun(IterationCount=3, LaunchCount=1, WarmupCount=3) [providerAsync=MicrosoftSync, rowsPerStatement=1, columnsPerRow=512, numberOfStatements=67, argument=]
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "f7f2e8ad-0b34-455d-8860-54c3eea969e0.dll" --benchmarkName "SqlBatchInsertPerformance.BatchInsert.InsertAsync(providerAsync: MicrosoftSync, rowsPerStatement: 1, columnsPerRow: 512, numberOfStatements: 67, argument: )" --job "ShortRun" --benchmarkId 9 in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\bin\Release\net5.0\f7f2e8ad-0b34-455d-8860-54c3eea969e0\bin\Release\netcoreapp5.0
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
// GC=Concurrent Workstation
// Job: ShortRun(IterationCount=3, LaunchCount=1, WarmupCount=3)

OverheadJitting  1: 1 op, 366900.00 ns, 366.9000 us/op

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> Microsoft.Data.SqlClient.SqlException (0x80131904): The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 1 (""): Data type 0xE7 has an invalid data length or metadata length.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
   at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String methodName)
   at Microsoft.Data.SqlClient.SqlCommand.<ExecuteNonQuery>b__143_0()
   at Microsoft.Data.SqlClient.SqlRetryLogicProvider.Execute[TResult](Object sender, Func`1 function)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at SqlBatchInsertPerformance.BatchInsert.InsertAsync(ProviderAsync providerAsync, Int32 rowsPerStatement, Int32 columnsPerRow, Int32 numberOfStatements, Argument argument) in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\BatchInsert.cs:line 160
   at BenchmarkDotNet.Autogenerated.Runnable_9.WorkloadActionNoUnroll(Int64 invokeCount) in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\bin\Release\net5.0\f7f2e8ad-0b34-455d-8860-54c3eea969e0\f7f2e8ad-0b34-455d-8860-54c3eea969e0.notcs:line 8541
   at BenchmarkDotNet.Engines.Engine.RunIteration(IterationData data)
   at BenchmarkDotNet.Engines.EngineFactory.Jit(Engine engine, Int32 jitIndex, Int32 invokeCount, Int32 unrollFactor)
   at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(EngineParameters engineParameters)
   at BenchmarkDotNet.Autogenerated.Runnable_9.Run(IHost host, String benchmarkName) in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\bin\Release\net5.0\f7f2e8ad-0b34-455d-8860-54c3eea969e0\f7f2e8ad-0b34-455d-8860-54c3eea969e0.notcs:line 7763
ClientConnectionId:d8de5db4-60d1-4b00-b365-92a0041914e8
Error Number:8016,State:30,Class:16
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[] args) in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\bin\Release\net5.0\f7f2e8ad-0b34-455d-8860-54c3eea969e0\f7f2e8ad-0b34-455d-8860-54c3eea969e0.notcs:line 57
// AfterAll
ExitCode != 0
// Benchmark Process 35124 has exited with code -1

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

Interesting. So how do you specify an unbounded nvarchar?
I thought this:

            sqlParam.SqlDbType = ((paramList.Length << 1) <= TdsEnums.TYPE_SIZE_LIMIT) ? SqlDbType.NVarChar : SqlDbType.NText;
            sqlParam.Value = paramList;
            sqlParam.Size = paramList.Length;

would become this:

            sqlParam.SqlDbType = SqlDbType.NVarChar;
            sqlParam.Value = paramList;
            sqlParam.Size = ((paramList.Length << 1) <= TdsEnums.TYPE_SIZE_LIMIT) ? paramList.Length : -1;

[edit]
perhaps you can only send varchar(max) in plp mode?
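
For reference, on the public API side an "unbounded nvarchar" is exactly what the second snippet expresses: Size = -1 on an NVarChar SqlParameter is the documented way to request nvarchar(max). A small sketch (the value is a stand-in for a long declaration list):

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

// Size = -1 requests nvarchar(max); whether the internal RPC writer then takes
// the PLP path was the open question here.
var declarationList = new SqlParameter("@paramList", SqlDbType.NVarChar, -1)
{
    Value = new string('x', 10_000)
};
```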

@David-Engel
Contributor

perhaps you can only send varchar(max) in plp mode?

@Wraith2
I think that's correct and it looks like that's what the JDBC driver does:
https://github.com/microsoft/mssql-jdbc/blob/dev/src/main/java/com/microsoft/sqlserver/jdbc/IOBuffer.java#L4615

@KristianWedberg
Author

Now with the latest MDS 3.0.0-dev, and 256 parameters.

  • OmitNoMetadata makes no difference, with or without UseODBCParameterFormat.
  • The latest varchar version provides no additional improvement at 256 parameters (379,143 vs. the old 374,091 parameters/s with UseODBCParameterFormat=true).

This should obviously be tested with >256 parameters (I got the exception too), which showed a large slowdown with the non-varchar version.

Because of the higher performance, I'll start running with a larger total volume, so these throughput numbers are not directly comparable with previous tests. Again, the interesting bit is whether >256 parameters will slow down or not.

| UseODBCParameterFormat | provider | RPS | CPR | Throughput [params/s] |
|---|---|---|---|---|
| | Odbc | 16 | 16 | 557,563 |
| true | Microsoft 3.0.0-dev | 16 | 16 | 625,339 |
| false | Microsoft 3.0.0-dev | 16 | 16 | 218,018 |
| | System | 16 | 16 | 215,035 |

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

I'm trying it but it isn't going well. The MetaType for varchar isn't PLP, so I'm going to end up looking for size == -1 all over the place, and even then it's going to touch a whole lot of places I'm really not happy about possibly breaking. I can only do the sync version, because the async code in TDSExecuteRPCAddParameter means you can't add any data after you've written the parameter, so I don't know how to get the PLP chunk terminator written.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

@KristianWedberg So am I reading the results right that the combination of omitting parameter names and using nvarchar gives better performance than ODBC?

@KristianWedberg
Author

KristianWedberg commented Mar 18, 2021

MDS 3.0.0-dev + Omitting parameter names is consistently 5-10% faster than Odbc at 256 parameters.

nvarchar might or might not help a few percent, hard to say since I can't test them both at the same time.

Again - nvarchar might help greatly above 256 parameters; that we don't know yet.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

You can test longer with this. Horrible hackjob - absolutely don't use it for anything other than this testing, and only in sync mode, or it'll crash like before.
Microsoft.Data.SqlClient.3.0.0-dev.zip

@KristianWedberg
Author

Success! 🎆

A quick test shows that the latest MDS (and with both switches enabled) scales well all the way to 2048 parameters (at 1,184,713 parameters/s), staying 5-10% faster than Odbc all the way! Full results will take longer.

Note: I realized I've had one issue in my testing setup until now: I was still giving MDS less work (since it used to be up to 20x slower). So it might not have been the latest change that did the trick; it could e.g. be just the UseODBCParameterFormat switch. I'll go back and check this now.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

OK. Well, consider that a proof of concept: now that we know it's possible to make it work, it'll need more work to make it production-capable. I need to learn how the driver currently handles varchar(max), because it's clearly using some trickery if I've had to add workarounds to make it send a long string correctly. This could take some time.

@KristianWedberg
Author

KristianWedberg commented Mar 19, 2021

To sum up the performance measurements:

  • There is no performance difference between the latest MDS nvarchar hackjob and the earlier MDS UseODBCParameterFormat without nvarchar (now that I'm running with the same data volume), i.e. the nvarchar hackjob changes are not needed to get great performance.
  • With the latest MDS nvarchar hackjob, enabling/disabling OmitNoMetadata makes no difference, from 1 to 2048 parameters.
  • Disabling UseODBCParameterFormat makes MDS slow down at >32 parameters.

Here are repeatable results using the earlier UseODBCParameterFormat MDS (i.e. without nvarchar changes):

| UseODBCParameterFormat | provider | RPS | CPR | Parameters | Throughput [params/s] |
|---|---|---|---|---|---|
| | System | 32 | 8 | 256 | 222,754 |
| | System | 32 | 16 | 512 | 157,327 |
| | System | 32 | 32 | 1024 | 90,484 |
| | System | 32 | 64 | 2048 | 44,815 |
| false | Microsoft | 4 | 8 | 32 | 131,051 |
| false | Microsoft | 8 | 8 | 64 | 211,954 |
| false | Microsoft | 16 | 8 | 128 | 227,015 |
| false | Microsoft | 32 | 8 | 256 | 230,582 |
| false | Microsoft | 32 | 16 | 512 | 158,538 |
| false | Microsoft | 32 | 32 | 1024 | 92,574 |
| false | Microsoft | 32 | 64 | 2048 | 49,096 |
| true | Microsoft | 4 | 8 | 32 | 141,485 |
| true | Microsoft | 8 | 8 | 64 | 252,613 |
| true | Microsoft | 16 | 8 | 128 | 422,281 |
| true | Microsoft | 32 | 8 | 256 | 600,829 |
| true | Microsoft | 32 | 16 | 512 | 801,047 |
| true | Microsoft | 32 | 32 | 1024 | 1,080,539 |
| true | Microsoft | 32 | 64 | 2048 | 1,239,986 |
| | Odbc | 4 | 8 | 32 | 133,329 |
| | Odbc | 8 | 8 | 64 | 239,711 |
| | Odbc | 16 | 8 | 128 | 378,298 |
| | Odbc | 32 | 8 | 256 | 549,672 |
| | Odbc | 32 | 16 | 512 | 755,040 |
| | Odbc | 32 | 32 | 1024 | 984,802 |
| | Odbc | 32 | 64 | 2048 | 1,120,003 |

So performance looks great.

Functionality-wise, I'll highlight that my benchmark only uses each parameter once, in the same order they're written in the query. I'm guessing that reusing a named parameter multiple times in a query requires additional testing.

[Edited]

  • Disabling UseODBCParameterFormat makes MDS slow down at >32 parameters (not >256)
  • Added test cases with <256 parameters

@Wraith2
Contributor

Wraith2 commented Mar 19, 2021

• There is no performance difference between the latest MDS nvarchar hackjob and the earlier MDS UseODBCParameterFormat without nvarchar (now that I'm running with the same data volume), i.e. the nvarchar hackjob changes are not needed to get great performance.

Oh, that's surprising but welcome.
The parameter format is pretty easy; it just needs some agreement on how to implement it so you can use it on a per-command basis. I listed 3 strategies earlier in the thread. That'll be down to @David-Engel and @cheenamalhotra.

I'm guessing reusing a named parameter multiple times in the query requires additional testing.

Yes, definitely. I expect it not to work.

Disabling UseODBCParameterFormat makes MDS slow down at >256 parameters.

That's got to be down to how the name mapping is implemented on the server. I doubt anyone expected it to reach the limit of parameters; that might be something @v-chojas could explore. As I think I said earlier, the falloff curve looks like list iteration instead of map lookup, which makes sense because a list tends to be faster than a map when items < ~100.

@David-Engel
Contributor

I'm not able to replicate the improved test results with the latest MDS package (hackjob) from @Wraith2 or the previous package. I've triple-checked my MDS DLL and I'm pretty sure I'm using the correct one (modified dates line up with the correct times in my build output), and I've tried with and without the UseODBCParameterFormat switch. Performance still falls off around 256 params on all runs I try. 🤔

@KristianWedberg Can you push any changes you made to the test to your repo?

@KristianWedberg
Author

@David-Engel Yes, that will be due to running with too small a data volume for the fast MDS. I've refreshed the benchmark code to address that (for me each test case now takes about 1 second). You'll have to upgrade from MDS 2.1.2.
