
Poor INSERT performance with ExecuteNonQuery() and many parameters #974

Closed
KristianWedberg opened this issue Mar 9, 2021 · 86 comments · Fixed by #1041
Labels
📈 Performance Issues that are targeted to performance improvements.

Comments

@KristianWedberg

Description

When inserting rows with ExecuteNonQuery() and placeholder parameters, performance
(measured as inserted parameters per second) with the Microsoft.Data.SqlClient v2.1.1
and System.Data.SqlClient providers is progressively impacted from about 50+ parameters
per statement, both for single rows and for multirow batches.

Inserting with 2048 parameters in the statement is up to 7 times slower than with
128 parameters.

The issue is not present with the System.Data.Odbc provider (or with non-SQL Server
providers & databases): SqlClient is up to 25 times slower than the ODBC provider
when using 2048 parameters per statement.

I've tested with Windows 10, .NET5.0 and a local SQL Server 2019 database, varying the
number of columns from 1 to 1024, and number of rows in each batch from 1 to 1000.

Note: The 'waviness' at 1000 and 2000 parameters is just an effect of the 1000/1024 and 2000/2048
numbers being separated in the graph - see the write-up for details:

[Chart: Average Throughput]

The target table is un-indexed and the inserts use SQL-92 syntax:

INSERT INTO tablename (column-a, [column-b, ...])
VALUES (@r0_c0, [@r0_c1, ...]),
       (@r1_c0, [@r1_c1, ...]),
       ...
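
For concreteness, a minimal sketch of the pattern under test (illustrative names, not the benchmark's actual code): the statement text and the SqlParameter list grow together, and everything is sent in a single ExecuteNonQuery() call. SQL Server caps a single request at 2100 parameters.

```csharp
using System.Linq;
using System.Text;
using Microsoft.Data.SqlClient;

static class MultiRowInsert
{
    // Builds and executes one INSERT carrying rows.Length * columns placeholder
    // parameters. Column names d0..dN match the profiled queries later in the thread.
    public static int InsertBatch(SqlConnection connection, string table, int[][] rows)
    {
        int columns = rows[0].Length;
        var sql = new StringBuilder()
            .Append($"INSERT INTO {table} (")
            .Append(string.Join(",", Enumerable.Range(0, columns).Select(c => $"d{c}")))
            .Append(") VALUES ");

        using var command = connection.CreateCommand();
        for (int r = 0; r < rows.Length; r++)
        {
            sql.Append(r == 0 ? "(" : ",(");
            for (int c = 0; c < columns; c++)
            {
                string name = $"@R{r}_C{c}";
                sql.Append(c == 0 ? name : "," + name);
                command.Parameters.AddWithValue(name, rows[r][c]);
            }
            sql.Append(')');
        }
        command.CommandText = sql.ToString();
        return command.ExecuteNonQuery();
    }
}
```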

While table-valued parameters and bulk inserts don't have this issue, they won't
help when inserting a single row or just a few very wide rows at a time, so it would be
really useful to have this addressed (or to identify any gremlins on my part!).
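
For contrast, the non-portable alternative mentioned above looks roughly like this (a sketch with illustrative names):

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

static class BulkAlternative
{
    // SqlBulkCopy sidesteps the parameterized-INSERT path entirely, but is
    // SQL Server-specific and gains little for a single wide row.
    public static void BulkInsert(SqlConnection connection, string table, DataTable rows)
    {
        using var bulk = new SqlBulkCopy(connection) { DestinationTableName = table };
        bulk.WriteToServer(rows);
    }
}
```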

Please see the full write-up with charts, tables, BenchmarkDotNet info etc. at:

https://github.com/KristianWedberg/sql-batch-insert-performance

Reproduce

Fully runnable source (using BenchmarkDotNet) and instructions at:

https://github.com/KristianWedberg/sql-batch-insert-performance

Expected behavior

Using more parameters (beyond 128) in an insert row or batch should increase throughput,
measured as the number of parameters inserted per second, just like it does with
System.Data.Odbc and other non-SqlClient providers.

Technical details

  • Microsoft.Data.SqlClient Version="2.1.1"
  • System.Data.Odbc Version="5.0.0"
  • System.Data.SqlClient Version="4.8.2"
  • OS=Windows 10.0.19042
  • .NET Core SDK=5.0.103
@Wraith2
Contributor

Wraith2 commented Mar 9, 2021

What is the real world use case for using so many parameters?

@KristianWedberg
Author

KristianWedberg commented Mar 9, 2021

I've personally come across this with:

  • Wide rows to capture different stages of a process (think lots of dates and associated data). One real estate company used 450 columns for this.
    • 'Type 3 slowly changing dimensions' is a similar use case
  • Wide rows for denormalized data marts
  • Multi-row batches to improve performance
    • While table valued parameters and bulk inserts can handle this use case for SQL Server, those techniques are not portable, which is a big drawback

@Wraith2
Contributor

Wraith2 commented Mar 9, 2021

Multi-row batches to improve performance
While table valued parameters and bulk inserts can handle this use case for SQL Server, those techniques are not portable, which is a big draw-back

This is likely to be addressed in .net 6 for most major providers with dotnet/runtime#28633

I'll try to take a look at this when I can get some time but I make no promises.

@KristianWedberg
Author

Any help on this issue would be greatly appreciated! As a specific example, I discovered this when benchmarking our ETL library since we use it for (portable) batch inserts. For now, we'll have to advise clients to either switch to the ODBC provider, or use the non-portable bulk insert component. Not the end of the world, but certainly less than ideal.

@cheenamalhotra
Member

Hi @KristianWedberg

Thanks for the comparisons. We'll take a look and see if some improvements can be made.

@cheenamalhotra cheenamalhotra added the 📈 Performance Issues that are targeted to performance improvements. label Mar 9, 2021
@roji
Member

roji commented Mar 9, 2021

This is likely to be addressed in .net 6 for most major providers with dotnet/runtime#28633

This is an important point - one of the major incentives for having this new batching API was to remove the need to send so many parameters (which regardless of perf imposed an upper limit, since no more than 2100 parameters can be sent).
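
For reference, a minimal sketch of the batching API shape that later shipped in .NET 6's System.Data.Common (not SqlClient's eventual implementation; table and column names are assumptions, and parameters are created through a scratch DbCommand because .NET 6's DbBatchCommand exposes no CreateParameter):

```csharp
using System.Collections.Generic;
using System.Data.Common;

static class BatchingSketch
{
    // Each row becomes its own small DbBatchCommand; the whole batch goes to the
    // server in one round trip, keeping per-statement parameter counts low.
    public static int InsertViaBatch(DbConnection connection, IReadOnlyList<(int C0, int C1)> rows)
    {
        using DbBatch batch = connection.CreateBatch();          // throws if the provider lacks batch support
        using DbCommand scratch = connection.CreateCommand();    // used only to create parameters

        foreach ((int c0, int c1) in rows)
        {
            DbBatchCommand cmd = batch.CreateBatchCommand();
            cmd.CommandText = "INSERT INTO t (d0, d1) VALUES (@p0, @p1)";

            DbParameter p0 = scratch.CreateParameter();
            p0.ParameterName = "@p0";
            p0.Value = c0;
            cmd.Parameters.Add(p0);

            DbParameter p1 = scratch.CreateParameter();
            p1.ParameterName = "@p1";
            p1.Value = c1;
            cmd.Parameters.Add(p1);

            batch.BatchCommands.Add(cmd);
        }
        return batch.ExecuteNonQuery();
    }
}
```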

@KristianWedberg
Author

@roji I'm very much looking forward to the upcoming .NET6 batch API. That said:

  • It's still many years before we can let go of all <.NET6 frameworks
  • The batch API won't help with wide single row inserts

@roji
Member

roji commented Mar 9, 2021

It's still many years before we can let go of all <.NET6 frameworks

This is very true, though it's probably going to be possible (and advisable) to expose the batching API even for older TFMs - just without implementing the ADO.NET base classes. This would make the API accessible regardless of which TFM you're using (possibly even .NET Framework); that's my plan for Npgsql/PostgreSQL, in any case.

The batch API won't help with wide single row inserts

This is true - and at that point it's also a question of how common it is to have such a large number of columns etc.

@Wraith2
Contributor

Wraith2 commented Mar 9, 2021

This is very true, though it's probably going to be possible (and advisable) to expose the batching API even for older TFMs

The implementation I've got for SqlClient 3.1/5.0 provides versions of the System.Data base classes needed for this, so in theory it can be backported to any TFM the library supports.
One thing I hadn't considered is whether multiple drivers would need to live in the same process; I'll give that some thought and see if I can make the support bits internal.

Of course we've still got to do the work to get the feature into 6.0 and then find the time to get it into SqlClient with encryption support, we might want to light a fire under that @roji 😁

@Wraith2
Contributor

Wraith2 commented Mar 9, 2021

Just had a quick look. You're using System.Data.SqlClient. Can you try it with Microsoft.Data.SqlClient from this repository and see if the numbers are any better to start with?

@KristianWedberg
Author

KristianWedberg commented Mar 10, 2021

@Wraith2 I'm already using both Microsoft.Data.SqlClient (the "Microsoft" or "MicrosoftSync" results) and System.Data.SqlClient ("System" or "SystemSync" results) in the code and in the results. Their performance is identical.

@Wraith2
Contributor

Wraith2 commented Mar 10, 2021

Ah, I see it now. Had to do some fiddling to profile it and broke something. My mistake.

I've found some interesting areas that I was already aware of but were a way down my list of things to investigate. It's mostly string composition and array-resizing overhead. As the parameter count goes up, it spends more time allocating and re-allocating to grow, and there's not a lot the caller can do about that. It can be reworked to avoid intermediate strings, and to pre-size string builders and target arrays in a few places.

Two things stand out that you could do for the benchmark. Can you keep pre-boxed ints around to use as parameter values? Their boxing and unboxing is a big source of GC activity. I know that isn't reflective of real-world data, but in real-world scenarios, if you have strongly typed input data of limited or frequently known ranges (bool, small ints), keeping cached boxed versions of some values around and using those will gain you some perf.

If possible, reusing SqlParameter instances would be good. They're quite large and you're using a lot of them.
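
A sketch of both suggestions (assumed helper names and cache range, not code from the benchmark): cache boxed values for a small known range, build the SqlParameter list once, and only swap Values between executions.

```csharp
using System;
using System.Linq;
using Microsoft.Data.SqlClient;

static class ParameterReuse
{
    // One boxed object per small int; the 0..255 range is an assumption.
    private static readonly object[] BoxedInts =
        Enumerable.Range(0, 256).Select(i => (object)i).ToArray();

    public static object Box(int value) =>
        (uint)value < (uint)BoxedInts.Length ? BoxedInts[value] : value; // rare values box normally

    // Parameters are created once per command shape, then refreshed in place
    // between ExecuteNonQuery() calls: no new SqlParameter allocations per batch.
    public static void RefreshValues(SqlCommand command, int[] row)
    {
        for (int i = 0; i < row.Length; i++)
            command.Parameters[i].Value = Box(row[i]);
    }
}
```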

@KristianWedberg
Author

@Wraith2 I've added (and uploaded the code for) using pre-boxed int values to the test.

Here's the original test (boxing of int values, only 3 batches run):

| provider | RPS | CPR | NOS | Mean | Error | StdDev | Throughput [params/s] | Throughput [rows/s] |
|---|---|---|---|---|---|---|---|---|
| Microsoft | 32 | 64 | 3 | 126.91 ms | 9.770 ms | 0.536 ms | 48411 | 756 |
| System | 32 | 64 | 3 | 126.96 ms | 6.023 ms | 0.330 ms | 48392 | 756 |
| Odbc | 32 | 64 | 89 | 161.43 ms | 11.006 ms | 0.603 ms | 1129092 | 17642 |

Running the same test with pre-boxed int values instead makes no difference at all:

| provider | RPS | CPR | NOS | Mean | Error | StdDev | Throughput [params/s] | Throughput [rows/s] |
|---|---|---|---|---|---|---|---|---|
| Microsoft | 32 | 64 | 3 | 126.8 ms | 23.63 ms | 1.30 ms | 48469 | 757 |
| System | 32 | 64 | 3 | 125.7 ms | 20.04 ms | 1.10 ms | 48861 | 763 |
| Odbc | 32 | 64 | 89 | 156.0 ms | 22.80 ms | 1.25 ms | 1168274 | 18254 |

SqlParameter instances are already reused across all batches in a test case. For the slowest test cases, however, this was only 3 batches. Increasing _scaleNumberOfStatements by a factor of 10 means the cost of SqlParameter creation is instead amortized over 30 batches. This, together with pre-boxed int values, still makes no difference:

| provider | RPS | CPR | NOS | Mean | Error | StdDev | Throughput [params/s] | Throughput [rows/s] |
|---|---|---|---|---|---|---|---|---|
| Microsoft | 32 | 64 | 30 | 1.260 s | 0.1338 s | 0.0073 s | 48746 | 762 |
| System | 32 | 64 | 30 | 1.259 s | 0.1037 s | 0.0057 s | 48807 | 763 |
| Odbc | 32 | 64 | 896 | 1.549 s | 0.1823 s | 0.0100 s | 1184652 | 18510 |

So creating SqlParameters and setting their values is not the bottleneck.

For profiling work, you might set _rowsPerStatements and _columnsPerRows to a single value each and comment out the other providers to get a single test case, and increase _scaleNumberOfStatements sufficiently to get long enough run times.

@Wraith2
Contributor

Wraith2 commented Mar 10, 2021

That just means the individual benches aren't running for long enough to be affected by memory issues, which is a good thing in general, but if you want to run these tasks for a long time, pre-boxing and reusing things will save you some memory. There being no throughput difference isn't surprising; I was just making observations. The current limiting factors are parallelism and buffer resizing, as I said.

@roji
Member

roji commented Mar 10, 2021

@Wraith2

Of course we've still got to do the work to get the feature into 6.0 and then find the time to get it into SqlClient with encryption support, we might want to light a fire under that @roji 😁

You're absolutely right... I do have a few other things that need to be done first, but I'll do my best to finally finish this ASAP.

@Wraith2
Contributor

Wraith2 commented Mar 12, 2021

I made the code execute about 20% faster and it made no difference to throughput. The code isn't the bottleneck; the network is. ODBC just appears to be more efficient about how the parameters are sent, somehow. Since I can't change what needs to be put on the wire, or how long it takes to send and receive the messages over the network, there is no clear way to improve performance here.

@roji
Member

roji commented Mar 12, 2021

@Wraith2 interesting - if this is actually a wire protocol data change, it may be worth comparing what they're doing with Wireshark. If SqlClient is using TDS in an inefficient way, that might be another thing to track somewhere...

@Wraith2
Contributor

Wraith2 commented Mar 12, 2021

Sure. One of many, many things to investigate at some point.
I'll keep the branch with my changes around and try to PR them at some point, because someone may appreciate the additional speed in some scenario; it just doesn't help in this one.

My advice on this one would be to get closer to the SQL Server and attempt parallel imports.

@cheenamalhotra
Member

You can also try to compare how queries execute on the server side by comparing SQL Server Profiler traces. That will tell you whether parameter metadata caching is adding benefit on the client side or not.

@KristianWedberg
Author

KristianWedberg commented Mar 12, 2021

@Wraith2 When I run this single test case (with _scaleNumberOfStatements = 40_000 and .AddJob(Job.Dry), again with a local database), I get the following CPU utilization:

| provider | RPS | CPR | NOS | Mean | Throughput [params/s] | Client CPU | SQL Server CPU |
|---|---|---|---|---|---|---|---|
| Microsoft | 32 | 64 | 616 | 26.39 s | 47812 | 1% | 30% |
| Odbc | 32 | 64 | 17921 | 33.12 s | 1108118 | 15% | 30% |

Certainly not proven, but it looks like:

  • In both cases the SQL Server process is likely bottlenecked on a single thread (I have 4 cores total).
  • The Microsoft provider is (as you said) not a CPU bottleneck, since the client side only consumes 1%.
  • The ODBC test case is performing 29 times more work than the Microsoft test case, so it makes perfect
    sense that it has ~15 times higher client side CPU utilization.

The question is - why does the SQL Server process consume so much CPU when it receives these statements
from the Microsoft.Data.SqlClient provider with 2048 parameters (and starts to be performance limited at 50+ parameters)?

Can this be raised with the SQL Server team?

@Wraith2
Contributor

Wraith2 commented Mar 12, 2021

I can provide you with the dev nuget package for my build if you want to try it out and verify the perf difference for yourself.

@KristianWedberg
Author

Some more tests:

  • System.Data.SqlClient 4.4.0 (from 2017) and the latest Microsoft.Data.SqlClient 2.1.2 both showed the same issue.
  • SQL Server 2012 and SQL Server 2016 both showed the same issue.

So it's not a recent issue.

@Wraith2
Contributor

Wraith2 commented Mar 13, 2021

For shits and giggles you can give this a try and see that it changes nothing if you like.
Microsoft.Data.SqlClient.3.0.0-dev.zip (rename to .nuget)

@KristianWedberg
Author

KristianWedberg commented Mar 13, 2021

@Wraith2 Thanks. As expected by now, Microsoft.Data.SqlClient.3.0.0-dev.zip made no difference whatsoever.

For shits and giggles

Lovely image though :-)

@KristianWedberg
Author

KristianWedberg commented Mar 13, 2021

Profiling shows that query plans (including operator values) look identical: T-SQL Insert <-- Table Insert <-- Constant Scan

The SqlClient and Odbc queries look identical, except for SqlClient using named instead of positional parameters (here with only 3 columns and 2 rows). The parameters being named might well be the reason for the performance difference.

SqlClient:

exec sp_executesql N'INSERT INTO insertbenchmark_2_3_MicrosoftSync (d0,d1,d2) 
    VALUES (@R0_C0,@R0_C1,@R0_C2),(@R1_C0,@R1_C1,@R1_C2)',N'@R0_C0 int,@R0_C1 int,@R0_C2 int,@R1_C0 int,@R1_C1 int,@R1_C2 int',
    @R0_C0=0,@R0_C1=1,@R0_C2=2,@R1_C0=3,@R1_C1=4,@R1_C2=5

Odbc:

exec sp_executesql N'INSERT INTO insertbenchmark_2_3_OdbcSync      (d0,d1,d2) 
    VALUES (@P1,@P2,@P3),(@P4,@P5,@P6)',N'@P1 int,@P2 int,@P3 int,@P4 int,@P5 int,@P6 int',
    0,1,2,3,4,5

Interestingly, when profiling with many parameters (in this case 32 * 64 = 2048), with SqlClient the server reports CPU consumption, physical writes and duration for every query, while with Odbc the server only reports CPU consumption and duration for the first query, and no physical writes for any query.

Server statistics per INSERT using SqlClient:

| CPU (ms) | Logical Reads | Physical Writes | Duration (ms) |
|---|---|---|---|
| 94 | 96 | 2 | 95 |
| 47 | 42 | 1 | 40 |
| 47 | 42 | 1 | 42 |
| 47 | 42 | 1 | 43 |
| 46 | 43 | 1 | 43 |
| 47 | 42 | 1 | 45 |
| 32 | 42 | 1 | 42 |

Server statistics per INSERT using Odbc:

| CPU (ms) | Logical Reads | Physical Writes | Duration (ms) |
|---|---|---|---|
| 63 | 96 | 0 | 55 |
| 0 | 42 | 0 | 0 |
| 0 | 43 | 0 | 0 |
| 0 | 42 | 0 | 0 |
| 0 | 42 | 0 | 0 |
| 0 | 43 | 0 | 0 |
| 0 | 42 | 0 | 0 |

So possibly the named parameters make SQL Server consume significant CPU on every query, whereas with positional parameters
that work is either not needed or a cached copy is used.

I don't know how to troubleshoot this further into SQL Server, do any of you have a handle on that?

@Wraith2
Contributor

Wraith2 commented Mar 13, 2021

I think you'll have to open a support case for that.

Once we get to the point where batching is possible, would you be interested in trying out the preview builds for it? This is a use case that should be able to benefit from it, and it would be useful to have some idea how it will react.

@Wraith2
Contributor

Wraith2 commented Mar 16, 2021

exec sp_executesql N'INSERT INTO insertbenchmark_2_3_MicrosoftSync (d0,d1,d2) 
    VALUES (@R0_C0,@R0_C1,@R0_C2),(@R1_C0,@R1_C1,@R1_C2)',N'@R0_C0 int,@R0_C1 int,@R0_C2 int,@R1_C0 int,@R1_C1 int,@R1_C2 int',
    @R0_C0=0,@R0_C1=1,@R0_C2=2,@R1_C0=3,@R1_C1=4,@R1_C2=5

If you just changed the last line of @name=value pairs to a plain value list, without changing the VALUES clauses, would it still work? The question is whether the names matter when they aren't specified in the value list. If it's still valid syntax (indicating @p\d is not special) then I could quirk BuildParamList to just not output the names if an AppContext flag were set.

@ErikEJ
Contributor

ErikEJ commented Mar 16, 2021

@Wraith2 Based on my findings, I think that would be an interesting experiment.

@David-Engel
Contributor

For example I'm pretty sure I don't want to change the value of the reset connection header flag because it's important (and yet odbc doesn't use it? should it?)

ODBC would only have it if the driver manager handed out a previously pooled connection and this was the first query to be executed.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

Both samples were run the same way from the example app provided above. Table setup was done, and then I started capturing at a breakpoint and stopped immediately after the query succeeded. The command objects were newed up each time. Looking at a longer-running trace, the reset behaviour seems to follow what you said. So scratch that one. Next: no metadata?

@cheenamalhotra
Member

cheenamalhotra commented Mar 18, 2021

As per the MS-TDS spec, the NoMetaData flag in an RPC request is described as:

fNoMetaData
The server sends NoMetaData only if fNoMetadata is set to 1 in the request (see COLMETADATA, section 2.2.7.4)

Whereas COLMETADATA (where NoMetaData would be received in the response) does not apply to an RPC server response, so it's basically a no-op for SqlClient. You can verify the response packets too.

It certainly feels confusing. Why does this option even exist in the RPC request, then? Any thoughts @v-chojas?

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

Manually decoded packets with annotations, download and diff to review.

The differences are:

  1. ODBC uses the PLP string format, not inline; this accounts for the length differences.
  2. SqlClient sent the reset connection flag in the header - already discussed and probably not important.
  3. SqlClient sent the NoMetadata flag; this is unconditionally set if the command object is in Text mode.

sqlcient hex.txt
odbc hex.txt

@v-chojas

I wouldn't expect to see any difference with 16 parameters, but try looking at one with 512 parameters. That will cause both the SQL and parameter declaration list to be long enough not to fit in a regular nvarchar (4000 chars/8000 bytes). The expression to calculate the length of the parameter list in chars is 8*n + (n-10)*(n>10) + (n-100)*(n>100) + (n-1000)*(n>1000) - 1, which exceeds 4000 when n exceeds 411.
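
As a sanity check of that estimate, a throwaway sketch that builds the actual ODBC-style declaration string ("@P1 int,@P2 int,...") and reports where it first outgrows nvarchar's 4000-char limit (the exact crossover may differ slightly from the closed-form expression above):

```csharp
using System;
using System.Linq;

class DeclarationListLength
{
    static void Main()
    {
        for (int n = 1; n <= 2100; n++) // 2100 is the server's parameter cap
        {
            // "@P1 int,@P2 int,...,@Pn int" as sp_executesql receives it
            int length = string.Join(",",
                Enumerable.Range(1, n).Select(i => $"@P{i} int")).Length;
            if (length > 4000)
            {
                Console.WriteLine($"Declaration list first exceeds 4000 chars at n = {n} ({length} chars).");
                break;
            }
        }
    }
}
```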

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

What difference are you expecting? It's going to take a long time to manually break down a large request, which is why I chose to use the short version to identify differences. If there's a particular thing you're looking for here, hints would be good.

If the only difference is string representation, I doubt that's something that can be fixed in the driver. ODBC is using a more verbose format and is sending smaller packets, so it really wouldn't make sense for it to be quicker than SqlClient.

@KristianWedberg
Author

Is there any way to profile the SQL Server side to see what it's spending all that CPU time on when statements with many parameters come in (as #974 (comment) shows, that's a very clear difference between SqlClient and Odbc in how the server behaves)?

@v-chojas

It should switch over to a different type because regular nvarchar is limited to 4000 chars. If you post hexdumps of the packets (hex+ASCII) I can tell what's happening pretty easily. What type it's using and how it's sending it (PLP chunks can vary in size, but ODBC will send the parameter declaration list in one chunk) would give a clue about the perf difference.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

sql traces.zip

@v-chojas

ODBC sends SQL and parameter declaration list as nvarchar(max):

0b70   37 00 2c 00 40 00 50 00 32 00 30 00 34 00 38 00   7.,.@.P.2.0.4.8.
0b80   29 00 00 00 00 00 00 00*E7 FF FF 09 04 D0 00 00*   )...............
0b90   fe ff ff ff ff ff ff ff 64 04 00 00 40 00 50 00   ........d...@.P.
0ba0   31 00 20 00 69 00 6e 00 74 00 2c 00 40 00 50 00   1. .i.n.t.,.@.P.

SqlClient sends SQL and parameter declaration list as ntext:

1d70   2c 00 40 00 50 00 32 00 30 00 34 00 38 00 29 00   ,.@.P.2.0.4.8.).
1d80   00 00*63 58 A7 00 00 09 04 d0 00 00*58 a7 00 00   ..cX........X...
1d90   40 00 50 00 31 00 20 00 69 00 6e 00 74 00 2c 00   @.P.1. .i.n.t.,.
1da0   40 00 50 00 32 00 20 00 69 00 6e 00 74 00 2c 00   @.P.2. .i.n.t.,.

ntext is an old, deprecated datatype; there is probably an internal conversion on the server side, which causes the performance difference.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

I was trying to work out what 63 was and where it was being written in the driver. I'll have to see if I can unknot the spaghetti code around RPC construction enough to identify, and possibly change, the data type. If I get it working I'll post another build with some AppContext switches to try out.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

@KristianWedberg ok, give this one a try. All RPC parameters are sent as varchar(length) or varchar(max) depending on the size; netcore only at the moment. There are two AppContext switches you can use: Switch.Microsoft.Data.SqlClient.UseODBCParameterFormat, which you've already tried, and Switch.Microsoft.Data.SqlClient.OmitNoMetadata, because I'd written it in before we hit this string type info. I don't think the metadata flag is going to do anything, but I am interested to see how performance reacts to both settings, for parameter names and the nvarchar type.

[edit] removed link

@David-Engel
Contributor

David-Engel commented Mar 18, 2021

@Wraith2 I got an exception when parameters hit 512. This was with only AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseODBCParameterFormat", true); and also with that plus AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.OmitNoMetadata", true);

// **************************
// Benchmark: BatchInsert.InsertAsync: ShortRun(IterationCount=3, LaunchCount=1, WarmupCount=3) [providerAsync=MicrosoftSync, rowsPerStatement=1, columnsPerRow=512, numberOfStatements=67, argument=]
// *** Execute ***
// Launch: 1 / 1
// Execute: dotnet "f7f2e8ad-0b34-455d-8860-54c3eea969e0.dll" --benchmarkName "SqlBatchInsertPerformance.BatchInsert.InsertAsync(providerAsync: MicrosoftSync, rowsPerStatement: 1, columnsPerRow: 512, numberOfStatements: 67, argument: )" --job "ShortRun" --benchmarkId 9 in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\bin\Release\net5.0\f7f2e8ad-0b34-455d-8860-54c3eea969e0\bin\Release\netcoreapp5.0
// BeforeAnythingElse

// Benchmark Process Environment Information:
// Runtime=.NET Core 5.0.4 (CoreCLR 5.0.421.11614, CoreFX 5.0.421.11614), X64 RyuJIT
// GC=Concurrent Workstation
// Job: ShortRun(IterationCount=3, LaunchCount=1, WarmupCount=3)

OverheadJitting  1: 1 op, 366900.00 ns, 366.9000 us/op

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
 ---> Microsoft.Data.SqlClient.SqlException (0x80131904): The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 1 (""): Data type 0xE7 has an invalid data length or metadata length.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
   at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String methodName)
   at Microsoft.Data.SqlClient.SqlCommand.<ExecuteNonQuery>b__143_0()
   at Microsoft.Data.SqlClient.SqlRetryLogicProvider.Execute[TResult](Object sender, Func`1 function)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at SqlBatchInsertPerformance.BatchInsert.InsertAsync(ProviderAsync providerAsync, Int32 rowsPerStatement, Int32 columnsPerRow, Int32 numberOfStatements, Argument argument) in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\BatchInsert.cs:line 160
   at BenchmarkDotNet.Autogenerated.Runnable_9.WorkloadActionNoUnroll(Int64 invokeCount) in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\bin\Release\net5.0\f7f2e8ad-0b34-455d-8860-54c3eea969e0\f7f2e8ad-0b34-455d-8860-54c3eea969e0.notcs:line 8541
   at BenchmarkDotNet.Engines.Engine.RunIteration(IterationData data)
   at BenchmarkDotNet.Engines.EngineFactory.Jit(Engine engine, Int32 jitIndex, Int32 invokeCount, Int32 unrollFactor)
   at BenchmarkDotNet.Engines.EngineFactory.CreateReadyToRun(EngineParameters engineParameters)
   at BenchmarkDotNet.Autogenerated.Runnable_9.Run(IHost host, String benchmarkName) in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\bin\Release\net5.0\f7f2e8ad-0b34-455d-8860-54c3eea969e0\f7f2e8ad-0b34-455d-8860-54c3eea969e0.notcs:line 7763
ClientConnectionId:d8de5db4-60d1-4b00-b365-92a0041914e8
Error Number:8016,State:30,Class:16
   --- End of inner exception stack trace ---
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Object[] arguments, Signature sig, Boolean constructor, Boolean wrapExceptions)
   at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
   at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)
   at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[] args) in C:\Users\dengel\Downloads\sql-batch-insert-performance-main\bin\Release\net5.0\f7f2e8ad-0b34-455d-8860-54c3eea969e0\f7f2e8ad-0b34-455d-8860-54c3eea969e0.notcs:line 57
// AfterAll
ExitCode != 0
// Benchmark Process 35124 has exited with code -1

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

Interesting. So how do you specify an unbounded nvarchar?
I thought this:

            sqlParam.SqlDbType = ((paramList.Length << 1) <= TdsEnums.TYPE_SIZE_LIMIT) ? SqlDbType.NVarChar : SqlDbType.NText;
            sqlParam.Value = paramList;
            sqlParam.Size = paramList.Length;

would become this:

            sqlParam.SqlDbType = SqlDbType.NVarChar;
            sqlParam.Value = paramList;
            sqlParam.Size = ((paramList.Length << 1) <= TdsEnums.TYPE_SIZE_LIMIT) ? paramList.Length : -1;

[edit]
perhaps you can only send varchar(max) in plp mode?
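
For reference, on the public API side an "unbounded nvarchar" is exactly what the second snippet expresses: Size = -1 on an NVarChar SqlParameter is the documented way to request nvarchar(max). A small sketch (the value is a stand-in for a long declaration list):

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

// Size = -1 requests nvarchar(max); whether the internal RPC writer then takes
// the PLP path was the open question here.
var declarationList = new SqlParameter("@paramList", SqlDbType.NVarChar, -1)
{
    Value = new string('x', 10_000)
};
```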

@David-Engel
Contributor

perhaps you can only send varchar(max) in plp mode?

@Wraith2
I think that's correct and it looks like that's what the JDBC driver does:
https://github.com/microsoft/mssql-jdbc/blob/dev/src/main/java/com/microsoft/sqlserver/jdbc/IOBuffer.java#L4615

@KristianWedberg
Author

Now with the latest MDS 3.0.0-dev, and 256 parameters.

  • OmitNoMetadata makes no difference, with or without UseODBCParameterFormat.
  • The latest varchar version provides no additional improvement at 256 parameters (379,143 vs. the old 374,091 parameters/s with UseODBCParameterFormat=true).

This should obviously be tested with >256 parameters (I got the exception too), which showed a large slowdown with the non-varchar version.

Because of the higher performance, I'll start running with a larger total volume, so these throughput numbers are not directly comparable with previous tests. Again, the interesting bit is whether >256 parameters will slow down or not.

| UseODBCParameterFormat | provider | RPS | CPR | Throughput [params/s] |
|---|---|---|---|---|
| | Odbc | 16 | 16 | 557,563 |
| true | Microsoft 3.0.0-dev | 16 | 16 | 625,339 |
| false | Microsoft 3.0.0-dev | 16 | 16 | 218,018 |
| | System | 16 | 16 | 215,035 |

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

I'm trying it but it isn't going well. The MetaType for varchar isn't PLP, so I'm going to end up looking for size == -1 all over the place, and even then it's going to touch a whole lot of places I'm really not happy about possibly breaking. I can only do the sync version, because the async code in TDSExecuteRPCAddParameter means you can't add any data after you've written the parameter, so I don't know how to get the PLP chunk terminator written.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

@KristianWedberg So am I reading the results right that the combination of omitting parameter names and using nvarchar gives better performance than ODBC?

@KristianWedberg
Author

KristianWedberg commented Mar 18, 2021

MDS 3.0.0-dev + Omitting parameter names is consistently 5-10% faster than Odbc at 256 parameters.

nvarchar might or might not help a few percent, hard to say since I can't test them both at the same time.

Again - nvarchar might help greatly above 256 parameters; that we don't know yet.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

You can test longer with this. Horrible hackjob - absolutely don't use it for anything other than this testing, and only in sync mode, or it'll crash like before.
Microsoft.Data.SqlClient.3.0.0-dev.zip

@KristianWedberg
Author

Success! 🎆

A quick test shows that the latest MDS (and with both switches enabled) scales well all the way to 2048 parameters (at 1,184,713 parameters/s), staying 5-10% faster than Odbc all the way! Full results will take longer.

Note: I realized I've had one issue in my testing setup until now: I was still giving MDS less work (since it used to be up to 20x slower). So it might not have been the latest change that did the trick; it could e.g. be just the UseODBCParameterFormat switch. I'll go back and check this now.

@Wraith2
Contributor

Wraith2 commented Mar 18, 2021

OK. Well, consider that a proof of concept: now that we know it's possible to make it work, it'll need more work to make it production-capable. I need to learn how the driver currently handles varchar(max), because it's clearly using some trickery if I've had to add workarounds to make it send a long string correctly. This could take some time.

@KristianWedberg
Author

KristianWedberg commented Mar 19, 2021

To sum up the performance measurements:

  • There is no performance difference between the latest MDS nvarchar hackjob and the earlier MDS UseODBCParameterFormat without nvarchar (now that I'm running with the same data volume), i.e. the nvarchar hackjob changes are not needed to get great performance.
  • With the latest MDS nvarchar hackjob, enabling/disabling OmitNoMetadata makes no difference, from 1 to 2048 parameters.
  • Disabling UseODBCParameterFormat makes MDS slow down at >32 parameters.

Here are repeatable results using the earlier UseODBCParameterFormat MDS (i.e. without nvarchar changes):

| UseODBCParameterFormat | provider | RPS | CPR | Parameters | Throughput [params/s] |
|---|---|---|---|---|---|
| | System | 32 | 8 | 256 | 222,754 |
| | System | 32 | 16 | 512 | 157,327 |
| | System | 32 | 32 | 1024 | 90,484 |
| | System | 32 | 64 | 2048 | 44,815 |
| false | Microsoft | 4 | 8 | 32 | 131,051 |
| false | Microsoft | 8 | 8 | 64 | 211,954 |
| false | Microsoft | 16 | 8 | 128 | 227,015 |
| false | Microsoft | 32 | 8 | 256 | 230,582 |
| false | Microsoft | 32 | 16 | 512 | 158,538 |
| false | Microsoft | 32 | 32 | 1024 | 92,574 |
| false | Microsoft | 32 | 64 | 2048 | 49,096 |
| true | Microsoft | 4 | 8 | 32 | 141,485 |
| true | Microsoft | 8 | 8 | 64 | 252,613 |
| true | Microsoft | 16 | 8 | 128 | 422,281 |
| true | Microsoft | 32 | 8 | 256 | 600,829 |
| true | Microsoft | 32 | 16 | 512 | 801,047 |
| true | Microsoft | 32 | 32 | 1024 | 1,080,539 |
| true | Microsoft | 32 | 64 | 2048 | 1,239,986 |
| | Odbc | 4 | 8 | 32 | 133,329 |
| | Odbc | 8 | 8 | 64 | 239,711 |
| | Odbc | 16 | 8 | 128 | 378,298 |
| | Odbc | 32 | 8 | 256 | 549,672 |
| | Odbc | 32 | 16 | 512 | 755,040 |
| | Odbc | 32 | 32 | 1024 | 984,802 |
| | Odbc | 32 | 64 | 2048 | 1,120,003 |

So performance looks great.

Functionality-wise, I'll highlight that my benchmark only uses each parameter once, in the same order they're written in the query. I'm guessing that reusing a named parameter multiple times in a query requires additional testing.

[Edited]

  • Disabling UseODBCParameterFormat makes MDS slow down at >32 parameters (not >256)
  • Added test cases with <256 parameters

@Wraith2
Contributor

Wraith2 commented Mar 19, 2021

• There is no performance difference between the latest MDS nvarchar hackjob and the earlier MDS UseODBCParameterFormat without nvarchar (now that I'm running with the same data volume), i.e. the nvarchar hackjob changes are not needed to get great performance.

Oh, that's surprising but welcome.
The parameter format is pretty easy; it just needs some agreement on how to implement it so you can use it on a per-command basis. I listed 3 strategies earlier in the thread. That'll be down to @David-Engel and @cheenamalhotra.

I'm guessing reusing a named parameter multiple times in the query requires additional testing.

Yes, definitely. I expect it not to work.

Disabling UseODBCParameterFormat makes MDS slow down at >256 parameters.

That's got to be down to how the name mapping is implemented on the server. I doubt anyone expected it to reach the limit of parameters; that might be something @v-chojas could explore. As I think I said earlier, the falloff curve looks like list iteration instead of map lookup, which makes sense because a list tends to be faster than a map when items < ~100.

@David-Engel
Contributor

I'm not able to replicate the improved test results with the latest MDS package (hackjob) from @Wraith2 or the previous package. I've triple-checked my MDS DLL and I'm pretty sure I'm using the correct one (modified dates line up with the correct times in my build output), and I've tried with and without the UseODBCParameterFormat switch. Performance still falls off around 256 params on all runs I try. 🤔

@KristianWedberg Can you push any changes you made to the test to your repo?

@KristianWedberg
Author

@David-Engel Yes, that will be due to running with too small a data volume for the fast MDS. I've refreshed the benchmark code to address that (for me each test case now takes about 1 second). You'll have to upgrade from MDS 2.1.2.
