-
Notifications
You must be signed in to change notification settings - Fork 619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance Issue in 3.45+ (due to generatedKeys) #1135
Comments
Performance is most likely back to what it was before the feature was removed. Anyhow there's not much way to go around this, SQLite need to get the generated ID straight after the records are inserted, while JDBC allows you to retrieve the generated keys at a deferred time. |
20-40 percent is quite huge so I think it's worth some consideration.
Wdyt? We'd be happy to contribute a solution |
can you maybe share details on how you measured that, and against which version ? Trying the previous version that had support for it would also be insightful. |
I don't get those points. |
The benchmark is just inserting rows into a table. What exact versions would you like to see the results on? |
JDBC has an interface where you can execute statements in batch (executeBatch). This is usually needed when there is a network between the driver and the database. This interface does not return the result from execute (rows affected for instance) and could be used as a way to circumvent the "read row id". Downside of this is that it's not widely used or known and the performance gains have less to do with "batch" but rather in a side effect of not calling rowid. On the PRO side, "to get better performance use batch" will also apply to sqlite.
I am wondering what are the calls to sqlite that cannot happen before calling the row id. Perhaps these are not widely used and therefor we can "delay" the call to "getRowId" to happen only if explicitly called or if one of the invalidating calls is made. For this i need to better understand the details of why the orignal "lazy" approach didn't work. |
At the minimum we would need:
|
Context: #329 |
Ideally all the benchmarks should be run using the same native library version: https://github.com/xerial/sqlite-jdbc/blob/master/USAGE.md#how-to-use-a-specific-native-library |
So... it's actually more than double (I made a simple benchmark that had fewer columns and fewer indices so the getrowid is more pronounced). The fact that the feature was removed is not that important since it is not used by the benchmark. Inserted 10000000 rows in 11028ms, SQLite 3.42.0 |
|
I also tested a version where I call getGeneratedKeys in the benchmark code by adding the following after update:
Inserted 10000000 rows in 17001ms, SQLite 3.42.0 (total:50000005000000) So
|
Thanks for the details. We would accept a PR to fix this, i suppose the easiest way would be to set this at We already have a fake pragma |
Would it be okay to revert in this case to the “lazy” getgenkeys? Granted it’s not thread safe but a lot of uses are single thread per connection + user could synchronize since they are aware they turned on the pragama.
|
It wouldn't be OK, the feature was not yielding correct results. The proposed Pragma would be to disable the feature entirely, not to enable the feature. If you are not up to a PR that's OK, i'm sure someone will be able to contribute. @sjlombardo FYI the reintroduction of the feature to get generated keys had perf impact. |
I'd argue that once the user has set on the pragma, |
@gotson I'll try to take a look at this in more detail as soon as I get some more free time. @yuvalp-k2view IMO, using the old "lazy" method by default is not a good idea. It is possible to trigger the synchronization issue without using multiple threads. Even single-threaded interleaving of updates could result in incorrect results. It would probably be better to use the corrected method but allow the feature to be disabled entirely if the application doesn't need it. |
@yuvalp-k2view do you want to check my PR #1182 and see if that would address your concerns ? |
Looks good! |
🎉 This issue has been resolved in |
…fig for proper cascading Refs: xerial#1135
In an INSERT only benchmark performance has dropped by 20-40%
To Reproduce
Insert a few thousand rows in a transaction.
What we are seeing in profiling is calls to generatedKeys, even though we are not accessing the generated keys.
We assume that this happens regardless if user invokes the call and believe this should be done "lazily" so as not to affect performance.
Expected behavior
No performance degradation
Environment (please complete the following information):
Mac M1
The text was updated successfully, but these errors were encountered: