Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Issue in 3.45+ (due to generatedKeys) #1135

Closed
yuvalp-k2view opened this issue Jun 25, 2024 · 20 comments · Fixed by #1182
Closed

Performance Issue in 3.45+ (due to generatedKeys) #1135

yuvalp-k2view opened this issue Jun 25, 2024 · 20 comments · Fixed by #1182
Labels
enhancement help wanted Contributions are welcome released Issue has been released

Comments

@yuvalp-k2view
Copy link
Contributor

In an INSERT only benchmark performance has dropped by 20-40%

To Reproduce
Insert a few thousand rows in a transaction.
What we are seeing in profiling is calls to generatedKeys, even though we are not accessing the generated keys.
We assume that this happens regardless if user invokes the call and believe this should be done "lazily" so as not to affect performance.

Expected behavior
No performance degradation

Environment (please complete the following information):
Mac M1

@gotson
Copy link
Collaborator

gotson commented Jun 26, 2024

Performance is most likely back to what it was before the feature was removed.

Anyhow there's not much way to go around this, SQLite need to get the generated ID straight after the records are inserted, while JDBC allows you to retrieve the generated keys at a deferred time.

@gotson gotson added wontfix and removed triage labels Jun 26, 2024
@yuvalp-k2view
Copy link
Contributor Author

20-40 percent is quite huge so I think it's worth some consideration.
A few suggestions:

  1. Simplest - perhaps a way to turn it off using a private SqliteConnection API for anyone not intending to read ids.
  2. A bit vague but makes sense - Use the executeBatch API as a way to signal that we are in high performance and there is no need to read the row id.
  3. Complex - Differed time... what actions is permitted in jdbc between INSERT and read the generated key that are not allowed by sqlite? Perhaps only those trigger the read rowid but doing another INSERT does not. This way if all I'm doing is calling INSERTs, I do not pay the ~30% price.

Wdyt? We'd be happy to contribute a solution

@gotson
Copy link
Collaborator

gotson commented Jun 26, 2024

20-40 percent is quite huge so I think it's worth some consideration.

can you maybe share details on how you measured that, and against which version ? Trying the previous version that had support for it would also be insightful.

@gotson
Copy link
Collaborator

gotson commented Jun 26, 2024

  • A bit vague but makes sense - Use the executeBatch API as a way to signal that we are in high performance and there is no need to read the row id.

  • Complex - Differed time... what actions is permitted in jdbc between INSERT and read the generated key that are not allowed by sqlite? Perhaps only those trigger the read rowid but doing another INSERT does not. This way if all I'm doing is calling INSERTs, I do not pay the ~30% price.

I don't get those points.

@yuvalp-k2view
Copy link
Contributor Author

20-40 percent is quite huge so I think it's worth some consideration.

can you maybe share details on how you measured that, and against which version ? Trying the previous version that had support for it would also be insightful.

The benchmark is just inserting rows into a table. What exact versions would you like to see the results on?

@yuvalp-k2view
Copy link
Contributor Author

  • A bit vague but makes sense - Use the executeBatch API as a way to signal that we are in high performance and there is no need to read the row id.

JDBC has an interface where you can execute statements in batch (executeBatch). This is usually needed when there is a network between the driver and the database. This interface does not return the result from execute (rows affected for instance) and could be used as a way to circumvent the "read row id". Downside of this is that it's not widely used or known and the performance gains have less to do with "batch" but rather in a side effect of not calling rowid. On the PRO side, "to get better performance use batch" will also apply to sqlite.

  • Complex - Differed time... what actions is permitted in jdbc between INSERT and read the generated key that are not allowed by sqlite? Perhaps only those trigger the read rowid but doing another INSERT does not. This way if all I'm doing is calling INSERTs, I do not pay the ~30% price.

I am wondering what are the calls to sqlite that cannot happen before calling the row id. Perhaps these are not widely used and therefor we can "delay" the call to "getRowId" to happen only if explicitly called or if one of the invalidating calls is made. For this i need to better understand the details of why the orignal "lazy" approach didn't work.

@gotson
Copy link
Collaborator

gotson commented Jun 26, 2024

20-40 percent is quite huge so I think it's worth some consideration.

can you maybe share details on how you measured that, and against which version ? Trying the previous version that had support for it would also be insightful.

The benchmark is just inserting rows into a table. What exact versions would you like to see the results on?

At the minimum we would need:

  • the benchmark code
  • the measurements
  • tested under versions:
    • 3.42.x which had the feature
    • 3.43.x which removed the feature
    • 3.45.x which reintroduced the feature

@gotson
Copy link
Collaborator

gotson commented Jun 26, 2024

For this i need to better understand the details of why the orignal "lazy" approach didn't work.

Context: #329

@gotson
Copy link
Collaborator

gotson commented Jun 26, 2024

Ideally all the benchmarks should be run using the same native library version: https://github.com/xerial/sqlite-jdbc/blob/master/USAGE.md#how-to-use-a-specific-native-library

@gotson gotson added waiting for feedback Waiting for a feedback from the issue creator and removed wontfix labels Jul 2, 2024
@yuvalp-k2view
Copy link
Contributor Author

yuvalp-k2view commented Jul 4, 2024

So... it's actually more than double (I made a simple benchmark that had fewer columns and fewer indices so the getrowid is more pronounced). The fact that the feature was removed is not that important since it is not used by the benchmark.

Inserted 10000000 rows in 11028ms, SQLite 3.42.0
Inserted 10000000 rows in 11170ms, SQLite 3.43.0
Inserted 10000000 rows in 12080ms, SQLite 3.44.0
Inserted 10000000 rows in 25412ms, SQLite 3.45.0
Inserted 10000000 rows in 26267ms, SQLite 3.45.3
Inserted 10000000 rows in 26330ms, SQLite 3.46.0

@yuvalp-k2view
Copy link
Contributor Author


import java.io.File;
import java.sql.*;

public class SimpleSqliteBenchmark {

    public static void main(String[] args) throws SQLException {

        String file = "benchmark.db";
        int transactions = 1000;
        int rowsInTx = 10_000;

        try (Connection connection = DriverManager.getConnection("jdbc:sqlite:"+file)) {
            try (Statement statement = connection.createStatement()) {
                statement.execute("DROP TABLE IF EXISTS BENCHMARK");
                statement.execute("CREATE TABLE BENCHMARK (COL1 text, COL2 text, COL3 text)");
            }
            long start = System.currentTimeMillis();
            try (PreparedStatement ps = connection.prepareStatement("INSERT INTO BENCHMARK VALUES (?,?,?)")) {
                connection.setAutoCommit(false);
                for (int tx = 0; tx < transactions; ++tx) {
                    for (int r = 0; r < rowsInTx; ++r) {
                        for (int col = 0; col < 3; ++col) {
                            ps.setObject(col + 1, "Some data for my columns");
                        }
                        ps.executeUpdate();
                    }
                    connection.commit();
                }
                System.out.printf("Inserted %d rows in %dms, %s %s",
                        rowsInTx * transactions, System.currentTimeMillis() - start,
                        connection.getMetaData().getDatabaseProductName(),
                        connection.getMetaData().getDatabaseProductVersion()
                );
            }
            new File(file).delete();
        }
    }
}

@yuvalp-k2view
Copy link
Contributor Author

yuvalp-k2view commented Jul 4, 2024

I also tested a version where I call getGeneratedKeys in the benchmark code by adding the following after update:

                        try (ResultSet rs = ps.getGeneratedKeys()) {
                            rs.next();
                            total += rs.getInt(1);
                        }

Inserted 10000000 rows in 17001ms, SQLite 3.42.0 (total:50000005000000)
3.43.0 not implemented by SQLite JDBC driver
3.44.0 not implemented by SQLite JDBC driver
Inserted 10000000 rows in 26546ms, SQLite 3.45.0 (total:50000005000000)
Inserted 10000000 rows in 27270ms, SQLite 3.45.3 (total:50000005000000)
Inserted 10000000 rows in 27444ms, SQLite 3.46.0 (total:50000005000000)

So

  1. the previous "lazy" mechanism was faster even when in use.
  2. as expected, actually calling getGeneratedKeys in 3.45+ has little effect since the key was already fetched

@gotson
Copy link
Collaborator

gotson commented Jul 5, 2024

Thanks for the details.

We would accept a PR to fix this, i suppose the easiest way would be to set this at SQLiteConnectionConfig. Doing it with a new Pragma is probably the best way, as that's how the configuration is usually done.

We already have a fake pragma JDBC_EXPLICIT_READONLY, we could add one more like JDBC_NO_GENERATED_KEYS.

@yuvalp-k2view
Copy link
Contributor Author

yuvalp-k2view commented Jul 5, 2024 via email

@gotson
Copy link
Collaborator

gotson commented Jul 6, 2024

It wouldn't be OK, the feature was not yielding correct results. The proposed Pragma would be to disable the feature entirely, not to enable the feature.

If you are not up to a PR that's OK, i'm sure someone will be able to contribute.

@sjlombardo FYI the reintroduction of the feature to get generated keys had perf impact.

@gotson gotson added help wanted Contributions are welcome and removed waiting for feedback Waiting for a feedback from the issue creator labels Jul 6, 2024
@yuvalp-k2view
Copy link
Contributor Author

yuvalp-k2view commented Jul 7, 2024

I'd argue that once the user has set on the pragma,
it is okay to require them to take care of thread safety, better than to fully disable the feature all together. As stated now there is no middle ground between performance and functionality.
Also, are you happy with the default being low performance?
Perhaps the default should be the old "lazy" method and the pragma should be: synchronized_getrowid?
Last but not least, using multiple threads against the same connection is a sqlite anti-pattern... maybe if you do that, its okay to require a special pragma (i.e. reverse the default for the sake of performance).

@sjlombardo
Copy link
Contributor

@gotson I'll try to take a look at this in more detail as soon as I get some more free time.

@yuvalp-k2view IMO, using the old "lazy" method by default is not a good idea. It is possible to trigger the synchronization issue without using multiple threads. Even single-threaded interleaving of updates could result in incorrect results. It would probably be better to use the corrected method but allow the feature to be disabled entirely if the application doesn't need it.

@gotson
Copy link
Collaborator

gotson commented Sep 25, 2024

@yuvalp-k2view do you want to check my PR #1182 and see if that would address your concerns ?

@yuvalp-k2view
Copy link
Contributor Author

@yuvalp-k2view do you want to check my PR #1182 and see if that would address your concerns ?

Looks good!

@gotson gotson closed this as completed in aed5121 Sep 25, 2024
@github-actions github-actions bot added the released Issue has been released label Sep 25, 2024
Copy link
Contributor

🎉 This issue has been resolved in 3.46.1.2 (Release Notes)

gotson added a commit to gotson/sqlite-jdbc that referenced this issue Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement help wanted Contributions are welcome released Issue has been released
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants