Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sqlite: fix inconsistent read-after-write #3354

Merged
merged 5 commits into from
Aug 1, 2024

Conversation

ckampfe
Copy link
Contributor

@ckampfe ckampfe commented Jul 18, 2024

Fixes #3120

As far as I can tell, this fixes #2099 (and I think #3120, which appears to be the same)

My minimal reproduction looks like this (which is adapted from the example in #2099):

use sqlx::sqlite::SqlitePoolOptions;

#[tokio::main]
async fn main() {
    let sqlite_options = sqlx::sqlite::SqliteConnectOptions::new()
        .filename("test.db")
        .busy_timeout(std::time::Duration::from_secs(5))
        .journal_mode(sqlx::sqlite::SqliteJournalMode::Wal)
        .create_if_missing(true);

    let db = SqlitePoolOptions::new()
        .acquire_timeout(std::time::Duration::from_secs(5))
        .max_connections(5)
        .min_connections(5)
        .connect_with(sqlite_options)
        .await
        .unwrap();

    sqlx::query("create table if not exists some_table (id integer primary key, name text)")
        .execute(&db)
        .await
        .unwrap();

    let name = "name";

    let (id,): (i64,) = sqlx::query_as("INSERT INTO some_table (name) VALUES ($1) RETURNING id")
        .bind(name)
        .fetch_one(&db)
        .await
        .unwrap();

    let out: (i64, String) = sqlx::query_as("SELECT id, name FROM some_table WHERE id = $1")
        .bind(id)
        .fetch_one(&db)
        .await
        .unwrap(); // => sqlx::Error no row found

    dbg!(out);
}

I can pretty easily get this code to panic on the main branch, just by running it a few times. It doesn't panic every time, as other issue reporters have noted, but it's consistent enough to notice.

With this fix, I can't get it to panic at all.

The fix in this PR works by forcing the stream here to run until completion, rather than returning early in the case that it returns rows (i.e., via a RETURNING in SQL).

As far as I can tell, this is related to how step in sqlx-sqlite/src/statement/handle.rs works:

    pub(crate) fn step(&mut self) -> Result<bool, SqliteError> {
        // SAFETY: we have exclusive access to the handle
        unsafe {
            loop {
                match sqlite3_step(self.0.as_ptr()) {
                    SQLITE_ROW => return Ok(true),
                    SQLITE_DONE => return Ok(false),
                    SQLITE_MISUSE => panic!("misuse!"),
                    SQLITE_LOCKED_SHAREDCACHE => {
                        // The shared cache is locked by another connection. Wait for unlock
                        // notification and try again.
                        unlock_notify::wait(self.db_handle())?;
                        // Need to reset the handle after the unlock
                        // (https://www.sqlite.org/unlock_notify.html)
                        sqlite3_reset(self.0.as_ptr());
                    }
                    _ => return Err(SqliteError::new(self.db_handle())),
                }
            }
        }
    }

where the SQLITE_ROW => return Ok(true), clause is causing the ExecuteIter to terminate early, before SQLite can return SQLITE_DONE, which I believe in turn means SQLite itself may not be committing the transaction: https://github.com/launchbadge/sqlx/blob/main/sqlx-sqlite/src/connection/execute.rs#L96

This SQLITE_ROW then causes this whole stream to terminate early, as it simply returns when it matches an Either::Right(row) rather than an Either::Left(SqliteQueryResult): https://github.com/launchbadge/sqlx/blob/main/sqlx-sqlite/src/connection/executor.rs#L68-L70

I think this is what is happening, and I think this PR fixes it, but please let me know if my understanding is incorrect.

Thanks!

while let Some(res) = stream.try_next().await? {
if let Either::Right(row) = res {
return Ok(Some(row));
out = Ok(Some(row));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it doesn't work this way is because we didn't want the application to have to wait for the full result set before returning. For example, if someone forgot a LIMIT 1 on their query and was doing a SELECT ... ORDER BY ... to get the first/last item in an ordered set, the call would have to churn through the whole dataset before returning.

The bug happens because of a race condition: the worker sends the row data and then continues calling sqlite3_step() and only stops when it notices the result channel is closed, then it calls sqlite3_reset() which is supposed to actually commit the changes. In the meantime, the application receives the row and then continues to the next query, which it executes on another connection which may not be able to see the changes yet.

In reality, this needs to be handled proactively by the driver. The message to the worker should specify that it's only expecting one row, and then the worker should step until it gets the row and immediately call sqlite3_reset() (by dropping ExecuteIter) before returning it. That should fix the race condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, thanks @abonander. Does the approach in 02ca0ed look a little bit closer to what you had in mind? I don't know the project's naming/organization conventions so apologies for that.

@ckampfe
Copy link
Contributor Author

ckampfe commented Jul 25, 2024

@abonander I added a commit which changes the approach slightly to account for a test failure I saw in CI, where the test it_can_execute_multiple_statements was failing. I was was able to isolate that test locally and I believe the approach in 93d53ba fixes the issue with running multiple statements in the same fetch_one instance.

pub(crate) enum Returning {
Many,
One,
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need a bespoke enum. I would just take a limit: Option<usize> parameter and stop reading rows when the limit is reached. For .fetch_optional(), you'd pass Some(1).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonander good call, updated.

@CommanderStorm

This comment has been minimized.

@ckampfe
Copy link
Contributor Author

ckampfe commented Jul 28, 2024

@ckampfe Could you add Fixes #3120 (or another closing keyword) to the description?

@CommanderStorm Added.

@abonander
Copy link
Collaborator

@ckampfe if you rebase it should fix CI.

@ckampfe ckampfe force-pushed the fix-sqlite-read-after-write branch from 5c0f958 to b95a256 Compare July 30, 2024 00:44
@ckampfe
Copy link
Contributor Author

ckampfe commented Jul 30, 2024

@abonander done!

@abonander abonander merged commit ff0252d into launchbadge:main Aug 1, 2024
70 checks passed
@ckampfe ckampfe deleted the fix-sqlite-read-after-write branch August 1, 2024 20:36
@ckampfe
Copy link
Contributor Author

ckampfe commented Aug 1, 2024

Hey @abonander and @CommanderStorm, thank you for working with me on this PR, I appreciate it and all the work you do on this project.

jrasanen pushed a commit to jrasanen/sqlx that referenced this pull request Oct 14, 2024
* sqlite: fix inconsistent read-after-write

fetch_one/fetch_optional

* try pushing fetch_optional early-return into worker

* run cargo fmt

* fix "it_can_execute_multiple_statements" test failure

* use Option<usize> instead of bespoke enum for rows returned
jrasanen pushed a commit to jrasanen/sqlx that referenced this pull request Oct 14, 2024
* sqlite: fix inconsistent read-after-write

fetch_one/fetch_optional

* try pushing fetch_optional early-return into worker

* run cargo fmt

* fix "it_can_execute_multiple_statements" test failure

* use Option<usize> instead of bespoke enum for rows returned
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-ACID behaviour in Sqlite driver. sqlite driver returns result before the actual commit
3 participants