Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite rowcount is corrupted when combining UPDATE RETURNING w/ id IN (?) #101117

Closed
rt121212121 opened this issue Jan 18, 2023 · 8 comments
Closed
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes docs Documentation in the Doc dir topic-sqlite3 type-bug An unexpected behavior, bug, or error

Comments

@rt121212121
Copy link

rt121212121 commented Jan 18, 2023

Bug report

The rowcount from an sqlite UPDATE RETURNING query with WHERE id IN (?) used (but not == ?) will be 0.

The reproduction case from #93421 runs successfully. The modified reproduction case below errors with:

Traceback (most recent call last):
  File "C:\Data\R\git\electrumsv\x.py", line 41, in <module>
    go()
  File "C:\Data\R\git\electrumsv\x.py", line 35, in go
    assert rowcount == 1, (rowcount, f"was updated: {updated_value=='v2'}")
AssertionError: (0, 'was updated: True')

Reproduction case.

import os
import sqlite3



def go():
    # create table
    cursor = conn.cursor()
    cursor.execute(
        """CREATE TABLE some_table (
        id INTEGER NOT NULL,
        flags INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
    )
    """
    )
    cursor.close()
    conn.commit()

    # run operation
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO some_table (id, flags, value) VALUES (1, 4, 'v1')"
    )
    ident = 1

    cursor.execute(
        "UPDATE some_table SET value='v2' "
        "WHERE id IN (?) RETURNING id",
        (ident,),
    )
    rowcount = cursor.rowcount
    updated_value = cursor.execute("SELECT value FROM some_table WHERE id=1").fetchone()[0]
    assert rowcount == 1, (rowcount, f"was updated: {updated_value=='v2'}")

if os.path.exists("file.db"):
    os.unlink("file.db")

conn = sqlite3.connect("file.db")
go()
# never reaches here

Your environment

This is the 3.10.9 windows install from python.org on my computer. My coworker installed the version on his Windows computer and could reproduce it as well.

Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] on win32

My coworker has also had it happen on MacOS with 3.10.8

3.10.8 (main, Oct 13 2022, 10:17:43) [Clang 14.0.0 (clang-1400.0.29.102)]

3.2 GHz 8-Core Intel Xeon W

Linked PRs

@rt121212121 rt121212121 added the type-bug An unexpected behavior, bug, or error label Jan 18, 2023
@rt121212121
Copy link
Author

This is reproducible with the Windows builds for 3.10.9, 3.10.8, 3.10.7, 3.10.6.
This is not reproducible with the Windows build for 3.10.5.

@rt121212121
Copy link
Author

rt121212121 commented Jan 18, 2023

Coincidentally the only sqlite related fix in 3.10.6 in the changelog is for #93421.

@erlend-aasland
Copy link
Contributor

A UPDATE ... RETURNING query returns resulting rows. Example use:

query = "UPDATE some_table SET value='v2' WHERE id IN (?) RETURNING id"
params = (1,)
for row in cursor.execute(query, params):
    print("row", row, "updated")

The underlying SQLite library will only update the row count for every step that returns a row. In your example, you do not fetch all resulting rows; the UPDATE ... RETURNING statement is still running when you execute the SELECT query, effectively aborting the UPDATE ... RETURNING statement.

@erlend-aasland erlend-aasland closed this as not planned Won't fix, can't repro, duplicate, stale Jan 18, 2023
@rt121212121
Copy link
Author

Just as a last note for possible later reference should the circumstances around this change, I would note that one confounder I encountered was that on Azure Pipelines, we use 3.10.9 with the unfixed larger test case that is failing in 3.10.6 and above on Windows, and it passes every time. What does this mean? I do not know.

image

@erlend-aasland
Copy link
Contributor

Hm, yes that is a little bit weird. I'm reopening this until I've found time to investigate it again.

@erlend-aasland
Copy link
Contributor

Sorry 'bout the delay; I got lost in other issues.

The issue here, is that you're executing a query that returns row, but you do not let that query complete (you do not collect the resulting rows, as expected by an UPDATE...RETURNING query). The behaviour SQLite C API sqlite3_changes is unpredictable for statements that has not been completed (returned SQLITE_DONE). If you collect the result of the UPDATE..RETURNING query, you'll get the expected result. Alternatively, you can remove the RETURNING clause, since you're not using it.

AFAICS, this is misuse of the API: if you supply an UPDATE...RETURNING query, make sure you run it to completion before checking the row count. Perhaps a docs update could make this clearer.

@erlend-aasland erlend-aasland added docs Documentation in the Doc dir 3.11 only security fixes 3.12 bugs and security fixes labels May 8, 2023
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue May 8, 2023
The SQLite C API sqlite3_changes() can only be relied upon when the
statment has been run to completion.
@erlend-aasland
Copy link
Contributor

Some more insight

Also, in earlier Python versions, the cursor object would pre-fetch a single row during execute; since 3.10 (?), rows are fetched lazily.

The sqlite3_changes API is difficult to get right. We could update the rowcount when the statement returns SQLITE_ROW, but unfortunately that would make us rely on implementation details in SQLite which can change between versions. SQLite only guarantees the behaviour of sqlite3_changes when a statement has been run to completion. The old behaviour (pre gh-93421) was buggy, as sqlite3_changes could be called before a statement was run to completion; that may work in some cases, but it may also not work. The safest thing we can do, is to abide with the contract the SQLite C API provides:

These functions return the number of rows modified, inserted or deleted by the most recently completed INSERT, UPDATE or DELETE statement on the database connection specified by the only parameter.

Note the wording (emphasis by me): the most recently completed [...] statement

What now?

AFAIK, we now follow the rules of this particular SQLite C API, and I would prefer it to stay that way. It is unfortunate that 3.10 changed behaviour in a patch release. That is entirely my fault, and it shows we cannot always rely on our test suites. It also displays the dangers of slightly altering the semantics of existing API.

Unfortunately, 3.10 is only accepting security fixes, so 3.10 stays how it is.

For 3.11 and 3.12, I prefer to improve the docs only. Reverting gh-93421 is a backwards compatible change, and so is reverting the even older change of lazily fetching resulting rows. IMO, the current behaviour encourages good SQL and SQLite programming practise: if you run a query that returns resulting rows, make sure you collect them.

erlend-aasland added a commit that referenced this issue May 11, 2023
The SQLite C API sqlite3_changes() can only be relied upon when the
current active statement has been run to completion.
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue May 11, 2023
…ocs (python#104287)

The SQLite C API sqlite3_changes() can only be relied upon when the
current active statement has been run to completion.
erlend-aasland added a commit that referenced this issue May 11, 2023
…104287) (#104381)

The SQLite C API sqlite3_changes() can only be relied upon when the
current active statement has been run to completion.
@erlend-aasland
Copy link
Contributor

The docs have been updated, hopefully to the better:

As explained here on the issue, and on the PR, I prefer to resolve this with docs updates only. The current implementation heeds the contract given by the SQLite C API. I do not think changing the semantics and behaviour of the rowcount attribute further is a wise move; it may solve one specific issue, but it may be a breaking change to others.

If the current docs are still unclear; please let us know what is wrong and how to improve them.
Thanks for the report!

@github-project-automation github-project-automation bot moved this from In Progress to Done in sqlite3 issues May 11, 2023
carljm added a commit to carljm/cpython that referenced this issue May 11, 2023
* main: (27 commits)
  pythongh-87849: fix SEND specialization family definition (pythonGH-104268)
  pythongh-101819: Adapt _io.IOBase.seek and _io.IOBase.truncate to Argument Clinic (python#104384)
  pythongh-101819: Adapt _io._Buffered* methods to Argument Clinic (python#104367)
  pythongh-101819: Refactor `_io` futher in preparation for module isolation (python#104369)
  pythongh-101819: Adapt _io.TextIOBase methods to Argument Clinic (python#104383)
  pythongh-101117: Improve accuracy of sqlite3.Cursor.rowcount docs (python#104287)
  pythonGH-92184: Convert os.altsep to '/' in filenames when creating ZipInfo objects (python#92185)
  pythongh-104357: fix inlined comprehensions that close over iteration var (python#104368)
  pythonGH-90208: Suppress OSError exceptions from `pathlib.Path.glob()` (pythonGH-104141)
  pythonGH-102181: Improve specialization stats for SEND (pythonGH-102182)
  pythongh-103000: Optimise `dataclasses.asdict` for the common case (python#104364)
  pythongh-103538: Remove unused TK_AQUA code (pythonGH-103539)
  pythonGH-87695: Fix OSError from `pathlib.Path.glob()` (pythonGH-104292)
  pythongh-104263: Rely on Py_NAN and introduce Py_INFINITY (pythonGH-104202)
  pythongh-104010: Separate and improve docs for `typing.get_origin` and `typing.get_args` (python#104013)
  pythongh-101819: Adapt _io._BufferedIOBase_Type methods to Argument Clinic (python#104355)
  pythongh-103960: Dark mode: invert image brightness (python#103983)
  pythongh-104252: Immortalize Py_EMPTY_KEYS (pythongh-104253)
  pythongh-101819: Clean up _io windows console io after pythongh-104197 (python#104354)
  pythongh-101819: Harden _io init (python#104352)
  ...
carljm added a commit to carljm/cpython that referenced this issue May 11, 2023
* main:
  pythongh-87849: fix SEND specialization family definition (pythonGH-104268)
  pythongh-101819: Adapt _io.IOBase.seek and _io.IOBase.truncate to Argument Clinic (python#104384)
  pythongh-101819: Adapt _io._Buffered* methods to Argument Clinic (python#104367)
  pythongh-101819: Refactor `_io` futher in preparation for module isolation (python#104369)
  pythongh-101819: Adapt _io.TextIOBase methods to Argument Clinic (python#104383)
  pythongh-101117: Improve accuracy of sqlite3.Cursor.rowcount docs (python#104287)
  pythonGH-92184: Convert os.altsep to '/' in filenames when creating ZipInfo objects (python#92185)
  pythongh-104357: fix inlined comprehensions that close over iteration var (python#104368)
  pythonGH-90208: Suppress OSError exceptions from `pathlib.Path.glob()` (pythonGH-104141)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes docs Documentation in the Doc dir topic-sqlite3 type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

2 participants