Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QueryBuilder: Use a nested session in iterall and iterdict #5736

Merged
merged 2 commits into from
Nov 10, 2022

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Nov 1, 2022

Fixes #5672

This is a minimal alternative to #5731 . It adds a test that reproduces the bug reported in #5672 and then adds the fix.

This will prevent the ModelWrapper from calling commit on the session
when any of the yielded results is mutated as this will cause the cursor
of the connection to be reset and it to become invalid. In the next
iteration of the yield an exception will be raised.

Note that a session transaction should only be opened if we are not
already inside one, in which case a nullcontext is used.

@sphuber sphuber requested a review from chrisjsewell November 1, 2022 14:45
@chrisjsewell
Copy link
Member

The iteration over the nodes to be incomplete and not yield all rows of the query

@sphuber this would be the case, if the number of rows yielded was more than the batch_size, i.e. the exception raises either:

  1. when a subsequent batch of rows is fetched (after the first batch)
  2. when the iteration is finalised

so I'm afraid this is not a solution.

@sphuber
Copy link
Contributor Author

sphuber commented Nov 9, 2022

Ah yeah of course. I guess I should update the test to the following:

@pytest.mark.usefixtures('aiida_profile_clean')
def test_iterall_with_mutation(self):
    """Test that nodes can be mutated while being iterated using ``QueryBuilder.iterall``."""
    count = 10
    pks = []

    for _ in range(count):
        node = orm.Data().store()
        pks.append(node.pk)

    # Ensure that batch size is smaller than the total rows yielded
    for [node] in orm.QueryBuilder().append(orm.Data).iterall(batch_size=2):
        node.base.extras.set('key', 'value')

    for pk in pks:
        assert orm.load_node(pk).get_extra('key') == 'value'

This now correctly fails, as only the first two nodes of the iteration are updated and the other batches are not processed.

@chrisjsewell
Copy link
Member

This passes that specific test, but not sure if it has other implications:

    def iterall(self, data: QueryDictType, batch_size: Optional[int]) -> Iterable[List[Any]]:
        """Return an iterator over all the results of a list of lists."""
        with self.query_session(data) as build:

            stmt = build.query.statement.execution_options(yield_per=batch_size)
            session = self.get_session()
            with session.begin_nested():
                for resultrow in session.execute(stmt):
                    yield [self.to_backend(rowitem) for rowitem in resultrow]

@sphuber sphuber force-pushed the fix/5672/querybuilder-iterate-commit branch 2 times, most recently from 517790f to ef02bbc Compare November 9, 2022 15:50
@codecov
Copy link

codecov bot commented Nov 9, 2022

Codecov Report

Base: 79.86% // Head: 80.25% // Increases project coverage by +0.40% 🎉

Coverage data is based on head (fac632d) compared to base (530a724).
Patch coverage: 82.61% of modified lines in pull request are covered.

❗ Current head fac632d differs from pull request most recent head 0c5890c. Consider uploading reports for the commit 0c5890c to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5736      +/-   ##
==========================================
+ Coverage   79.86%   80.25%   +0.40%     
==========================================
  Files         496      497       +1     
  Lines       37021    38058    +1037     
==========================================
+ Hits        29563    30540     +977     
- Misses       7458     7518      +60     
Impacted Files Coverage Δ
aiida/storage/psql_dos/orm/querybuilder/main.py 84.37% <82.61%> (-0.32%) ⬇️
aiida/manage/tests/pytest_fixtures.py 85.40% <0.00%> (-0.85%) ⬇️
aiida/transports/plugins/local.py 82.59% <0.00%> (-0.24%) ⬇️
aiida/cmdline/commands/cmd_process.py 80.12% <0.00%> (-0.22%) ⬇️
aiida/engine/processes/functions.py 93.38% <0.00%> (-0.15%) ⬇️
aiida/manage/external/rmq.py 76.29% <0.00%> (ø)
aiida/engine/utils.py 91.18% <0.00%> (+0.07%) ⬆️
aiida/cmdline/utils/decorators.py 91.87% <0.00%> (+0.20%) ⬆️
aiida/cmdline/commands/cmd_code.py 93.98% <0.00%> (+0.42%) ⬆️
aiida/engine/processes/ports.py 98.39% <0.00%> (+0.43%) ⬆️
... and 9 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Currently, when iterating over the rows of a query using `iterall` or
`iterdict`, will raise `sqlalchemy.ProgrammingError` if any of the nodes
returned by the iterator are mutated, for example by setting an extra.

The reason is that adding an extra is mutating the node, and the model
is wrapped in the `ModelWrapper` class which will automatically call
save if a field of a stored node is changed. This is calling `commit` on
the session which is invalidating the cursor. The documentation of
sqlalchemy explicitly states that one should not commit on a cursor that
is being used to iterate over a query with `yield_per`.
@sphuber sphuber force-pushed the fix/5672/querybuilder-iterate-commit branch from 61aa335 to 1bfddf9 Compare November 10, 2022 11:04
This will prevent the `ModelWrapper` from calling commit on the session
when any of the yielded results is mutated as this will cause the cursor
of the connection to be reset and it to become invalid. In the next
iteration of the yield an exception will be raised.

Note that a session transaction should only be opened if we are not
already inside one, in which case a `nullcontext` is used.
@sphuber sphuber force-pushed the fix/5672/querybuilder-iterate-commit branch from 1bfddf9 to 0c5890c Compare November 10, 2022 11:22
@sphuber
Copy link
Contributor Author

sphuber commented Nov 10, 2022

@chrisjsewell found the problem. Tests pass now, ready for final review

@sphuber sphuber changed the title QueryBuilder: Catch exception in iterall and iterdict QueryBuilder: Use a nested session in iterall and iterdict Nov 10, 2022
@sphuber sphuber merged commit 98fd067 into aiidateam:main Nov 10, 2022
@sphuber sphuber deleted the fix/5672/querybuilder-iterate-commit branch November 10, 2022 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

extra can not be set on the fly when query the database
2 participants