♻️ REFACTOR: Remove `QueryManager` #5101

chrisjsewell · 2021-08-26T13:00:44Z

The AbstractQueryManager class and its implementations (DjabgoQueryManager, SqlaQueryManager), is an odd mix of a few methods, with no real conceptual purpose; essentially a catch-all for things that didn't have another home.

In this PR, I have removed the classes, and reassigned their methods as appropriate:

get_creation_statistics has been moved to the BackendQueryBuilder (in fact, since #5093, I would say this is maybe more appropriately termed BackendQueryManager)

get_duplicate_uuids and apply_new_uuid_mapping have been moved to aiida/backends/general/migrations/utils.py (they directly use SQL, so cannot be easily abstracted on BackendQueryBuilder).

In fact, there is actually a further duplication here with aiida/manage/database/integrity/duplicate_uuid.py; @sphuber in ee7a65e you updated aiida/backends/general/migrations/utils.p::deduplicate_uuids, but actually this is only used in tests/backends/aiida_django/migrations/test_migrations_many.py, whereas aiida/manage/database/integrity/duplicate_uuid.py::deduplicate_uuids (which has not been updated) is the one used in verdi database detect-duplicate-uuid.

I would suggest aiida/manage/database/integrity/duplicate_uuid.py should also be removed? (done in 845b112, and extracted to aiida/backends/general/migrations/duplicate_uuids.py to avoid verdi imports of aiida.repository)

Finally, get_bands_and_parents_structure has been moved to aiida/orm/nodes/data/array/bands.py.
This also had a "backend optimised" implementation (on DjabgoQueryManager, but not on SqlaQueryManager), so the question would be:

Is verdi data band list used particularly ubiquitously and is the backend optimised implementation that much faster, to warrant this backend specific code? If so, is there some extra method(s) we can add to the BackendQueryBuilder, to allow this to not have to be "special cased"?

cc @sphuber, @ltalirz for comment

codecov · 2021-08-26T13:18:53Z

Codecov Report

Merging #5101 (e58d7e2) into develop (aea9bfe) will increase coverage by 0.18%.
The diff coverage is 87.72%.

@@             Coverage Diff             @@
##           develop    #5101      +/-   ##
===========================================
+ Coverage    80.64%   80.81%   +0.18%     
===========================================
  Files          537      534       -3     
  Lines        37168    36967     -201     
===========================================
- Hits         29969    29873      -96     
+ Misses        7199     7094     -105

Flag	Coverage Δ
django	`75.52% <87.72%> (+0.22%)`	⬆️
sqlalchemy	`74.75% <71.06%> (+0.43%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
aiida/backends/general/migrations/utils.py	`87.50% <ø> (-1.68%)`	⬇️
...sions/37f3d4882837_make_all_uuid_columns_unique.py	`91.31% <ø> (ø)`
aiida/manage/__init__.py	`100.00% <ø> (ø)`
aiida/manage/database/__init__.py	`100.00% <ø> (ø)`
aiida/manage/database/integrity/__init__.py	`100.00% <ø> (ø)`
aiida/orm/implementation/backends.py	`94.29% <ø> (+1.98%)`	⬆️
aiida/orm/implementation/django/backend.py	`100.00% <ø> (ø)`
aiida/orm/implementation/sqlalchemy/backend.py	`97.44% <ø> (-0.12%)`	⬇️
aiida/cmdline/commands/cmd_database.py	`62.41% <50.00%> (ø)`
...orm/implementation/sqlalchemy/querybuilder/main.py	`81.90% <81.25%> (-0.01%)`	⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update aea9bfe...e58d7e2. Read the comment docs.

sphuber · 2021-08-31T08:34:27Z

I am personally not a fan of having these very domain specific and backend dependent queries in aiida-core. This is mostly a relic from the olden days when this repo was all there was, and even all QE code was hosted here. As to whether it is being used, I am not sure. But I do know that @dev-zero recently opened this PR to fix a bug with this very functionality. So that would suggest that at least one person is using it. Still I would be interested to see the performance increase compared to the backend-agnostic version and we really need a dedicated query for this. If not, I would be in favor for removing this.

sphuber

I am fine with these changes in principle (except for two comments). When those are addressed I am happy to approve

sphuber · 2021-08-31T08:47:08Z

aiida/backends/general/migrations/duplicate_uuids.py



-def deduplicate_uuids(table=None, dry_run=True):
+def deduplicate_uuids(table=None, dry_run=False):


I wouldn't change this default. Is there a particular reason you did so?

Ok done, it doesn't make too much difference because this function is only used two places in the code aiida/cmdline/commands/cmd_database.py (where dry_run is set explicitly) and tests/backends/aiida_django/migrations/test_migrations_many.py (where it was assuming dry_run=False, but now I've set it explicitly 351c5d0)

sphuber · 2021-08-31T08:50:03Z

aiida/orm/implementation/sqlalchemy/querybuilder/main.py

@@ -1072,3 +1072,34 @@ def analyze_query(self, data: QueryDictType, execute: bool = True, verbose: bool
        options = f' ({options})' if options else ''
        rows = self.get_session().execute(f'EXPLAIN{options} {compiled.string}').fetchall()
        return '\n'.join(row.values()[0] for row in rows)
+
+    def get_creation_statistics(self, user_pk: Optional[int] = None) -> Dict[str, Any]:


Wondering if this is the best place to put this? Since this is a very specific query, I don't think this should be on the QueryBuilder itself. If I am not mistaken, this is exclusively used by the REST API. So why not just implement it there as a utility function for the endpoint that calls it?

Because it requires a backend specific implementation, and there should be no backend specific query code that does not go through this abstraction.
Essentially, IMO there should be no sqlalchemy imports outside of aiida/backends/sqlalchemy, aiida/orm/implementation/sqlalchemy (and even then, I find this split a bit weird and want to look if there can be a consolidation)

Because it requires a backend specific implementation

Does it though? I see that it does now, but don't really see why it should be to begin with. This can be implemented just using the QueryBuilder couldn't it? And it shouldn't necessarily be that slow if you project the keys you need directly.

Err, I feel it is a lot less efficient with the general QueryBuilder approach, because you have to fetch every row in the node table, and iterate on it. Rather than with the backend implementation, just using SQL COUNT queries that can do a lot more work on the server side, and significantly reduce data transfer

After a closer look, it would indeed be slower, but only because we don't have a distinct-count functionality on the QueryBuilder. Currently, the best you can do is:

def get_creation_statistics(user_pk=None): from collections import Counter from aiida import orm builder_total = orm.QueryBuilder().append(orm.Node, tag='node', project='*') builder_types = orm.QueryBuilder().append(orm.Node, tag='node', project='node_type') builder_ctime = orm.QueryBuilder().append(orm.Node, tag='node', project='ctime').order_by({'node': 'ctime'}) if user_pk is not None: builder_total.append(orm.User, filters={'id': user_pk}, with_node='node') builder_types.append(orm.User, filters={'id': user_pk}, with_node='node') builder_ctime.append(orm.User, filters={'id': user_pk}, with_node='node') return { 'total': builder_total.count(), 'types': dict(Counter(builder_types.all(flat=True))), 'ctime_by_day': dict(Counter([ctime.strftime('%Y-%m-%d') for ctime in builder_ctime.all(flat=True)])), }

If we were to add the possibility to do .distinct(count=True) on the QueryBuilder we could have the same performance and not have an ad-hoc solution. This wouldn't have to be done in this PR of course. But after refactoring the QB do you think adding this feature would be complex? If not I would do this in a separate PR and then remove the custom get_creation_statistics and replace it with my suggestion above and move it to the REST API.

Ha, you read my mind, yes I do think there is a possibility to do some further generalization (in a separate PR).
I'm not quite sure what you mean by .distinct(count=True); are you expecting distinct to actually run a query here, and return a result?
(because that is not how distinct works, it just adds an additional parameter for the query)

What I mean is that in sqlalchemy you can do something like ~~session.query(DbNode.node_type).distinct().count() which is more or less what the current function is doing~~ nvm, that actually doesn't work as I though, still it should be possible somehow. We would just need to forward this concept to the frontend QueryBuilder class. The direct translation is already possible, i.e. QueryBuilder().append(Node, project='node_type').distinct().count() but this will return the number of distinct node_types and not a count for each distinct node_type. So my first though was to add a boolean arg to distinct which then gets passed to the implementation and it does the right thing. Would be fine with other interfaces, just would be an interesting feature to have.

Yeh I think this would be confusing for what .count() does and it would be better to have a completely separate method, but yeh lets open an issue on that and I can have more of a think

Just to note, .distinct() does not touch the backend at all now, i.e. it just stores {"distinct": True} in the dictionary passed to the backend when you call one of the "query" methods (count, all, dict, etc). In fact you can also do e.g. .distinct(True).distinct(False)

aiida/backends/general/migrations/duplicate_uuids.py

chrisjsewell force-pushed the remove-query-manager branch from fd16b97 to 845b112 Compare August 26, 2021 17:08

sphuber requested changes Aug 31, 2021

View reviewed changes

chrisjsewell commented Sep 1, 2021

View reviewed changes

aiida/backends/general/migrations/duplicate_uuids.py Outdated Show resolved Hide resolved

chrisjsewell added 3 commits September 1, 2021 14:51

♻️ REFACTOR: Remove QueryManager

8e3befa

Remove aiida/manage/database/integrity/duplicate_uuid.py

cccbcc9

set explicit dry_run value

e58d7e2

chrisjsewell force-pushed the remove-query-manager branch from 962175d to e58d7e2 Compare September 1, 2021 12:53

sphuber approved these changes Sep 2, 2021

View reviewed changes

sphuber merged commit 1dac2fe into aiidateam:develop Sep 2, 2021

chrisjsewell deleted the remove-query-manager branch September 16, 2021 01:26

chrisjsewell mentioned this pull request Mar 12, 2022

🔀 MERGE: Release v2.0.0b1 #5426

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

♻️ REFACTOR: Remove `QueryManager` #5101

♻️ REFACTOR: Remove `QueryManager` #5101

chrisjsewell commented Aug 26, 2021 •

edited

Loading

codecov bot commented Aug 26, 2021 •

edited

Loading

sphuber commented Aug 31, 2021

sphuber left a comment

sphuber Aug 31, 2021

chrisjsewell Sep 1, 2021 •

edited

Loading

sphuber Aug 31, 2021

chrisjsewell Sep 1, 2021

sphuber Sep 1, 2021

chrisjsewell Sep 1, 2021 •

edited

Loading

sphuber Sep 1, 2021

chrisjsewell Sep 1, 2021

sphuber Sep 1, 2021 •

edited

Loading

chrisjsewell Sep 1, 2021



		def deduplicate_uuids(table=None, dry_run=True):
		def deduplicate_uuids(table=None, dry_run=False):

♻️ REFACTOR: Remove QueryManager #5101

♻️ REFACTOR: Remove QueryManager #5101

Conversation

chrisjsewell commented Aug 26, 2021 • edited Loading

codecov bot commented Aug 26, 2021 • edited Loading

Codecov Report

sphuber commented Aug 31, 2021

sphuber left a comment

Choose a reason for hiding this comment

sphuber Aug 31, 2021

Choose a reason for hiding this comment

chrisjsewell Sep 1, 2021 • edited Loading

Choose a reason for hiding this comment

sphuber Aug 31, 2021

Choose a reason for hiding this comment

chrisjsewell Sep 1, 2021

Choose a reason for hiding this comment

sphuber Sep 1, 2021

Choose a reason for hiding this comment

chrisjsewell Sep 1, 2021 • edited Loading

Choose a reason for hiding this comment

sphuber Sep 1, 2021

Choose a reason for hiding this comment

chrisjsewell Sep 1, 2021

Choose a reason for hiding this comment

sphuber Sep 1, 2021 • edited Loading

Choose a reason for hiding this comment

chrisjsewell Sep 1, 2021

Choose a reason for hiding this comment

♻️ REFACTOR: Remove `QueryManager` #5101

♻️ REFACTOR: Remove `QueryManager` #5101

chrisjsewell commented Aug 26, 2021 •

edited

Loading

codecov bot commented Aug 26, 2021 •

edited

Loading

chrisjsewell Sep 1, 2021 •

edited

Loading

chrisjsewell Sep 1, 2021 •

edited

Loading

sphuber Sep 1, 2021 •

edited

Loading