Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for process classes to QueryBuilder.append #2421

Conversation

ltalirz
Copy link
Member

@ltalirz ltalirz commented Jan 25, 2019

fix #2400
fix #2418

  • move classifiers from individual return arguments into a classifiers dictionary
  • remove query_type_string from classifiers (can be deduced from ormclass_type_string)
  • add process_type_string
  • add basic tests for correct classification when passing a process class
  • allow passing lists (and sets) of classes to the Querybuilder (instead of just tuples)

To do

  • write test: actually run a process (or create corresponding instances by hand) and query for it
  • fix problem in building the docs
  • document that one can pass process classes to the querybuilder (and what happens when you do this)

@ltalirz ltalirz changed the base branch from develop to provenance_redesign January 25, 2019 09:26
@coveralls
Copy link

coveralls commented Jan 25, 2019

Coverage Status

Coverage increased (+0.03%) to 70.084% when pulling fc1c491 on ltalirz:issue_2400_process_in_querybuilder into 56344da on aiidateam:provenance_redesign.

@ltalirz
Copy link
Member Author

ltalirz commented Jan 31, 2019

@sphuber This PR is now basically ready for review.

There are some strange test failures (only 2 tests, only on python2.7)
https://travis-ci.org/aiidateam/aiida_core/jobs/486559586#L3074

It's not clear to me how they relate to the changes I'm making here --
Just wondering whether you've come across something like this before I start investigating.

@sphuber sphuber force-pushed the issue_2400_process_in_querybuilder branch from 4dde4e5 to 9e95feb Compare February 8, 2019 19:29
@sphuber
Copy link
Contributor

sphuber commented Feb 8, 2019

@ltalirz I took the liberty to squash and rebase your commits onto provenance_redesign since I merged a whole slew of big PRs today. Just in case I made a backup of it first on your remote. I also did this because I wanted to test it and on a first look it seems to work nicely.

I have two questions still:

  1. I was a bit suprised to find that I could now add a tuple of two wildly different node types, for example QueryBuilder().append((ArithmeticAddCalculation, ParameterData)).all() will not complain and will retrieve all the nodes that match those classes. Do you know whether this was always the case?
  2. When originally discussing this with @giovannipizzi , we though it would also be nice to allow sub classing on the process class, i.e. on the process_type. Imagine the following calculation entry points:
 'quantumespresso.pw': 'aiida_quantumespresso.workflows.pw:PwWorkChain',
 'quantumespresso.pw.base': 'aiida_quantumespresso.workflows.pw.base:PwBaseWorkChain',
 'quantumespresso.pw.relax': 'aiida_quantumespresso.workflows.pw.relax:PwRelaxWorkChain',

The entry point names here could function just as the node type strings, as in when appending the PwWorkChain class, it would not only match process_type=quantumespresso.pw but also process_type LIKE 'quantumespresso.pw.%', which would therefore also find the instances of PwBaseWorkChain and PwRelaxWorkChain. If this is complicated, it would be fine to add this later in v1.0.0, but I mention it here if it might require a fundamental change in the current solution that we might want to consider before merging

@ltalirz
Copy link
Member Author

ltalirz commented Feb 8, 2019

  1. Do you know whether this was always the case?

I think it works as long as what you are querying for resolves to "Nodes", which is the case in your example. I didn't modify the logic for handling lists - if you would like to modify it, let me know.
If it was up to me, I would actually vote to remove this functionality completely, since I got the impression that 40% of the querybuilder code just deals with distinguishing between someone passing a single class or a list/set/tuple. But maybe there are important use cases that I am not aware of.

  1. It would also be nice to allow sub classing on the process class

That's a valid point. I skipped this for the moment, since I didn't know whether there was any kind of standardization of the process_type string (and whether these strings were still going to change).
Note that at the moment it can be e.g. aiida.work.workchain.WorkChain (no colon, see here) but also aiida.calculations:arithmetic.add (with colon).
If you give me a specification of the process_type string and a foolproof rule for the subclassing query, it should be very easy for me to add.

The only thing is that there is currently only one subclassing parameter but I guess we could live with this applying to both the ormclass type and the process type simultaneously.

Any `Process` class will have an associated `ProcessNode` sub class
associated with it that it will use to store its provenance when ran. In
addition with the `process_type`, determined by its specific process sub
class entry point, defines a set of nodes that correspond to that
process class.

For example: the `ArithmeticAddCalculation` will use a `CalcJobNode` to
store its executions and has a process type according to its entry point:

    `aiida.calculations:arithmetic.add`

This commit implements the functionality necessary that allows a user to
append the `ArithmeticAddCalculation` process class to a query builder
instance directly.
@sphuber sphuber force-pushed the issue_2400_process_in_querybuilder branch from 9e95feb to b948b2a Compare February 9, 2019 08:18
@ltalirz
Copy link
Member Author

ltalirz commented Feb 13, 2019

So, to sum up:

  1. the process type string is either

    • the import path of the class that defines the process, or
    • the entry point of the class defining the process (if there is one). Uses get_entry_point_from_class
  2. the orm class type string is either

    • the import path of the defining class, appended by '.' or
    • the entry point of the class defining the class (if there is one), appended by '.'.
      Uses get_entry_point_string_from_class, which adds a colon (different from get_entry_point_from_class)
  3. subclassing will enable subclassing both on the ormclass type and the process type

    • for the process type, we will only enable subclassing, when there is an entry point (this was suggested by @giovannipizzi . Not sure whether this is a good idea, since e.g. we might want subclassing also for AiiDA-internal process types, let me know)

@sphuber Is this correct, and is that how we should proceed?

Looking at this mess, I think users will rightfully ask why it has to be so complicated and unintuitive. I'm ok to go with this for the moment, though, to get the 1.0 beta out.

@sphuber
Copy link
Contributor

sphuber commented Feb 13, 2019

For 2. the type string is never an entry point string but always one that is based on the module path of the node class. For ProcessNodes and sub classes this is easy because they are always internal to aiida-core and cannot be sub classed. Data sub classes can be defined externally and their type string is based on their entry point name if they have an associated entry point, with a fallback on the full module path plus class name.

For the process_type, we only will support querying if the Process class passed to append has a matching entry point, and if not an exception should be raised explaining that querying for unregistered process classes is not supported. This is with the exception of the internal classes Process, CalcJob and WorkChain where there will only be a filter on the type string and no filter at all on the process_type string.

  1. Yes, subclassing means all or nothing

@ltalirz
Copy link
Member Author

ltalirz commented Feb 13, 2019

This is with the exception of the internal classes Process, CalcJob and WorkChain where there will only be a filter on the type string and no filter at all on the process_type string.

Hm... in the end I don't think it is a good rule. What if someone specifies only a process_type_string (and no orm class type)? Instead of using no filter, for Process, CalcJob and WorkChain I will simply enable the same subclassing as one does for the orm class type string.

@sphuber
Copy link
Contributor

sphuber commented Feb 13, 2019

You mean you will enable the same subclassing on the process_type as you would on the type string when the user passes Process, CalcJob or WorkChain? That does not make sense, because their process type is empty and if it would be done on a module path based string, subclasses would have to be put below the base class in the module hierarchy for it to work, which means users have to place their subclasses in the aiida-core tree, which is not something we want. Or am I missing something here?

@ltalirz
Copy link
Member Author

ltalirz commented Feb 13, 2019

when the user passes Process, CalcJob or WorkChain? That does not make sense, because their process type is empty

The process_type of Process is:

In [4]: Process.get_process_type()
Out[4]: 'aiida.work.processes.Process'

This is what would be passed to the QueryBuilder

if it would be done on a module path based string, subclasses would have to be put below the base class in the module hierarchy for it to work, which means users have to place their subclasses in the aiida-core tree, which is not something we want.

I was thinking more of the usecase where you want to catch aiida-internal subclasses.
But you are right - one might want to append a WorkChain and get all possible workchains from all plugins.
This is more important than the more esoteric case of a process_type without an ormclass type (even though this is an indirect and fragile way of making sure the query works - e.g. it will break as soon as there is no 1-1 correspondence between process classes and node classes).
I will point this out in the code.

 + start adding tests

dirty commit - first need to get a working test database
 * set_process_type => build_process_type
 * move call to build_process_type outside _set_process_type
 * check that warnings are raised when querying for process classes
   that don't have an entry point (and are not built-in)
…rz/aiida_core into issue_2400_process_in_querybuilder
@ltalirz ltalirz force-pushed the issue_2400_process_in_querybuilder branch from 3070d37 to 15e500b Compare February 14, 2019 18:03
remove unnecessary code duplication by moving the teardownclass_method
from the backend into the base class
@ltalirz
Copy link
Member Author

ltalirz commented Feb 15, 2019

Hm... after some digging, it turns out that reset_manager() is actually called not only in tearDownClass but also in tearDown, i.e. between every test.
I suspect that somewhere (must be related to work chains) there is some cache that is not cleared, which gives rise to these problems.

For the record: Deleting the travis cache did change something (perhaps it changes some order):

  • before: envs failing were django 2.7, sqla 2.7/3.6; tests: test_persisting, test_report_dbloghandler
  • after: envs failing are django 2.7/3.6, sqla 2.7; tests: test_persisting, test_str (2x), test_report_dbloghandler (1x)

@sphuber
Copy link
Contributor

sphuber commented Feb 15, 2019

Edit: scratch the comment below, I just checked and you have that fix commit. The problem is rather related to the user not being present anymore, like in Django.

Are you sure you are on the latest provenance_design commit. I fixed a bug related to the DbLogHandler recently that could render the database in an inconsistent state, at least for SqlA. This is definitely the error I see in test_report_dbloghandler on SqlA 2.7 that fails.

@ltalirz
Copy link
Member Author

ltalirz commented Feb 15, 2019

Just an update - I'm finally able to reproduce the tests locally (not when running verdi devel tests db.work but when running all tests with verdi devel tests).
Will try to pin down what causes the problem.

@ltalirz
Copy link
Member Author

ltalirz commented Feb 21, 2019

@sphuber We discussed at some point you wanted to use ProcessNode._set_process_type to se the process type. It turns out, however, that there is a Node.process_type with a getter and a setter.

I therefore propose to get rid of ProcessNode._set_process_type, do you agree?
Either this or the process_type property should be moved out of Node and into ProcessNode.

@sphuber
Copy link
Contributor

sphuber commented Feb 21, 2019

If it is a proper setter, as in it gets a value and sets it, then it is fine to get rid of _set_process_type. If I remember correctly, you moved the logic that was in there in a build_process_type. The engine can then call node.process_type = ProcessClass.build_process_type() or something like that.

use setter from top-level Node class
@ltalirz
Copy link
Member Author

ltalirz commented Feb 21, 2019

@sphuber @giovannipizzi I believe I finally found the cause of the failing tests.

The problem is that WorkChain classes can store references to ORM objects, such as here.
After a database reset these classes will have references to orphaned nodes, so in general one cannot touch any WorkChain that was imported before the reset.

While it is easy to avoid this in this PR, I think it is a serious issue.
For example, it means WorkChains will prevent us from switching backends in AiiDA scripts because the Data nodes they already reference won't adapt to a change of the backend.
Has this been considered so far?

My original thought was wrong. I guess, in principle, switching backends could still work, right?

It's just the database clearing that is an issue...
It means there is currently no way for tests to guarantee to "go back to square one".

 + use locally defined WorkChain to avoid problems with
   references to ORM objects stored inside WorkChain class
otherwise, pylint may update on travis but not locally
(not even after pip install -e .[dev_precommit])
@ltalirz
Copy link
Member Author

ltalirz commented Feb 21, 2019

Tests are passing now; also added some (relatively minimal) documentation.

@giovannipizzi
Copy link
Member

actually, one is still failing...

@ltalirz
Copy link
Member Author

ltalirz commented Feb 22, 2019

One of these "no output for 10 minutes" cases.
Actually, the same test (sqla with python 3.6) was also failing for the same reason on the previous commit. After resetting the cache and restarting the job, it passes. Not quite sure what to make of this.

Also, should I open an issue for the problem mentioned above?

@sphuber
Copy link
Contributor

sphuber commented Feb 22, 2019

I think the problem of node references is something that has caused problems before, for example it was also related to the creation of duplicate node UUIDs. I think we should maybe disallow adding Node instances as defaults in the classmethods like the define and we should replace this with a lambda that will not instantiate the node until it is actually called, not directly on import. Anyway, this is a bigger problem and deserves its own issue and PR, I would not bother with it here.

@ltalirz
Copy link
Member Author

ltalirz commented Feb 22, 2019

Ok, I'll open an issue for it.

@ltalirz
Copy link
Member Author

ltalirz commented Feb 28, 2019

closing until final merge to avoid blocking builds

@ltalirz ltalirz closed this Feb 28, 2019
not sure why this was not fixed automatically...
@ltalirz ltalirz reopened this Feb 28, 2019
Copy link
Contributor

@sphuber sphuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor comments and a question

aiida/engine/processes/process.py Outdated Show resolved Hide resolved
aiida/cmdline/commands/cmd_daemon.py Outdated Show resolved Hide resolved
aiida/orm/nodes/process/process.py Outdated Show resolved Hide resolved
aiida/plugins/entry_point.py Outdated Show resolved Hide resolved
else:
if ":" in value:
# if value is an entry point, do usual subclassing
filter = {'or': [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need an or statement again in the case of sub classing on the process_type but not in the case of the type? I forgot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The process_type is stored in the database without a . at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, if you push the small changes for the other comments I will approve and merge

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed your comments now.
Your comment made me realize an issue - it seems to me one should use get_query_type_from_type_string also here but when adding it, some tests fail.
I currently don't understand why (perhaps I'm also mistaken). If it is an issue, I'll prefer to fix this in a separate PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's get this merged and address any issues that may arise later

@sphuber sphuber merged commit da6bb4d into aiidateam:provenance_redesign Mar 1, 2019
@sphuber
Copy link
Contributor

sphuber commented Mar 1, 2019

Thanks a lot @ltalirz , great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants