Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalise process state information for calculation nodes #1125

Merged
merged 7 commits into from
Feb 21, 2018

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Feb 16, 2018

Fixes #1062

The various Calculation types were using different styles of state information
as well as the methods to inquire about it. For example JobCalculations have
their own state in the CalcState, which was used in the old system to move the
calculation through its states during execution, With the advent of the Process
system, WorkCalculations were introduced, which did not have their own calculation
state. Rather they use the state that the Process provides for them. However,
users cannot always access the Process and so the state has to be exposed through
the calculation node. This is now done by setting the PROCESS_STATE_KEY in the
attributes of the calculation node when the process goes through a state change.

With JobProcesses and the recent port of InlineCalculations to workfunctions, all
calculations will now have this attribute set. This means we can defined the state
information properties in the base AbstractCalculation class. Plum defines three
active states:

* CREATED
* RUNNING
* WAITING

and three terminal states:

* FINISHED
* KILLED
* EXCEPTED

where FINISHED means the process finished nominally, EXCEPTED means the process
incurred an unhandled exception and was handled by plum and KILLED means the process
was requested to be killed from the outside, for example by the user.
We map these process states on 6 meta state properties

* is_terminated: The process is finished, killed or excepted
* is_finished: The process is FINISHED, with successful or not result
* is_killed: The process is KILLED
* is_excepted: The process is EXCEPTED
* is_finished_ok: The process is FINISHED and the execution was successful
* is_failed: The process is FINISHED but the execution was not successful

To distinguish between finished_ok and failed, the legacy JobCalculation already
has a mechanism, namely the calculation state. All of the *FAILED states will
indicate a failed calculation, whereas only CalcState.FINISHED means it was
successful.

The Process, and therewith WorkChain, does not have a mechanism to set an exit
status yet, but that will be handled in another issue. When that is provided
it should be used in the finished_ok and failed methods to distinguish between
those two variants of ProcessState.FINISHED

@sphuber sphuber force-pushed the fix_1062_normalise_calc_state branch 11 times, most recently from f415854 to 05f2c83 Compare February 21, 2018 09:59
The various Calculation types were using different styles of state information
as well as the methods to inquire about it. For example JobCalculations have
their own state in the CalcState, which was used in the old system to move the
calculation through its states during execution, With the advent of the Process
system, WorkCalculations were introduced, which did not have their own calculation
state. Rather they use the state that the Process provides for them. However,
users cannot always access the Process and so the state has to be exposed through
the calculation node. This is now done by setting the PROCESS_STATE_KEY in the
attributes of the calculation node when the process goes through a state change.

With JobProcesses and the recent port of InlineCalculations to workfunctions, all
calculations will now have this attribute set. This means we can defined the state
information properties in the base AbstractCalculation class. Plum defines three
active states:

	* CREATED
	* RUNNING
	* WAITING

and three terminal states:

	* FINISHED
	* KILLED
	* EXCEPTED

where FINISHED means the process finished nominally, EXCEPTED means the process
incurred an unhandled exception and was handled by plum and KILLED means the process
was requested to be killed from the outside, for example by the user.
We map these process states on 6 meta state properties

	* terminated: The process is finished, killed or excepted
	* finished: The process is FINISHED, with successful or not result
	* killed: The process is KILLED
	* excepted: The process is EXCEPTED
	* finished_ok: The process is FINISHED and the execution was successful
	* failed: The process is FINISHED but the execution was not successful

To distinguish between finished_ok and failed, the legacy JobCalculation already
has a mechanism, namely the calculation state. All of the *FAILED states will
indicate a failed calculation, whereas only CalcState.FINISHED means it was
successful.

The Process, and therewith WorkChain, does not have a mechanism to set an exit
status yet, but that will be handled in another issue. When that is provided
it should be used in the finished_ok and failed methods to distinguish between
those two variants of ProcessState.FINISHED
The previous requirement of unicode was to strict. Changing to
basestring will cover all acceptable cases
This will test that both style of launching JobCalculations (either the old
way of directly creating a JobCalculation instance, storing it and calling
submit on it, as well as the new way of directly creating a JobProcess) is
supported by the new daemon.
Having 'is_finished' instead of 'finished' is more clear that it will
return a boolean. Therefore all state quering properties of the Calculation
class now have that prefix

	* is_terminated
	* is_finished
	* is_excepted
	* is_killed
	* is_finished_ok
	* is_failed

Also fixed tests involving legacy workflows. For testing purposed these
dummy workflows attached job calculations to already set the state to
FINISHED and were being taken submitted or taken through the steps. As
a result, the machinery checking whether they had finished, which now
relies on the process state being set, was never finishing and the tests
would run indefinitely
@sphuber sphuber force-pushed the fix_1062_normalise_calc_state branch 14 times, most recently from 8b999dd to 3a3744f Compare February 21, 2018 17:15
For some as of yet unknown reason, having the interval for ssh at 30
will fail the daemon tests on Travis, but only for SqlA. The last
job calculation that gets submitted is properly finished, just like
all the other ones, except that its process state attribute is not
updated to be Finished, but rather is stuck in Waiting. This means
that the conditional for all calculations being Terminated is not
satisified (even though all job calculation are FINISHED) and the test
simply times out.
@sphuber sphuber force-pushed the fix_1062_normalise_calc_state branch from 3a3744f to 0e3c7b5 Compare February 21, 2018 17:39
return ProcessState(state)

@process_state.setter
def process_state(self, state):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we privatise?

@sphuber sphuber merged commit a61069d into aiidateam:workflows Feb 21, 2018
@sphuber sphuber deleted the fix_1062_normalise_calc_state branch February 21, 2018 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants